NOTICE

Medicine is an ever-changing science. As new research and clinical experience
broaden our knowledge, changes in treatment and drug therapy are required. The
authors and the publisher of this work have checked with sources believed to be
reliable in their efforts to provide information that is complete and generally
in accord with the standards accepted at the time of publication. However, in
view of the possibility of human error or changes in medical sciences, neither
the authors nor the publisher nor any other party who has been involved in the
preparation or publication of this work warrants that the information contained
herein is in every respect accurate or complete, and they disclaim all
responsibility for any errors or omissions or for the results obtained from use
of the information contained in this work. Readers are encouraged to confirm
the information contained herein with other sources.
For example and in particular, readers are advised to check the product information 
sheet included in the package of each drug they plan to administer to be certain that 
the information contained in this work is accurate and that changes have not been 
made in the recommended dose or in the contraindications for administration. This 
recommendation is of particular importance in connection with new or infrequently 
used drugs. 

FOREWORD


I remember my introduction to the medical history and clinical
examination as the most exciting moments of my early
career. As each item in the history and physical examination 
was explained and given meaning and significance, I believed 
that after the long preclinical years I had at last reached the 
threshold of becoming a physician. I could begin to hold more 
than a comforting conversation with a patient. I could use my 
ears, eyes, and hands to disclose the patient’s problem and so 
begin to be of actual use to a real patient. As I polished my 
skills, it did not occur to me that the divination of all those 
signs and symptoms was anything but an art: the epitome of 
the art of medicine. 

But, with time, I realized that many of the so-called pathognomonic
symptoms and signs were so merely because someone, often the person
whose name was attached to them, had declared that they were. Doubt
started to overtake accepted wisdom as it became clear to me that little
worthwhile evidence supported the artist’s tools I thought I had mastered.

Towards the end of the 1980s, my friend David Sackett, then 
chief of medicine and clinical epidemiology and biostatistics at 
McMaster University, showed me a new way of thinking about 
all this. He equated items in the history and the physical examination
with traditional diagnostic laboratory tests, each susceptible to
evidentiary testing. So he and I began planning 2 series of articles on
evidence-based medicine to appear in JAMA. One of these, the Users’ Guides
to the Medical Literature, was soon placed into the capable hands of
Gordon Guyatt, also of McMaster University, and articles began to appear
in JAMA in 1993. By 2002, they were printed in updated form in 2 books, an
Essentials and a fuller Manual, 1,2 both of which have been so successful
that second editions 3,4 have just been published.

The other series consisted of The Rational Clinical Examination articles
and started appearing in 1992. With the first article, Sackett and I
published an editorial. 5 We reminded our readers of studies that showed
that primary care providers usually establish the correct diagnosis at the
end of a brief history and some subroutine of the physical examination. So
on practical grounds alone, it made sense to improve our understanding of
the parts of the history and examination that were useful, or useless, in
pinning down, usually at an early stage of the disease, one diagnosis and
ruling out others. We contrasted symptoms and signs with laboratory tests,
which were subjected to rigorous testing before adoption, but which might
have far less ability to narrow the diagnostic possibilities. As an
example, we observed the overwhelming probability of coronary stenosis in
a 65-year-old man who has smoked all his life
when he tells you that he gets central chest tightness regularly 
on exertion, which forces him to stop and which disappears 
when he rests. 6,7 

Perhaps most important, by encouraging research into the 
history and physical examination, we wanted to restore
respectability to a part of medicine that seemed to have been
eroding as academic and financial rewards went to those who 
most resembled scientists relying on expensive diagnostic tests 
and least behaved as physicians relating to patients. 

It is no coincidence that both Sackett and I, authors of the
editorial launching the series, have served roles in the
Cochrane Collaboration, an initiative that has had a massive
effect on the way we see evidence and a profound influence on
the methods and popularity of systematic review and meta-analysis.
These sciences, as well as that of decision making, had
grown up and spread to medicine during the 1970s and 1980s.
Without them, both the Cochrane Collaboration and The
Rational Clinical Examination series would have been impossible
undertakings; indeed, the entire evidence-based movement
would have grown far more slowly.

At the same time, because of the unfamiliarity of these techniques
and the revolutionary approach we were taking, namely, a scientific
examination of what most clinicians considered to be an ineffable art not
susceptible to dissection, we published a primer on the precision and
accuracy of the clinical examination. This laid out the approach to be
taken and took the reader through the terms, methods, and calculations
underpinning clinical diagnosis. 8

Although each article’s purpose could be worked out from 
its title, the full meaning of the concepts took time to sink in, 
as I discovered from comments sent in by many of the expert 
specialty peer reviewers to whom I sent the manuscripts as 
they came in to JAMA. Indeed, it was unfamiliar even to some 
prospective authors. David Sackett had a firm belief that the 
reviews would be done best by generalist physicians who had 
learned basic critical appraisal skills. As the editor, I learned 
that these generalist physicians were often speaking a different 
language from our specialist reviewers. Sackett was clearly correct,
and it remains commonplace for specialty reviewers to
ask that specialists be added to the writing team because, well,
they are specialists. What has happened in our process is that
both authors and reviewers learn from the editorial review
process, with specialty reviewers ensuring that authors interpret
the data in the proper context. In return, the specialists
often learn that much of what they took for granted has no 
basis in evidence. 

The Rational Clinical Examination book should not replace 
books on clinical diagnosis. But, somewhat as the Cochrane 
Database of Systematic Reviews provides a systematic evaluation 
of all studies on a particular intervention without becoming prescriptive,
so articles in The Rational Clinical Examination series
are careful systematic efforts to assess the accuracy of items from
the patient’s medical history and the clinical examination. In
this sense, they are a revolutionary departure from what we have
regarded as books on physical diagnosis, which, until the first
articles in The Rational Clinical Examination series appeared,
had never taken that approach. Since then, however, such books
have already started using the evidence as summarized in articles
in the series.

In his preface to the eighth edition of DeGowin’s Diagnostic 
Examination, Richard LeBlond writes: 

References to articles from the medical literature are included
in the body of the text. We have chosen articles which provide useful
clinical information including excellent descriptions of disease and
syndromes and, in some cases, photographs illustrating key findings.
Evidence-based articles on the utility of the physical exam are included,
mostly from The Rational Clinical Examination series published over the
last decade in the Journal of the American Medical Association. They are
included with the caveat that they evaluate the physical exam as a
hypothesis-testing tool, not as a hypothesis generating task. . . . 9

Our series is indeed about testing tests (symptoms, signs) to 
separate the useful from the useless and so is about testing 
hypotheses. Books on physical diagnosis are hypothesis generating
in that they are a compendium of instructions on how to
elicit all symptoms and signs, typically presented in the
absence of any certain disease consideration or context, typically
organized by organ system (eg, “the cardiovascular examination”).
In contrast, our articles are usually organized by a
certain condition (eg, “Does this patient have systolic dysfunction?”).
And, although there are a few articles in which the
authors take a more hypothesis-generating tack (eg, those on 
splenomegaly and hepatomegaly), we always frame them in a 
clinical context. 

An issue all along has been whether, and how much, to integrate
the evidence on symptoms and signs with that provided
by diagnostic tests. In general, we have had so much material
to deal with, and there are so many good texts on diagnostic
tests, that we have limited our approach as much as common
sense would allow. Some articles do include assessments of a
few basic laboratory and radiologic studies that are commonly
available to the clinician and that can be interpreted only by
the physician in the clinical context (eg, the sedimentation rate
for temporal arteritis or vascular congestion on a chest radiograph
for systolic dysfunction). Recently, we expanded the
series to include “rational clinical procedures,” because many
procedures are actually part of the clinical examination and
tightly linked to the presence of the history and physical examination
findings. 10

David Simel of Duke University had been immediately 
excited by the concept and was a coauthor of the first article in 
the series, “Does This Patient Have Ascites? How to Divine 
Fluid in the Abdomen.” 11 At that time, 1992, Simel made it 
clear that he intended to devote his research career to investigating
this crucial area of medicine, and soon after he took
over as primary editor of the series. Since then, he has stimulated
large numbers of authors to complete these systematic
reviews. His personal involvement with authors has brought us
many more articles than we could otherwise have expected and
ensured a uniform presentation. He also made certain that
every manuscript had been through review before submission
to JAMA, where I put each manuscript through rigorous external
peer review, just as with all original submissions to JAMA.

Each review is a considerable undertaking, often requiring 
more than a year of unpaid and often unappreciated work, 
which explains why it has taken 15 years to produce what is 
now more than 70 articles in JAMA. As news of the series 
spread, volunteer authors suggested their own topics of interest.
The appearance of fully fledged review articles depended
on the skills and persistence of the authors and on the persuasive
powers and analytic assistance of David Simel. Even then,
more than a fifth of the proposed topics failed to result in publishable
manuscripts, usually because the authors found insufficient
evidence. It is for that reason that Simel and I published
in 1995 a plea for support for a wide research agenda and the 
formation of collaborations to ensure that the wide gaps in our 
knowledge were filled. 12 

With the publication of this book, Simel has updated the 
first 51 published articles either alone or with the original 
authors. In addition, he has updated the primer 8 —essential for 
all readers of this book. David Simel’s contributions to this 
series, and the transformation he has wrought in how we think 
about the clinical examination, have been immense, and working
with him has been a privilege and a delight.

This is the first book in The Rational Clinical Examination 
series. Our plan is to keep soliciting and publishing in JAMA 
articles on fresh Rational Clinical Examination topics. We welcome
volunteers with good ideas who are prepared to undertake
the work. We will accumulate these articles, keeping them
current with updates, and publish them as new chapters online
and in succeeding editions of The Rational Clinical Examination
book. The Rational Clinical Examination will be published
online with a set of teaching/learning slides for each chapter
and will be integrated with the Users’ Guides to the Medical Literature
and other online-only content and features in an extensive
evidence-based medicine Web site called JAMAevidence
(http://www.JAMAevidence.com).

David Simel and I welcome Sheri Keitz (recently of the
Durham Veterans Affairs Medical Center and Duke University,
who has now moved to the University of Miami) as editor
of The Rational Clinical Examination Education Guides.
Sheri has many talents, including a fine critical eye. She has
prepared or supervised development of all the teaching
slides, and she has reviewed most of the Updates to the original
manuscripts.

The series started with the encouragement of George Lundberg,
then editor-in-chief of JAMA and the Archives journals.
His successor, Cathy DeAngelis, has consistently and very 
strongly supported us, helping negotiate the complex path to 
publication. Annette Flanagin has been a tireless worker in 
this, as in so many other JAMA causes. This book would not 
have been possible without her. 

We are grateful to Barry Bowlus for directing the publishing 
of this book and to Richard Newman for his advice and support.
We are also grateful for the expertise of Jim Shanahan,
Robert Pancotti, Helen Parr, and others at McGraw-Hill, as 
well as Peter Compitello at NewGen, and Holly Auten and her 
colleagues at Silverchair. 



Publishing, like medicine, moves forward. During the last 
few years, the illustrations in JAMA have come under the care 
of Ronna Siegel and 2 medical illustrators, Cassio Lynm and 
Alison Burke. The series articles have benefited from their 
extraordinary skills, and improvements continue with the 
introduction of video images, as well as teaching clips. We also 
thank Cara Wallace and Angela Grayson for their expert editing
and support.

The response to the articles published in JAMA tells us that
this book will be useful. We also hope that readers will be stimulated
to conduct research on aspects of the clinical examination.
Perhaps readers will contact us if they believe they can
undertake the sort of review that could constitute future articles
in JAMA and chapters in the next book.

-Drummond Rennie, MD, FRCP, MACP 

PREFACE


I’ve never met a medical student who lacked passion for making
a diagnosis. And, among all the diagnoses a student might
make, clinching the case right at the bedside is the most treasured.
The same holds true not only for physicians in practice
but also for all those involved in caring for patients—physician 
assistants, nurses, and physical therapists must each constantly 
assess their patient and consider what’s wrong. The Rational 
Clinical Examination series, published in JAMA since 1992 
and collected in this book, should appeal to anyone who wonders
about the meaning of a patient’s symptoms and signs.
Many indispensable textbooks instruct learners on “how” to
elicit the medical history and perform the physical examination,
but we suspect that, once the “how” is learned, clinicians
only infrequently return to what was one of their favored textbooks
during their training years. When I ask clinicians to
recall the book they used for physical diagnosis class in medical
school, there is no pause before they state DeGowin and
DeGowin, Bates, Mosby, Schwartz, or another of a select few.
We see The Rational Clinical Examination as an essential companion
to, and not a replacement for, these time-honored texts
of the “complete” medical history and physical examination. 

Although standard textbooks might clearly describe several 
maneuvers for detecting ascites, for example, we identify those 
findings that work best. Although textbooks typically march 
from “head to toe” without regard to diagnoses when describing
the complete physical examination, we start with clinical
diagnostic questions and provide data that identify the most 
relevant symptoms and signs. Unlike physical examination 
textbooks, we also provide data on what does not work, 
derived from a thorough review of the literature that backs up 
our recommendations. 

Please recognize that we can never replace a great textbook on 
the complete medical history and physical examination because 
we will never be complete in describing the rational clinical examination.
There are many diagnoses we have not yet reviewed and
many more to come. After more than 15 years of producing systematic
reviews in JAMA, which included the article that launched
the evidence-based medicine movement, 1 it was time for us to
update and combine our work in one resource for learners and 
clinicians to enjoy. 

Accordingly, this book is evidence based. We present the 
original Rational Clinical Examination article, followed by an 
Update. For each topic, we recreated the original literature 
search and evaluated the new literature dating from 1 year 
before the publication of the original article to the time we 
prepared the Update. If anything, we tried to be even more 
restrictive in applying our quality measures for including new 
research in the Updates. The Updates follow a format similar 
to that of the original articles: they open with a clinical scenario,
present the results of the literature search, and summarize
new information. Sometimes we discovered that we had
not reviewed the topic as thoroughly as we thought, so we also
recount any improvements we made when we reanalyzed data.
Simple tables display the new findings that we incorporate 
with the previously published data. 

Because evidence-based guidelines for most diseases did not 
exist when we launched The Rational Clinical Examination 
series, we review the recommendations of the major federal 
agencies for each of the topics and highlight how our information
supports or differs from those recommendations. Finally,
we include a Make the Diagnosis section that gives a summary 
of the prior probability of the target disorder, the population 
for whom the target disorder should be considered, a table of 
likelihood ratio data for the best clinical findings, and a list of 
the accepted reference standards. 

Some readers will want more data, so we provide a structured 
review of every article identified in our Update that met our 
inclusion criteria. These reviews are available online in an Evidence
to Support the Update section, available at
http://www.JAMAevidence.com. JAMAevidence is a Web site resource for
learning, teaching, and practicing evidence-based medicine that
includes the complete online content of The Rational Clinical
Examination and the Users’ Guides to the Medical Literature,
along with other features, such as downloadable projection slides
to enhance classroom or conference teaching and learning experience,
an extensive evidence-based medicine glossary, functional
calculators, question wizards, customizable worksheets, podcasts, 
and regular updates. 

We hope that long-time readers of The Rational Clinical 
Examination series will recognize the painstaking care and preparation
taken during the review of each topic. Every Update was
reviewed by an author of the original article or a clinician who
had no involvement with the original publication. Although this
alone might seem reassuring and unlike typical medical textbooks,
we went a step further.

For each topic, a slide presentation, called an Education 
Guide, has been prepared, primarily by Duke University 
Department of Medicine residents, or in a few cases by young 
clinical Duke University faculty members, all supervised by 
Sheri A. Keitz, MD, PhD. The Education Guides follow a similar
format and have been “field-tested” among learners. The
goal in preparing the Education Guides was to have the learners
create a set of materials for their instructors that match
how they, the learners, hope the topic would be taught. Just 
like the Updates themselves, the slides have also been reviewed. 
From this, we learned that trainees are among our most critical 
readers: they expect careful, accurate, and thoughtful presentation
and exposition. The Education Guides slides are available
online at http://www.JAMAevidence.com.

For current students, The Rational Clinical Examination 
demonstrates the correct way to learn the medical history 
and physical examination, giving direction in interpreting 
the results and answering questions that typical physical
examination textbooks do not systematically address. For
teachers, the Education Guides, amply supplemented with
teacher’s notes, allow you to teach physical diagnosis with an
evidence-based approach. For established practitioners, perhaps
far removed from their introductory physical examination
course, we hope to challenge any cynicism that clinical
examination is all “art.” There is a science behind the art of
clinical examination. We hope you discover that learning this
science not only validates your role as a clinician and
improves your skills but also is fun.

-David L. Simel, MD, MHS 

CHAPTER

A Primer on the Precision and Accuracy of the Clinical Examination

David L. Sackett, MD, MSc Epid, FRCPC 


This background article will introduce and explain the terms 
and concepts that are being used in the series of overviews on 
the rational clinical examination that begins in this issue of 
The JOURNAL. It includes definitions and explanations of certain
key concepts, clinical examples, guides for reading clinical
journals about a diagnostic test, and a blank “working table” 
that you can use to apply the concepts on your own. 

Background articles in this series will discuss selected 
issues in the precision and accuracy of the clinical examination
in greater detail or extend them to more complex diagnostic
situations. Some of these issues are also discussed in
clinical epidemiology textbooks. 1 

Of course, the precision and accuracy of the clinical 
examination are not the only concerns in the clinical 
encounter, and their proper application provides only the 
starting point for decisions about how certain we need to 
be about a diagnosis before we act on it (the decision 
threshold) and how we ought to incorporate the concerns 
of both patients and society in deciding whether and how 
to act. Later background articles will discuss these additional
considerations; this one will be confined to precision
and accuracy. 

Like others in the series, this background article will be 
introduced with a patient. 

THE PATIENT 

One of your patients, whom you have not seen for several 
years, is admitted to the orthopedic service after a packing 
crate has tipped over onto his leg, producing an unstable 
fracture of his distal tibia and fibula. You stop by to see him 
as he is being prepared for surgery. He is alert and hemodynamically
stable but smells of alcohol (at 10 AM) and has 3
spider nevi on his upper chest (but no gynecomastia or 
asterixis). He is obese, and his belly is prominent. Among the 
questions that are raised in your mind, the following are of 
special significance: 

1. Is this man an alcoholic? You would place the odds for this 
disorder at 50-50 (and the science of the art of how clinicians
generate these odds will be the subject of a later background
article). The answer to this diagnostic question is
important in the long run and in protecting him from the 
complications of acute withdrawal during and after his 
operation. 

2. Does he have ascites? You are much less sure here, but if he 
is alcohol dependent you would place the odds that the 
prominence of his belly represents ascites also at 50-50. 
Again, it would be important to know whether he has this 
manifestation of advanced alcoholic liver damage. 

Your options for answering these questions are several. To
explore his possible alcohol abuse or dependency, (1) you
could take the time required for a thorough confrontation and
interrogation about the amount of alcohol he consumes (and,
in the process, risk alienating him, estranging the nursing staff,
and exasperating yourself); (2) you could order 1 or more liver
function tests; (3) you could even request one of the new, “hot”
tests for platelet enzyme activity, reported to be elevated in
persons with alcoholism 2; or (4) you could ask him the 4 quick
“CAGE” questions: Have you ever felt you should cut down on
your drinking? Have people annoyed you by criticizing your
drinking? Have you ever felt bad or guilty about your drinking?
Have you ever had a drink first thing in the morning to steady
your nerves or to get rid of a hangover (eye-opener)? This
opening example in the series is all the more appropriate when
we observe that the first report on the CAGE questionnaire in
a general medical journal was by John Ewing 3 and that it was
accompanied by an editorial from a major supporter of this
series, George Lundberg. 4 To explore his possible ascites,
(1) you could check him for shifting dullness, fluid wave, or
even the puddle sign; (2) you could order an abdominal ultrasonographic
examination; or (3) you could simply ask him
whether he has ever had swollen ankles.

No. of Positive     Alcohol Abuse or Dependency
CAGE Answers        Yes            No             Total
3 or 4              a = 60         b = 1          61
2 or fewer          c = 57         d = 400        457
Total               117            401            518

Figure 1-1 The CAGE Questions for Alcohol Abuse or Dependency

Characteristics: sensitivity, a/(a + c) = 60/117 = 0.51, or 51%; specificity,
d/(b + d) = 400/401 = 0.998, or 99.8%. Predictions: positive predictive
value or posttest probability of having the target disorder (alcohol abuse or
dependency) for patients with 3 or 4 positive responses, a/(a + b) = 60/61
= 0.98, or 98%; negative predictive value or posttest probability of not having
the target disorder for patients with 2 or fewer positive responses,
d/(c + d) = 400/457 = 0.88, or 88%; posttest probability of having the target
disorder for patients with 2 or fewer positive responses, c/(c + d) = 57/457 =
0.12, or 12%. Prevalence or pretest probability of having the target disorder
(adapted from Bush et al 5), (a + c)/(a + b + c + d) = 117/518 = 23%.
Abbreviation: CAGE, cut down, annoyed, guilty, eye opener.
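
The arithmetic in the Figure 1-1 legend can be rechecked with a short script. The following is a minimal sketch in Python, assuming the cell counts implied by that legend (a = 60, b = 1, c = 57, d = 400); the class and method names are illustrative and do not come from the original article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cells of a 2x2 diagnostic table, labeled as in the Figure 1-1 legend."""
    a: int  # test positive, target disorder present
    b: int  # test positive, target disorder absent
    c: int  # test negative, target disorder present
    d: int  # test negative, target disorder absent

    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    def positive_predictive_value(self) -> float:
        return self.a / (self.a + self.b)

    def negative_predictive_value(self) -> float:
        return self.d / (self.c + self.d)

    def prevalence(self) -> float:
        return (self.a + self.c) / (self.a + self.b + self.c + self.d)


# Counts taken from the Figure 1-1 legend (3 or 4 positive CAGE answers
# counted as a positive test result).
cage = TwoByTwo(a=60, b=1, c=57, d=400)

for name in ("sensitivity", "specificity", "positive_predictive_value",
             "negative_predictive_value", "prevalence"):
    print(f"{name}: {getattr(cage, name)():.3f}")
```

Keeping the 4 cells in one small object means the same calculations can be rerun on any other 2-by-2 table simply by changing the counts.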

Stop for a moment and consider the implications, in terms 
of your time and somebody’s money, of the alternative ways 
of answering these 2 questions. Would it not be better if you 
could answer them both with just 5 quick questions (4 for 
CAGE and 1 about ankle swelling)? 

As it happens, you might be able to do just that. If he
answers yes to 3 or 4 of the CAGE questions, he is an alcohol-abusing
or alcohol-dependent man (and this medical history
is far more powerful than any laboratory tests you can
order). If he answers no to ankle swelling, you have pretty
well ruled out clinically important ascites (you could double
check the latter by testing for shifting dullness; like most such
patients, he did not have a fluid wave, and as you will learn in
a forthcoming overview on ascites, the puddle sign is not
useful in him or anybody else). Thus, for both questions, a
quick bedside examination has provided definitive diagnostic
information, without the need for laboratory testing or diagnostic
imaging.
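
One way to see why 3 or 4 positive CAGE answers are so persuasive is to carry the 50-50 pretest odds through a likelihood ratio. The sketch below is an illustration only: it assumes the Figure 1-1 counts and the standard relation that posttest odds equal pretest odds multiplied by the likelihood ratio. The function names are hypothetical, and the ascites question is not modeled because its data are not given here.

```python
def likelihood_ratio_positive(sensitivity: float, specificity: float) -> float:
    """LR+ = P(positive test | disorder) / P(positive test | no disorder)."""
    return sensitivity / (1.0 - specificity)


def posttest_probability(pretest_probability: float,
                         likelihood_ratio: float) -> float:
    """Apply a likelihood ratio on the odds scale, then convert back."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Figure 1-1 counts: 60 of 117 patients with the disorder and 1 of 401
# without it gave 3 or 4 positive CAGE answers.
sensitivity = 60 / 117
specificity = 400 / 401

lr_positive = likelihood_ratio_positive(sensitivity, specificity)
probability = posttest_probability(0.5, lr_positive)  # 50-50 pretest odds

print(f"LR+ = {lr_positive:.0f}, posttest probability = {probability:.3f}")
```

Under these assumptions, the posttest probability of alcohol abuse or dependency for a patient who starts at 50-50 odds is about 99.5%, which is consistent with the confident statement in the text.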

How can we make such a bold statement about the power 
of these simple elements of the clinical history and physical 
examination? The answer lies in the science of the art of clinical
diagnosis that underpins this series of overviews on the
rational clinical examination. This first background article
will introduce and illustrate the key elements of this science
(and readers who want a more detailed discussion of what
follows can consult a step-by-step discussion published
elsewhere 1). The background articles also are intended to
convey the fun and gratification physicians derive from making
correct diagnoses with crispness and dispatch.

TAKING AN ALTERNATIVE HISTORY FOR ALCOHOLISM 

Examine Figure 1-1. In it are shown the number of positive 
answers to the CAGE questions from 2 groups of patients 
admitted to the orthopedic or medical services of a community-based
teaching hospital in Boston, Massachusetts. 5 In
the left-hand column are the responses from patients whose 
extensive evaluations (including, where indicated, detailed 
social histories, follow-ups, and liver biopsies) provided 
acceptable “proof” that they were alcohol abusers or alcohol 
dependent. In the right-hand column are patients whose 
evaluations showed that they were not alcohol abusers or 
dependent. These extensive confirmatory investigations 
often are referred to as criterion standards of diagnosis and 
typically consist of definitive findings at angiography, operation,
autopsy, and the like.

This study is useful to clinicians because the CAGE history 
and the extensive (reference or criterion standard) investigations
were carried out independently among a wide spectrum
of well-described patients in whom it was clinically
reasonable to inquire about alcohol abuse. It thus satisfies the
first criterion of a valid, clinically useful article on diagnostic
strategies that appears in Table 1-1 (has there been an independent,
“blind” comparison with a criterion standard of
diagnosis?). The readers’ guides in Table 1-1 have been used
by the authors of this series on the rational clinical examination
to “screen” articles for inclusion in their overviews of
diagnostic approaches to specific clinical problems. Table 1-1 
can be clipped and carried for easy reference when reading 
clinical articles that make claims about the usefulness of 
(especially new) diagnostic tests, and the reasoning behind
its elements is described in detail elsewhere. 1

The study that generated Figure 1-1 also satisfied the second,
commonsense guide, for it was carried out in a patient
sample that included an appropriate spectrum of mild and
severe, treated and untreated alcoholism, plus individuals
with different but commonly confused disorders. The setting
for the study (a large, urban, general hospital) was described,
satisfying the third readers’ guide and permitting us to determine
the applicability of the results to our own setting, and
the term normal (the fifth guide) was clearly and sensibly 
defined as the absence of alcohol abuse or dependency (we 
shall return to the fourth guide of reproducibility later). 

The authors of the CAGE study were not proposing that 
their questions be used as part of an extensive series (“cluster”)
of diagnostic tests (so the sixth guide does not apply),
and the questions were presented with their exact wording in
the article, satisfying the seventh guide and permitting their
exact application in the reader’s own practice. The final readers’
guide (has the utility of the test been determined?) is satisfied
to the extent that the CAGE questions recognized far
more persons with alcoholism, especially alcohol abusers, 
than routine clinical diagnosis and made them candidates for 
treatment and counseling. 

In summary, the CAGE study observed the methodologic 
standards required for a valid and clinically useful descrip¬ 
tion of the clinical applicability of any diagnostic informa¬ 
tion, whether it comes from the clinical history, the physical 
examination, or the diagnostic laboratory. 

THE PRECISION OF THE CLINICAL EXAMINATION 

For an item of the clinical history or physical examination 
to be accurate, it first must be precise. That is, we need to 
have some confidence that 2 clinicians examining the same, 
unchanged patient would agree with each other on the 
presence or absence of the symptom (such as our patient’s 
answer to one of the CAGE questions) or sign (such as the 
presence of spider nevi on our patient’s chest). The preci¬ 
sion (often appearing under the name of “observer varia¬ 
tion” in the clinical literature) of such clinical findings can 
be quantitated. 6 

Suppose 2 clinicians recorded whether they found spider 
nevi when they independently examined the same 100 patients 
suspected of having liver disease and generated the data shown 
in Figure 1-2. The 2 clinicians agreed that 23 of the patients 
(cell a) had spider nevi and that 66 patients (cell d) did not; 
thus, they agreed on (23 + 66)/100 = 89% of the patients they 
examined. However, 6 patients (cell c) judged to have spider 
nevi by the first clinician were judged not to have nevi by the 
second, and 5 patients (cell b) judged to have spider nevi by 
the second clinician were judged not to have nevi by the first. 
How should we interpret this precision? Is this degree of clini¬ 
cal agreement good, or should we expect better? 

We might begin by recognizing that some clinical agree¬ 
ment would occur by chance alone. For example, if the sec¬ 
ond clinician merely tossed a coin for each patient instead of 
carrying out an examination, reporting nevi if the coin came 
up “heads” and no nevi if it came up “tails,” agreement would 
be 50%. We should begin, then, by determining how much of 
the observed agreement of 89% was because of chance, so 
that we can find out how much real clinical skill (agreement 
beyond chance) was being displayed by these clinicians. 


Table 1-1 Readers’ Guides for an Article About a Diagnostic Test 

1. Has there been an independent, “blind” comparison with a criterion 
standard of diagnosis? 

2. Has the diagnostic test been evaluated in a patient sample that included 
an appropriate spectrum of mild and severe, treated and untreated dis¬ 
ease, plus individuals with different but commonly confused disorders? 

3. Was the setting for this evaluation, as well as the filter through which 
study patients passed, adequately described? 

4. Have the reproducibility of the test result (precision) and its interpretation 
(observer variation) been determined? 

5. Has the term normal been defined sensibly as it applies to this test? 

6. If the test is advocated as part of a cluster or sequence of tests, has its 
individual contribution to the overall validity of the cluster or sequence 
been determined? 

7. Have the tactics for carrying out the test been described in sufficient 
detail to permit their exact replication? 

8. Has the utility of the test been determined? 
Figure 1-2 The Precision of the Clinical Examination for Spider Nevi 

Observed agreement: 

(a + d)/(a + b + c + d) = (23 + 66)/100 = 89% 

Expected agreement: 

For cell a, ([a + b] × [a + c])/(a + b + c + d) = (28 × 29)/100 = 8 
For cell d, ([c + d] × [b + d])/(a + b + c + d) = (72 × 71)/100 = 51 
Calculate expected agreement as (expected a + expected d)/(a + b + c + d) 
= (8 + 51)/100 = 59%. 

Agreement beyond chance = κ = (observed agreement - expected agreement)/ 
(100% - expected agreement) = (89% - 59%)/(100% - 59%) = 0.73. 
Conventional levels of κ: slight, 0.0-0.2; fair, 0.2-0.4; moderate, 0.4-0.6; 
substantial, 0.6-0.8; almost perfect, 0.8-1.0. 

Adapted from Lundberg. 4 


Chance agreement can be calculated by the formal process of 
“marginal cross-products” shown in Figure 1-2, but it also 
can be thought of as a coin toss in which, for example, the 
first clinician’s coin came up heads 29% of the time (based 
on [a + c]/[a + b + c + d]). Thus, 29% of the 28 patients 
judged to have spider nevi by the second clinician (a + b) 
would also be judged to have them by the first clinician, and 
29% of 28 is 8 (the number of patients we would expect to 
find in cell a by chance alone). Similarly, the first clinician’s 
coin came up tails 71% of the time ([b + d]/[a + b + c + d]), 
and 71% of the 72 patients judged to be free of spider nevi by 
the second clinician (c + d) is 51 (the expected value for cell 
d). As a result, we would expect the 2 clinicians to agree (8 + 
51)/100, or 59% of the time, on the basis of chance alone, 
and the remaining potential agreement beyond chance is 
therefore 100% - 59%, or 41%. 

How much of this 41% potential agreement beyond chance 
was achieved? This is determined by comparing it with the 
actual agreement beyond chance of 89% - 59%, or 30%, and 
30%/41% comes to 0.73, which means that about three-fourths 
of the potential agreement beyond chance was achieved by our 2 
clinicians. This measure of agreement goes by the name κ and is 
rather like a correlation coefficient. 1 It ranges from -1.0 (where 2 
clinicians would be in perfect disagreement), through 0.0 
(where only chance agreement was accomplished), to +1.0 
(where 2 clinicians would be in perfect agreement). As you can 
see in the listing of “conventional levels of κ” that appears in the 
legend for Figure 1-2, the agreement between our 2 clinicians is 
considered “substantial,” and this is the case for many “present/ 
absent” aspects of the physical examination. As you might imag¬ 
ine, agreement is greater still when the 2 examinations are car¬ 
ried out by the same clinician. 
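
The arithmetic in Figure 1-2 can be checked in a few lines of code. What follows is a minimal sketch in Python, not part of the original study; the function name cohen_kappa and the shape of its return value are our own assumptions, and the cell counts are the ones quoted in the text for the spider nevi example.

```python
def cohen_kappa(a, b, c, d):
    """Observed agreement, chance-expected agreement, and kappa for a 2 x 2 table.

    a: both clinicians record the finding; d: neither records it;
    b and c: the two kinds of disagreement.
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Marginal cross-products give the cell counts expected by chance alone.
    expected_a = (a + b) * (a + c) / n
    expected_d = (c + d) * (b + d) / n
    expected = (expected_a + expected_d) / n
    kappa = (observed - expected) / (1 - expected)
    return {"observed": observed, "expected": expected, "kappa": kappa}


# Spider nevi example: a = 23, b = 5, c = 6, d = 66.
result = cohen_kappa(23, 5, 6, 66)
print({name: round(value, 2) for name, value in result.items()})
```

The sketch keeps the expected cell counts unrounded (8.12 and 51.12 rather than 8 and 51), so intermediate values can differ slightly from the rounded ones in the figure legend, although κ still rounds to 0.73.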

Other items on the clinical examination do not fare as well. 
For example, in one study of the chest examination, the κ for 
cyanosis, tachypnea, and whispered pectoriloquy was 0.36, 
0.25, and 0.11, respectively. 7 

No measure of clinical agreement is ideal, and κ is no 
exception. Its size is slightly affected by the frequency of the 
abnormal finding in the group of patients being examined (it 
is highest when half of the patients have the finding and tails 
off a bit when the finding is extremely common or uncommon). 
If your and our interests warrant, we shall come back 
to this in a subsequent background article. 

But, of course, high precision is not enough, for examiners 
may be consistent but wrong in their assessments. All 5 mem¬ 
bers of my clinical team occasionally fail to detect a big liver or 
hear an important diastolic murmur. In other cases, clinicians 
may be neither precise nor accurate. For example, a group of 
iridologists was asked to examine the irises of a series of 
patients and distinguish those with gallstones from those who 
had sonographically empty gallbladders. 8 Their clinical agreement 
was only “slight,” with an average κ of 0.18 (about like 
whispered pectoriloquy). More important, however, their 
diagnostic accuracy was no better than chance: they missed 
about half the patients with gallstones (sensitivity, 54%) and 
diagnosed gallstones in about half the patients with negative 
sonogram results (specificity, 52%). To understand sensitivity 
and specificity, we must now shift from determining the preci¬ 
sion of the clinical examination to defining the characteristics 
of its accuracy. 

THE CHARACTERISTICS OF THE 
ACCURACY OF DIAGNOSTIC TESTS 

Returning our attention to Figure 1-1, we can examine the 
accuracy characteristics of the CAGE questions. The 60 
patients in cell a of Figure 1-1 answered yes to 3 or 4 of the 
CAGE questions and constitute 51%, or 0.51, of all the 117 
patients (a + c) with a positive diagnosis of alcohol depen¬ 
dency or abuse. The shorthand term for this proportion of 
0.51, or a/(a + c), is sensitivity, and it is a useful measure of 
how well a diagnostic test (whether a symptom, sign, or labo¬ 
ratory test) detects a target disorder when it is present. The 
closer the sensitivity to 100%, the more “sensitive” the clini¬ 
cal or laboratory finding. 

In the right-hand column are the responses from patients 
for whom the criterion standard ruled out the diagnosis of 
problem drinking. The 400 patients in cell d answered yes to 
2, only 1, or none of the CAGE questions and constitute 
99.8%, or 0.998, of all the 401 patients (b + d) who did not 
have alcohol dependency or abuse. The shorthand term for 
this proportion of 0.998, or d/(b + d), is specificity, and it is a 
useful measure of how often a symptom, sign, or other diag¬ 
nostic test is absent when the target disorder is not present. 
The closer the specificity to 100%, the more “specific” the 
clinical or laboratory finding. (Of course, clinicians are not 
interested in sensitivity and specificity as such but in their 
effect on the interpretation of positive and negative findings, 
and we shall get to that shortly. Sensitivity and specificity are 
properties that must be established beforehand, and that is 
why they are presented here.) 

You will observe that the sensitivity of the CAGE questions 
is not impressive. The number of “true positives” in cell a is 
almost equaled by the number of “false negatives” in cell c, 
and the sensitivity of only 51% confirms that it “misses” 
about half the problem drinkers. On the other hand, the 
specificity of the CAGE questions is outstanding. The num¬ 
ber of “true negatives” in cell d vastly outnumbers the num¬ 
ber of “false positives” in cell b, and the specificity of 99.8% 
confirms that it almost never labels a patient as a problem 
drinker when this disorder is absent. 

Now we can consider the “predictions” we make about our 
patient according to the foregoing characteristics. Because of 
the high specificity, virtually every patient who 
answered yes to 3 or 4 of the CAGE questions (a + b) has the 
target disorder, alcohol abuse or dependency, and the shorthand 
term for this proportion, a/(a + b), which is 60/61, or 
98%, is the positive predictive value or posttest probability of 
having the target disorder (among patients with 3 or more 
positive answers). Moreover, despite the rather unimpressive 
sensitivity, most of the patients in cells c and d who answered 
yes to none, just 1, or 2 of the CAGE questions were in cell d 
and did not have the target disorder. The shorthand term for 
this proportion d/(c + d), which is 400/457, or 88%, is the 
negative predictive value or posttest probability of not having 
the target disorder among those patients with 2 or fewer positive 
answers. The complement of this negative predictive 
value, or c/(c + d), describes the posttest probability of having 
the disorder among those patients with 2 or fewer positive 
answers, and this other way of saying the same thing is 
found useful by some clinicians. 

The reason that the negative predictive value looks rela¬ 
tively high, despite the low sensitivity, lies in the fact that the 
proportion of all patients in this study who had alcohol 
dependency or abuse, (a + c)/(a + b + c + d), or 117/518, was 
only 23% to begin with. That is, 100% - 23%, or 77%, of the 
patients were not alcohol dependent before they were asked 
any questions. The shorthand term for the previous knowl¬ 
edge contained in this (a + c)/(a + b + c + d) is prevalence or, 
more usefully, the pretest probability of the target disorder 
(because this pretest probability is the starting point for mak¬ 
ing clinical use of the test characteristics, we will place it 
above the “predictions” entries in subsequent figures). 
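
The definitions introduced so far can be gathered into one small routine. This is a minimal sketch in Python with names of our own choosing (accuracy_characteristics is not from the CAGE study). The cell counts are inferred from the figures quoted in the text: a = 60, c = 117 - 60 = 57, d = 400, and b = 401 - 400 = 1.

```python
def accuracy_characteristics(a, b, c, d):
    """Accuracy measures for a 2 x 2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "positive_predictive_value": a / (a + b),
        "negative_predictive_value": d / (c + d),
        "prevalence": (a + c) / (a + b + c + d),
    }


# CAGE study counts as inferred from the text (an assumption; see above).
cage = accuracy_characteristics(a=60, b=1, c=57, d=400)
for name, value in cage.items():
    print(f"{name}: {value:.3f}")
```

Rounded to the precision used in the text, these reproduce the sensitivity of 51%, specificity of 99.8%, positive predictive value of 98%, negative predictive value of 88%, and pretest probability of 23%.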

In contrast to this pretest probability of 23% in the clinical 
article describing the CAGE questions, in our patient, we 
judged that the pretest probability of alcohol abuse or depen¬ 
dency was 50%. How would the CAGE questions perform in 
patients like ours? If the patients in the study summarized in 
Figure 1-1 were like our own patient, we would expect the 
result shown in Figure 1-3. 

As long as the patient “mix” and severity of disease in the 
CAGE study summarized in Figure 1-1 are similar to the 
patient mix and severity of disease in our practice, we would 
expect sensitivity and specificity to remain constant, despite 
changes from the study’s to our patient’s pretest probability of 
the target disorder. Thus, the sensitivity (51%) and specificity 
(99.8%) in Figure 1-3 are the same as those in Figure 1-1. 

Notice, however, that the negative predictive value has 
decreased from 88% to 67% because predictive values must 
change with changes in the prevalence of the target disorder. 
One useful way to think about this is to carry through this 
concept of prevalence. After all, the predictive value of a pos¬ 
itive test result is simply the prevalence of the target disorder 
among those patients with positive test results. Similarly, the 
negative predictive value is the prevalence of not having the 
target disorder among patients with a negative test result. No 
wonder, then, that predictive values must change with a 
change in the overall prevalence of the target disorder. 
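
To see how predictive values move with prevalence while sensitivity and specificity stay fixed, the calculation can be sketched as follows. This is an illustration under stated assumptions, not the method of the original article: the function name predictive_values is ours, and the rounded sensitivity (0.51) and specificity (0.998) are used as inputs, so results at the study's own prevalence may differ in the last digit from values computed from the raw counts.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same test characteristics, two different pretest probabilities.
for prevalence in (0.23, 0.50):
    ppv, npv = predictive_values(0.51, 0.998, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")

ppv_50, npv_50 = predictive_values(0.51, 0.998, 0.50)
ppv_23, npv_23 = predictive_values(0.51, 0.998, 0.23)
```

At a pretest probability of 50%, the sketch gives the 99.6% and 67% reported for our patient; the negative predictive value is lower than at the study's prevalence, as the text describes.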

BACK TO THE PATIENT 

Your patient readily admitted that he had cut down on his 
drinking, that his spouse and workmates had annoyed him 
by complaining about his drinking, and that he often needed 
an “eye opener” to get going in the morning. According to 
this quick medical history, and given your previous judgment 
(before you had any knowledge of his responses to any of 
these questions) that his chances of being alcohol dependent 
were 50-50 (ie, a pretest probability of 50%), you can follow 
his response through Figure 1-3 and conclude that his post¬ 
test probability of alcohol dependency is 99.6%, or about as 
certain as you ever can be about any diagnosis. 

Your patient helps us make another general point: because 
he gave a positive response to a diagnostic history whose speci¬ 
ficity was extremely high (99.8%), you “ruled in” the target 
disorder. A simple way of remembering this property of a pow¬ 
erful diagnostic test is the acronym SpPin: when specificity is 
extremely high, a positive test result rules in the target disorder. 

Would the laboratory tests you were considering ordering 
have saved you some time and done a better job of determining 
this diagnosis? In fact, and in addition to delaying the diagnosis, 

Figure 1-3 The CAGE Questions for Alcohol Abuse or Dependency 
When the Pretest Probability Is 50% 

Characteristics: sensitivity, a/(a + c) = 510/1000 = 0.51, or 51%; specificity, 
d/(b + d) = 998/1000 = 0.998, or 99.8%. Prevalence or pretest probability 
of having the target disorder, (a + c)/(a + b + c + d) = 1000/2000 = 
50%. Predictions: positive predictive value or posttest probability of having 
the target disorder for patients with 3 or 4 positive responses, a/(a + b) = 
510/512 = 0.996, or 99.6%; negative predictive value or posttest probability 
of not having the target disorder for patients with 2 or fewer positive 
responses, d/(c + d) = 998/1488 = 0.67, or 67%; posttest probability of 
having the target disorder for patients with 2 or fewer positive responses, 
c/(c + d) = 490/1488 = 0.33, or 33%. Abbreviation: CAGE, cut down, 
annoyed, guilty, eye opener. 


their accuracy is much worse. In the same investigation that 
studied the CAGE questions, the specificities for γ-glutamyl 
transpeptidase, mean corpuscular volume, and an entire liver 
function battery were only 76%, 64%, and 81%, respectively. 3 
Moreover, the hot new test of platelet enzyme activity has a spec¬ 
ificity of only 73%. 2 Thus, in your patient, a simple medical his¬ 
tory was not only quicker and easier but also far more specific. 

What about his possible ascites? Given that you have estab¬ 
lished the diagnosis of alcohol dependency, you already can plan 
his perioperative and postoperative management to prevent, 
detect, and treat alcohol withdrawal syndromes. Nonetheless, 
you would like to know whether he has sufficient liver damage to 
affect his handling of the sorts of drugs he is likely to receive. 
Given his fractured ankle, the kneeling position required for elic¬ 
iting the puddle sign is out of the question, and even a test for 
shifting dullness will cause him considerable pain. He has 
already been to radiology, and you do not want him to make the 
trip again for an abdominal ultrasonographic examination if you 
can avoid it. His uninvolved ankle is not swollen now, and he 
tells you he has never had ankle swelling in the past. Would this 
simple medical history for previous ankle swelling be of any use? 

Figure 1-4 summarizes a study of 63 patients admitted to a 
general medical service in Durham, North Carolina. 9 Of 15 
patients with ascites on abdominal ultrasonographic examina¬ 
tion (the criterion standard), 14 had a history of ankle swell¬ 
ing, for an impressive sensitivity of 93%. If we applied this 
sensitivity (93%) and specificity (66%) to our pretest probabil¬ 
ity for ascites of 50%, the result (shown in Figure 1-5) suggests 






























                          Presence of Ascites on Abdominal Ultrasonography 

                          Present           Absent            Total 

History of      Yes       a = 14            b = 16            a + b = 30 
Ankle 
Swelling        No        c = 1             d = 32            c + d = 33 

                Total     a + c = 15        b + d = 48        a + b + c + d = 63 


Figure 1-4 Relationship Between a History of Ankle 
Swelling and Ascites 

Characteristics: sensitivity, a/(a + c) = 14/15 = 0.93, or 93%; specificity, 
d/(b + d) = 32/48 = 0.67, or 67%. Prevalence or pretest probability of having 
the target disorder, (a + c)/(a + b + c + d) = 15/63 = 24%. Predictions: 
positive predictive value or posttest probability of having the target disorder 
for patients with a history of ankle swelling, a/(a + b) = 14/30 = 0.47, or 
47%; negative predictive value or posttest probability of not having the target 
disorder for patients with a negative history for ankle swelling, d/(c + d) = 
32/33 = 0.97, or 97%; posttest probability of having the target disorder for 
patients with a negative history for ankle swelling, 
c/(c + d) = 1/33 = 0.03, or 3%. Adapted from Simel et al. 9 

Figure 1-5 Relationship Between a History of Ankle Swelling and 
Ascites When the Pretest Probability Is 50% 

Characteristics: sensitivity, a/(a + c) = 93/100, or 93%; specificity, d/(b + d) 
= 66/100 = 0.66, or 66%. Prevalence or pretest probability of having the 
target disorder, (a + c)/(a + b + c + d) = 100/200 = 0.5, or 50%. Predictions: 
positive predictive value or posttest probability of having the target disorder 
for patients with a history of ankle swelling, a/(a + b) = 93/127 = 
0.73, or 73%; negative predictive value or posttest probability of not having 
the target disorder for patients with a negative history for ankle swelling, 
d/(c + d) = 66/73 = 0.90, or 90%; posttest probability of having the target 
disorder for patients with a negative history for ankle swelling, c/(c + d) = 
7/73 = 0.10, or 10%. 


that the posttest probability of not having ascites is 90% when 
the patient denies ankle swelling. Again, this simple element of 
the clinical history provides powerful diagnostic information: 
when the sensitivity of a symptom or sign is high, a negative 
response rules out the target disorder, and the acronym for this 
property is SnNout. 

However, you may have observed that this study included 
only 15 patients with ascites, and you may well inquire how con¬ 
fident we should feel about this sensitivity of 0.93. As it happens, 
the degree of confidence we ought to place in this (or any other) 
estimate of sensitivity (or specificity) can be calculated and 
expressed as a confidence interval, within which you can be con¬ 
fident that the true sensitivity resides, say, 95% of the time. 1 In 
this case, the 95% confidence interval on this sensitivity of 0.93 
based on 15 patients runs all the way from 0.81 (not terribly sen¬ 
sitive) to 1.00 (or perfect sensitivity). If, on the other hand, this 
sensitivity of 0.93 were based on 100 patients with ascites, the 
95% confidence interval would run from 0.88 to 0.98, and you 
would be justified in being more confident that a negative medi¬ 
cal history rules out ascites. Thus, you should look for informa¬ 
tion on the 95% confidence interval for measures of accuracy 
such as sensitivity and specificity when you read about them. 
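
The effect of sample size on such an interval can be illustrated with a short calculation. The text does not say which method produced the quoted limits; the sketch below assumes the simple normal-approximation (Wald) interval, truncated to the range 0 to 1, which happens to reproduce them. The function name wald_interval is our own, and exact binomial methods are generally preferred when samples are this small.

```python
import math


def wald_interval(proportion, n, z=1.96):
    """Normal-approximation confidence interval, truncated to [0, 1].

    z = 1.96 corresponds to a 95% interval.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / n)
    lower = max(0.0, proportion - z * standard_error)
    upper = min(1.0, proportion + z * standard_error)
    return lower, upper


# Sensitivity of 14/15 observed in 15 patients with ascites.
small_lower, small_upper = wald_interval(14 / 15, 15)
# The same sensitivity of 0.93 if it had been based on 100 patients.
large_lower, large_upper = wald_interval(0.93, 100)

print(f"n = 15:  {small_lower:.2f} to {small_upper:.2f}")
print(f"n = 100: {large_lower:.2f} to {large_upper:.2f}")
```

Under this assumption the interval narrows from roughly 0.81-1.00 with 15 patients to roughly 0.88-0.98 with 100, matching the figures above.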

A FASTER AND MORE POWERFUL APPROACH: 
THE LIKELIHOOD RATIO 

Many of the overviews in this series will describe not only the 
sensitivity and specificity of specific symptoms and signs but 
also their likelihood ratios (LRs). This method of describing 
the accuracy of diagnostic information, once mastered, is 
much faster and more powerful than the sensitivity and speci¬ 
ficity approach. 1 It is shown in Figure 1-6 for ankle swelling 
and ascites. In brief, an LR expresses the odds that a given find¬ 
ing on the medical history or physical examination would 
occur in a patient with, as opposed to a patient without, the 
target disorder. When a finding’s LR is above 1.0, the probabil¬ 
ity of disease increases (because the finding is more likely 
among patients with than without the disorder); when the LR 
is below 1.0, the probability of disease decreases (because the 
finding is less likely among patients with than without the dis¬ 
order); finally, when the LR is close to 1.0, the probability of 
disease is unchanged (because the finding is equally likely in 
patients with and without the disorder). 

LRs are related to sensitivity and specificity but possess some 
advantages for clinicians. In a 2 x 2 table such as Figure 1-6, the 
LR for a positive history of ankle swelling is equal to sensitivity/ 
(1 - specificity) or 0.93/0.33, or 2.8, indicating that a positive 
history is almost 3 times as likely to be obtained from a patient 
with, as opposed to a patient without, ascites. The LR for a negative 
history of ankle swelling is equal to (1 - sensitivity)/specificity, 
or 0.07/0.67, or 0.10, indicating that a negative history is only 
one-tenth as likely to be obtained from a patient with, as opposed to a 
patient without, ascites (and confirming our earlier conclusion 
that this negative history permitted us to SnNout this diagnosis). 

The first advantage of LRs is that the LR for a given finding, 
when applied to the pretest odds of the target disorder, generates 
the posttest odds for that disorder. Because the LR is expressed 
as an odds, this may at first appear cumbersome, for it means 
that the pretest probability must also be expressed as an odds 
(although this is tedious to do by hand, later, we will show you 
how to avoid the calculations by using the nomogram shown in 
Figure 1-7). When done by hand, the pretest probability of the 
target disorder is converted into pretest odds by the formula: 

Pretest odds = Probability of having the target disorder/ 
Probability of not having the target disorder 

In Figure 1-6, the pretest probability of ascites is 0.24, and 
the pretest probability of not having ascites is 1.00 - 0.24, or 
0.76. Therefore, the pretest odds of ascites are 0.24/0.76, or 
0.32, and this can be multiplied by 2.8 (generating a posttest 
odds of ascites of 0.90) when the history is positive for ankle 
swelling and by 0.10 (generating a posttest odds of 0.03) 
when this history is negative. 

These posttest odds can then be converted back to proba¬ 
bilities by the formula: 

Posttest probability of the target disorder = 

Posttest odds/(Posttest odds + 1) 

Thus, the posttest odds of 0.90 following from a positive 
history of ankle swelling converts (by 0.90/1.90) to 47%, and 
the posttest odds of ascites of 0.03 following from a negative 
history converts (by 0.03/1.03) to 3%, and you will observe 
that these are the same values for the posttest probability of 
having ascites that we generated in Figure 1-4. 
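
The hand calculation just described (probability to odds, multiply by the LR, odds back to probability) can be written as a short sketch. The function names below are our own assumptions, and the inputs are the rounded sensitivity (0.93) and specificity (0.67) from Figure 1-4.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR for a positive result, LR for a negative result)."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert to odds, apply the LR, and convert back to a probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (posttest_odds + 1)


lr_pos, lr_neg = likelihood_ratios(0.93, 0.67)
after_positive = posttest_probability(0.24, lr_pos)
after_negative = posttest_probability(0.24, lr_neg)

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"positive history: {after_positive:.2f}")
print(f"negative history: {after_negative:.2f}")
```

Because the LRs are carried unrounded, intermediate odds differ slightly from the hand calculation, but the posttest probabilities still round to 47% and 3%. The same function can be called with any other pretest probability.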

The necessity for converting probability to odds and back 
again can be obviated by using the nomogram shown in Fig¬ 
ure 1-7, which has already carried out the conversions for us. 1 
You can prove this to yourself as follows: anchor a straight¬ 
edge at the left margin of the nomogram, at the pretest prob¬ 
ability of 24%, and rotate the straightedge until it intersects 
the middle line of the nomogram at an LR of 2.8, corre¬ 
sponding to a positive history of ankle swelling. It will inter¬ 
sect the right margin of the nomogram at just below 50%. 
Similarly, rotate the straightedge until it intersects an LR of 
0.10 for the negative history and observe that the posttest 
probability of ascites decreases to 3%. 

The second advantage of LRs becomes apparent when we 
see that the nomogram permits us to determine the probabil¬ 
ity of ascites when the pretest probability changes from 24% 
in Figure 1-4 to 50% in Figure 1-5 without having to con¬ 
struct the latter. We can simply reanchor the straightedge at 
50% and run it across the LRs of 2.8 and 0.10 as before, intersecting 
the posttest probability line at about 73% and 10%. 

The third advantage of LRs is that, unlike sensitivity and 
specificity (which limit the number of test results to just 2 
levels, “positive” and “negative”), they can be generated for 
multiple levels of the diagnostic test result. At each level, the 
proportion of patients with the target disorder at this level is 
divided by the proportion of patients who do not have the 
target disorder at this same level; the result is the LR for this 
level. This is shown in Table 1-2, in which LRs for 4, 3, 2, 1, 
and no positive responses to the CAGE questionnaire are 
shown (the awkward, infinitely high LR for 4 positive 
answers can be avoided if 3 and 4 positive answers are com¬ 
bined, generating an LR of 206 for the combination). 
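
The level-by-level calculation can be sketched in code as follows. This is an illustration using the counts published in Table 1-2; the names CAGE_COUNTS and level_likelihood_ratios are our own assumptions. Because the sketch works from unrounded counts, an individual LR may differ in the last digit from the published table.

```python
import math

# Counts from Table 1-2: positive answers -> (with disorder, without disorder).
CAGE_COUNTS = {4: (23, 0), 3: (37, 1), 2: (28, 14), 1: (11, 28), 0: (18, 358)}


def level_likelihood_ratios(counts):
    """Return {level: LR}, LR = P(level | disorder) / P(level | no disorder)."""
    total_with = sum(with_d for with_d, _ in counts.values())
    total_without = sum(without_d for _, without_d in counts.values())
    ratios = {}
    for level, (with_d, without_d) in counts.items():
        proportion_with = with_d / total_with
        proportion_without = without_d / total_without
        # No patient without the disorder at this level: the LR is unbounded.
        if proportion_without == 0:
            ratios[level] = math.inf
        else:
            ratios[level] = proportion_with / proportion_without
    return ratios


ratios = level_likelihood_ratios(CAGE_COUNTS)

# Pooling 3 and 4 positive answers avoids the infinite LR.
pooled = dict(CAGE_COUNTS)
with_3, without_3 = pooled.pop(3)
with_4, without_4 = pooled.pop(4)
pooled["3 or 4"] = (with_3 + with_4, without_3 + without_4)
pooled_ratios = level_likelihood_ratios(pooled)

for level, ratio in ratios.items():
    print(f"{level} positive answers: LR = {ratio:.2f}")
print(f"3 or 4 positive answers: LR = {pooled_ratios['3 or 4']:.0f}")
```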

Figure 1-6 Likelihood Ratios for a History of Ankle Swelling in 
Diagnosing Ascites 

Characteristics: sensitivity/(1 - specificity) = likelihood ratio (LR) (of having the 
target disorder) for a positive test result = (a/[a + c])/(b/[b + d]) = 0.93/0.33 = 
2.8; (1 - sensitivity)/specificity = LR (of having the target disorder) for a negative 
test result = (c/[a + c])/(d/[b + d]) = 0.07/0.67 = 0.10. Pretest probability: 
prevalence or pretest probability of having the target disorder, (a + c)/(a + b + 
c + d) = 15/63 = 24%. Predictions: posttest probability of the target disorder 
(expressed as odds) = pretest probability of the target disorder (expressed as 
odds) × LR for the test result. Positive history: pretest odds, 0.24/0.76 = 0.32; 
posttest odds, 0.32 × 2.8 = 0.90; posttest probability, 0.90/1.90 = 47%. 
Negative history: pretest odds, 0.24/0.76 = 0.32; posttest odds, 0.32 × 0.10 = 
0.03; posttest probability, 0.03/1.03 = 3%. 
Adapted from Simel et al. 9 

Figure 1-7 A Nomogram for Applying Likelihood Ratios 

Adapted from Sackett et al. 1 

Table 1-2 Multiple Levels of Responses to the CAGE Questions 
for Alcohol Abuse or Dependency a,b 

No. of Positive Answers      Alcohol Abuse or Dependency      Likelihood 
to the 4 CAGE Questions      Yes             No               Ratios 

4                            23 (0.20)       0                ∞ 
3                            37 (0.32)       1 (0.002)        127 
2                            28 (0.24)       14 (0.03)        6.8 
1                            11 (0.09)       28 (0.07)        1.3 
0                            18 (0.15)       358 (0.89)       0.17 
Total                        117             401 

Abbreviation: CAGE, cut down, annoyed, guilty, eye opener. 

a Adapted from Bush et al. 5 

b Numbers in parentheses are proportions of the respective columns. 

Figure 1-8 Working Table for the Reader’s Use 

For accuracy 

Sensitivity = a/(a + c); SnNout: when sensitivity is high, a negative test 
result rules out the target disorder. 

Specificity = d/(b + d); SpPin: when specificity is high, a positive test result 
rules in the target disorder. 

Positive predictive value or posttest probability of having the target disorder 
among patients with positive test results, a/(a + b). 

Negative predictive value or posttest probability of not having the target 
disorder among patients with negative test results, d/(c + d). 

Posttest probability of having the target disorder for patients with negative 
test results, c/(c + d). 

Prevalence or pretest probability of having the target disorder, 
(a + c)/(a + b + c + d). 

Sensitivity/(1 - specificity) = likelihood ratio (LR) (of having the 
target disorder) for a positive test result = (a/[a + c])/(b/[b + d]). 

(1 - sensitivity)/specificity = LR (of having the target disorder) for a 
negative test result = (c/[a + c])/(d/[b + d]). 

Posttest probability of the target disorder (expressed as odds) = pretest 
probability of the target disorder (expressed as odds) × LR for the test result. 

For precision (and κ) 

Observed agreement, (a + d)/(a + b + c + d). 

Expected agreement: 

Expected cell a, ([a + b] × [a + c])/(a + b + c + d) 

Expected cell d, ([c + d] × [b + d])/(a + b + c + d) 

Calculate expected agreement as (expected a + expected d)/(a + b + c + d). 

Agreement beyond chance = κ = (observed agreement - expected 
agreement)/(100% - expected agreement). 

Conventional levels of κ: slight, 0.0-0.2; fair, 0.2-0.4; moderate, 0.4-0.6; 
substantial, 0.6-0.8; almost perfect, 0.8-1.0. 


The fourth advantage of the LR strategy is that the posttest 
probability of the target disorder obtained from the first item 
of diagnostic information (say, a history of ankle swelling) is 
the pretest probability of that diagnosis for the next item of 
diagnostic information (say, the physical examination for 
ankle edema). This example also identifies the problem we 
always face when we combine diagnostic information from the 
medical history and physical examination (and chemistry lab¬ 
oratory, and radiology suite!): the results of the medical his¬ 
tory and physical examination are not independent from each 
other. Thus, a patient with a positive history of swollen ankles 
is far more likely to have pedal edema than a patient with a 
negative history, and we must either use an LR that considers 
both of the 2 items as a pair or modify the LR for the second, 
according to the results of the first. This issue of independence, 
along with the consideration of the site (primary care or a ter¬ 
tiary hospital) where the examination is carried out, will be 
taken up in a subsequent background article in this series. 
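
The chaining of posttest into pretest probability can be sketched as below. This is only an illustration of the mechanics: the LR of 2.8 for a positive history comes from the text, but the LR for the second finding is a hypothetical placeholder invented for this example, not a value from any study. Multiplying LRs in this way also assumes that the two findings are independent, which is exactly the assumption the paragraph above warns against.

```python
def update(probability, likelihood_ratio):
    """Apply one LR to a probability by way of odds."""
    odds = probability / (1 - probability)
    posttest_odds = odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


pretest = 0.24
LR_HISTORY = 2.8             # positive history of ankle swelling (from the text)
HYPOTHETICAL_LR_EDEMA = 2.0  # placeholder only; NOT taken from any study

# The posttest probability after the history becomes the pretest
# probability for the physical examination.
after_history = update(pretest, LR_HISTORY)
after_examination = update(after_history, HYPOTHETICAL_LR_EDEMA)

print(f"after history:     {after_history:.2f}")
print(f"after examination: {after_examination:.2f}")
```

Applying the two LRs in sequence is arithmetically the same as applying their product once, which is why correlated findings, if treated as independent, overstate the shift in probability.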

CONCLUSION 

This first background article has described readers’ guides for 
articles about diagnostic information and has shown how diag¬ 
nostic data derived from the medical history and physical exami¬ 
nation can be assessed for their precision and accuracy. It 
concludes with a working table (Figure 1-8) and glossary that can 
be photocopied or clipped. Kept handy, they can help readers 
study and understand the overviews published in this and subse¬ 
quent issues of the series on the rational clinical examination. 

UPDATED SUMMARY ON PRECISION AND 
ACCURACY OF THE CLINICAL EXAMINATION 

Original Review 

Sackett DL. A primer on the precision and accuracy of the 
clinical examination. JAMA. 1992;267(19):2638-2644. 

WHAT IS THERE TO UPDATE? 

Each of the updates in The Rational Clinical Examination 
systematically evaluates the newly published literature on the 
topic, except this one. Updating the Primer requires a differ¬ 
ent approach to fulfill the original promise that the series 
would address methodologic concerns beyond precision and 
accuracy. What we will do is take a very utilitarian approach, 
driven by the topic updates themselves. The updates and our 
own lectures on the rational clinical examination unearthed 
topics that we need to address. Rather than conducting a sys¬ 
tematic review of quality measures, sensitivity, specificity, 
likelihood ratios (LRs), and a plethora of related topics, we 
instead provide background information and answers to 
questions that our own authors required when preparing 
their reviews and updates. 

Of course, the basic premise for diagnosis has not changed 
since the Primer (or since Thomas Bayes figured it out more 
than 2 centuries ago): 

Prior odds × LR = Posterior odds 

For the clinical examination, this means we (1) use informa¬ 
tion about the probability of a target disorder (frequently taken 
as the prevalence, which is then converted to the prior odds) 
and then (2) apply the results of symptoms or signs (in the 
form of an LR). After applying the LR associated with various 
symptoms and signs, we get the posterior odds of disease. The 
probability of disease increases when a clinical finding is more 
likely in a patient with the target disorder (reflected by an LR 
>1). The probability of disease decreases when a clinical find¬ 
ing is more likely to occur in a patient without the target disor¬ 
der (reflected by an LR < 1). The resultant probability becomes 
the “posterior” probability because the prior probability is 
established first and then modified with information from the 
medical history and physical examination quantitatively 
expressed in the form of the LR.* Keeping the simple equation 
in mind focuses the goal of The Rational Clinical Examination 
series articles on providing all the data needed to solve the 
posterior odds equation. 
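
The equation can be sketched in a few lines of Python. This is a minimal illustration, not part of the original Primer: the function names are our own, and the pretest probability of 25% and LR of 4 are hypothetical values chosen only to show the conversion from probability to odds and back.

```python
def probability_to_odds(probability: float) -> float:
    """odds = probability of disease / probability of no disease."""
    return probability / (1.0 - probability)


def odds_to_probability(odds: float) -> float:
    """probability = odds / (1 + odds)."""
    return odds / (1.0 + odds)


def posttest_probability(pretest_probability: float, lr: float) -> float:
    """Apply Prior odds x LR = Posterior odds, then convert back to a probability."""
    prior_odds = probability_to_odds(pretest_probability)
    posterior_odds = prior_odds * lr
    return odds_to_probability(posterior_odds)


# Hypothetical example: pretest probability 25%, finding with LR = 4.
# Prior odds = 0.25/0.75 = 1/3; posterior odds = 4/3; probability = 4/7.
example = posttest_probability(0.25, 4.0)
print(round(example, 2))  # 0.57
```

An LR of 1 leaves the pretest probability unchanged, which is a quick check that the conversion is wired correctly.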

Why LRs? 

In the Primer, we emphasized the role of the univariate LR 
for clinicians. The term univariate means the results for 1 
finding, without regard to the findings of other historical or 
clinical features. We chose this route for a variety of reasons, 
most important being its fundamental property that allows 
clinicians to apply the values to individual patients in a con¬ 
sistent pattern. LRs always convey the same information— 
they quantify the change in odds of disease for a particular 
test result. By tradition for dichotomous test results, we call 
the LR associated with a positive test the LR+ (positive LR), 
whereas the LR associated with a negative test is the LR- 
(negative LR). In either case, the actual LR value is related to 
the change in likelihood that the patient has the disease of 
interest. Thus, there can be no confusion, as is sometimes the 
case when physicians become overwhelmed with how to 
translate positive predictive value, true-positive rate, false¬ 
positive rate, negative predictive value, true-negative rate, or 
false-negative rate into a change in the likelihood of disease 
for an individual patient. 

Many clinicians feel more comfortable with the terms sen¬ 
sitivity and specificity. However, these values in and of them¬ 
selves have little application to the clinical setting. Sensitivity 
and specificity are values that apply to a screening test result 
before we know whether the patient has the target disorder. 
So which result do we use at the bedside? Sensitivity applies 
only to patients with disease, whereas specificity applies only 
to patients without disease. We use screening tests 
precisely because we do not know about the presence or 
absence of disease, so how do we decide whether the value of 
sensitivity or the value of specificity applies to our patient? 
The simple answer is that we do not know. If we do know 
which result applies to our patient, then, by definition, we 
know the disease status, and the results of screening tests lose 
relevance. The true value of an LR comes from its mathematical 
definition that combines the values of sensitivity and 
specificity, making it applicable to each patient before we 
know whether disease is present or absent. 


*Do not be confused by the transition between odds in the equation 
and our discussion of probability. The equation requires that 
we use odds, but clinicians find it easier to think in terms 
of probability. We can convert any probability of disease to odds 
by the equation odds = probability of disease/probability of 
no disease. After we convert the prior probability to odds and 
multiply it by the LR to get the posterior odds, we convert the result 
back to the probability of disease by the equation probability = 
odds/(1 + odds). 


Table 1-3 Examples of Symptoms or Signs That Have Results Other 
Than Just “Present” or “Absent” 

Example                              Screening Test                    Multilevel Outcome 
A symptom reported by the patient    “Do you have trouble initiating   “Always” 
                                     your urine stream?”               “Frequently” 
                                                                       “Sometimes” 
                                                                       “Never” 
A sign on the physical examination   Is a third heart sound present?   Abnormal 
                                                                       Uncertain 
                                                                       Normal 
Ordinal^a valued findings            Deep tendon reflexes              4+ 
                                                                       3+ 
                                                                       2+ 
                                                                       1+ 
                                                                       0 

^a Ordinal means “ordered.” The results can be ranked, although the incremental value 
has no quantitative meaning. For example, deep tendon reflexes of 2+ are more 
pronounced but not twice as prominent as 1+ reflexes. 


Table 1-4 Hypothetical Data to Demonstrate How to Describe the 
Results for a Finding With 3 Possible Outcomes 

                        LV Systolic Dysfunction Present   Normal LV Function 
S3 definitely present   30                                5 
Uncertain               5                                 10 
S3 definitely absent    10                                50 

Abbreviation: LV, left ventricular. 

When evaluated in combination, the sensitivity and specificity 
are the building blocks of the LR for tests that are 
dichotomous (eg, “positive” or “negative,” “present” or 
“absent”). The LR for a positive result is sensitivity/(1 - specificity), 
whereas the LR for a negative result is (1 - sensitivity)/specificity. 
But what happens when a screening test has 
more than 2 outcomes (Table 1-3)? 
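
For the dichotomous case, the two formulas above can be written as a short Python sketch. The function name and the example values (sensitivity 90%, specificity 80%) are hypothetical and are used only to illustrate the arithmetic.

```python
import math


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a dichotomous finding.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity <= 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1]")
    # A specificity of 100% makes LR+ incalculable (infinite).
    lr_pos = math.inf if specificity == 1.0 else sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Hypothetical finding: sensitivity 0.90, specificity 0.80.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(round(lr_pos, 1), round(lr_neg, 3))  # 4.5 0.125
```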

Traditional laboratory tests are measured on continuous 
scales, where the result intervals have a mathematical mean¬ 
ing, but the clinician could not possibly know the LR for 
every outcome. A clinical laboratory reports the raw result, 
along with a designator for whether the result is “high,” “nor¬ 
mal,” or “low.” The report takes the raw value and transforms 
it to an ordinal scale, making it easier for clinicians to review 
a large amount of data. When there are more than 2 outcomes 
of a screening test, sensitivity and specificity cannot be 
directly calculated, so the clinician must rely on LRs that are 
usually given for ordinal results. 

A simple quantitative explanation helps explain why the 
sensitivity and specificity lose meaning when there are more 
than 2 screening test results. The presence of a third heart 
sound (S3) suggests left ventricular (LV) systolic dysfunction. 
Sometimes, the clinician is uncertain whether the sound is 
present. To illustrate this point, we can make up some data 
that might apply to the clinician’s interpretation of the S3 
compared with a reference standard echocardiogram that 
quantified the LV function (Table 1-4). 

We can describe the sensitivity of the S3 as 30/(30 + 5 + 
10) = 0.67 and the specificity as 50/(5 + 10 + 50) = 0.77. 
Although this may seem straightforward, closer inspection 
reveals some problems with that interpretation. First, the 
treatment of the “uncertain” results lacks consistency. For 
calculating the sensitivity, we “count” an uncertain S3 as if 
it were actually absent. But the clinical reality was that the 
physician could not state with certainty whether it was 
present or absent. When we calculate the specificity, we do 
the exact opposite and count the “uncertain” outcomes as if 
they were “positive.” How can one “uncertain” finding be 
considered “negative” for sensitivity but “positive” for 
specificity? This dual treatment creates problems that become 
even more pronounced as the number of results increases 
beyond 3 outcomes. 

Second, even if we believed that the sensitivity and spec¬ 
ificity captured the meaning of an S3 that is either present 
or absent, how do we describe the results for “uncertain?” 
Sensitivity provides an inadequate definition because sen¬ 
sitivity is the value that describes the percentage of 
patients with an abnormal result among all those with dis¬ 
ease and “uncertain” is neither abnormal nor normal. A 
similar argument applies to the specificity, so that neither 
sensitivity nor specificity offers a reasonable description 
of the value of an uncertain result. The constructs just do 
not apply to a test result that is neither completely normal 
nor completely abnormal. The LR provides a way to 
describe not only the positive and negative results but also 
those that are uncertain. 

At a fundamental level, the LR takes a given screening test 
result and for that outcome tells us the ratio of the proportion 
of patients with disease who have that result to the proportion 
of patients without disease who have it. So once we know which 
row of the table a patient belongs in according to their test 
result (S3 present, S3 uncertain, or S3 absent), the LR tells 
us the likelihood that the patient will come from the first 
column vs the second column. We can calculate an LR for 
every row of an r × 2 table (where r represents the number 
of rows) (Table 1-5). 

Thus, when we hear an S3 in the patient, we apply the value 
8.7, which makes LV systolic dysfunction much more likely. 
When we feel confident that an S3 is absent, the likelihood of 
LV systolic dysfunction decreases. However, when we are 
“uncertain,” the LR we apply is 0.72, a value that approaches 1 
and suggests that the “uncertain” result should not have a large 
effect on our estimate of the likelihood of disease. Oftentimes, 
it is useful to know that “uncertain” really means “not much 
information” with an LR approaching 1. 
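
The row-by-row calculation can be sketched in Python using the hypothetical S3 counts from Tables 1-4 and 1-5. The function names are our own. The rounding helper follows the convention stated in the footnote to Table 1-5; treating an LR of exactly 10 as belonging to the "tenths" band is our assumption.

```python
def multilevel_lrs(table: dict[str, tuple[int, int]]) -> dict[str, float]:
    """LR for each row of an r x 2 table.

    Each value is (count with disease, count without disease) for that result.
    LR = (row count / total with disease) / (row count / total without disease).
    """
    total_diseased = sum(diseased for diseased, _ in table.values())
    total_healthy = sum(healthy for _, healthy in table.values())
    return {
        result: (diseased / total_diseased) / (healthy / total_healthy)
        for result, (diseased, healthy) in table.items()
    }


def round_lr(lr: float) -> float:
    """Round to hundredths below 1, tenths from 1 to 10, nearest integer above 10."""
    if lr < 1:
        return round(lr, 2)
    if lr <= 10:
        return round(lr, 1)
    return float(round(lr))


# Hypothetical data from Table 1-4: (LV systolic dysfunction, normal LV function).
s3_table = {
    "S3 present": (30, 5),
    "S3 uncertain": (5, 10),
    "S3 absent": (10, 50),
}

for result, lr in multilevel_lrs(s3_table).items():
    print(result, round_lr(lr))
# S3 present 8.7
# S3 uncertain 0.72
# S3 absent 0.29
```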
Isn’t All the Information in the Patient’s Medical History? 

We now need to address a common belief that the physical 
examination is not particularly helpful and, at best, only con¬ 
firms the historical findings and symptoms. Oftentimes, a clini¬ 
cian takes a patient’s medical history and makes a diagnosis 
before performing a physical examination. This process, 
although sometimes successful, leads to the inference that the 
physical examination was unnecessary. For a simple reason, the 
inference is not true: the physical examination begins from the 
moment the clinician meets a patient and before the patient 
utters a word! We observe body language, the patient’s gait, vital 
signs (eg, tachypnea), and physical deformities, and we judge 
the acuity of illness. These findings derived from visual observa¬ 
tions may be hard to quantify (eg, a sense that the quiet, sullen 
patient might be depressed), although most clinicians recognize 
the huge amount of information they collect in the first few 
moments of a patient interaction. Because describing and mea¬ 
suring the influence of our overall observations is difficult, 
researchers often overlook the clinical gestalt. 

One way of isolating the clinical gestalt is to evaluate whether 
we can make a diagnosis in the absence of directly observing a 
patient. A symptom checklist (but not the patient’s medical his¬ 
tory) can be obtained through a completed patient self-adminis¬ 
tered questionnaire. Sometimes, we can infer a diagnosis from 
such questionnaires with our impression uncontaminated by 
physical findings, but the diagnosis typically requires confirma¬ 
tion obtained through a patient interview or physical examina¬ 
tion. The ability to disentangle the history from the physical 
examination findings is often an illusion, leading to the inference 
that the patient’s medical history (symptoms) dominates the 
clinical diagnostic process over the physical examination (signs). 

The Pretest Probability 

The most important part of the clinical examination and the 
resulting diagnosis is typically not the symptoms or signs—it 
is the pretest probability, transformed to the prior odds, that 
dominates the equation. Simply put, if a condition is highly 
unlikely (or vice versa), then the presence or absence of any 
additional findings will typically not change things. As a 
corollary, when the probability of a target condition is not so 
certain, the signs and symptoms potentially have a bigger 
effect on the prior probability. 

So, where does the pretest probability come from? We estab¬ 
lish the pretest probability in the course of our clinical examina¬ 
tion, and that creates a bit of a problem (for both researchers 
and clinicians). In other words, as we learn more about the 
patient’s medical history, symptoms, and signs, we orient our 
approach to a narrower spectrum of disease possibilities. This 
approach requires that we “waste” a few findings to establish the 
pretest probability. For example, most patients we examine do 
not have sinusitis, and we do not ask questions about symptoms 
related to sinusitis, nor do we transilluminate the sinuses during 
the course of a clinical examination unless we have a suspicion 
of the disease. We might constrain our evaluation for sinusitis to 
patients who claim nasal stuffiness, nasal discharge, or maxillary 
facial discomfort or who come right out and state, “I think I 
have a sinus infection.” Each of these findings would prompt an 
appropriate evaluation for sinusitis and in a research study 
create the “entrance criteria.” Thus, when we refer to the pretest 
probability of sinusitis, we most likely are referring to the 
prevalence of sinusitis among patients with any of those findings 
rather than to the prevalence of sinusitis among all patients in 
general. This pretest probability becomes the value we use in the 
equation and the anchor for applying other symptoms and signs 
we uncover during our clinical examination. 


Table 1-5 A Likelihood Ratio Can Be Calculated for Each Row of 
an r × 2 Table as Shown With These Hypothetical Data 

               LV Systolic Dysfunction Present   Normal LV Function   LR^a 
S3 present     30                                5                    (30/45)/(5/65) = 8.7 
S3 uncertain   5                                 10                   (5/45)/(10/65) = 0.72 
S3 absent      10                                50                   (10/45)/(50/65) = 0.29 
Total          45                                65 

Abbreviations: LR, likelihood ratio; LV, left ventricular. 

^a By convention, for LR values 0-1, we round off to the 100ths; for LR values 1-10, we 
round off to the tenths; and for LR > 10, we round off to the nearest integer. 

The establishment of the pretest probability is the problem 
most learners fear, representing their main “excuse” for not 
using the concepts in The Rational Clinical Examination. Frequently, 
learners claim “lack of experience.” When existing studies 
adequately describe their study population, the pretest 
probability is not difficult to understand. Experience becomes 
more valuable when the literature is less clear, and perhaps this 
is part of the “art” of the clinical examination. Trainees may be 
quite good at estimating the pretest probability of common con¬ 
ditions. However, both trainees and experienced clinicians tend 
to overestimate the prior probabilities of less common diseases. 
Trainees express discomfort when estimating the prior probabil¬ 
ity because (1) they do not practice quantifying and then vali¬ 
dating their clinical impression and (2) they may recall their 
own cases in which they pursued an unlikely diagnosis for a 
seemingly “classic” presentation, only to find that the disease 
was not present. Although the second reason emanates from 
overlooking the importance of prior probability, it requires a 
reassessment of the role of symptoms and signs. 

What Is a “Good” Symptom or Sign? 

The presence of a “good” symptom or sign creates a large 
effect on the probability, convincing the clinician that the 
target condition is much more likely to be present than the 
prior probability suggests. The suggestion that some prespec¬ 
ified LR threshold defines a good clinical finding for all dis¬ 
ease is a myth so persistent that it represents a medical urban 
legend. Some researchers and clinicians define a “good” test 
result as that associated with an LR greater than 10 or an LR 
less than 0.1, but these results do not have intrinsic proper¬ 
ties that are the sine qua non of high value. For example, a 
pretest probability of 10% and positive test with an LR = 10 
generates a posttest probability of 53%; this is a big increase 
in the probability of disease but hardly an increase that 
clinches the diagnosis. Furthermore, this is similar to the posttest 
probability (56%) that follows from a disease with a pretest 
probability of 20% and a positive test with an LR = 5. Thus, 
although positive test results are increasingly powerful as the 
LR increases and negative results are increasingly valuable as 
the LR decreases, the efficiency of the finding in making a 
diagnosis depends on the pretest probability. 
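
The two scenarios in this paragraph can be checked with a short Python sketch. The helper name is our own; the pretest probabilities and LRs are the ones given in the text.

```python
def posttest_probability(pretest_probability: float, lr: float) -> float:
    """Convert the pretest probability to odds, multiply by the LR, convert back."""
    posterior_odds = pretest_probability / (1.0 - pretest_probability) * lr
    return posterior_odds / (1.0 + posterior_odds)


# (pretest probability, LR) pairs taken from the paragraph above.
scenarios = [(0.10, 10.0), (0.20, 5.0)]

for pretest, lr in scenarios:
    posttest = posttest_probability(pretest, lr)
    print(f"pretest {pretest:.0%}, LR {lr:g} -> posttest {posttest:.0%}")
# pretest 10%, LR 10 -> posttest 53%
# pretest 20%, LR 5 -> posttest 56%
```

A finding with half the LR reaches a similar posttest probability when the pretest probability is doubled, which is why no single LR threshold can define a "good" finding.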

When considering that multiple symptoms and signs are 
interpreted together, individual findings with much less 
impressive LRs alone (eg, LR+, 2-5; or LR-, 0.25-0.50) could 
prove useful when used in combination. If no LR threshold 
automatically qualifies a result as good, is there a way to com¬ 
pare the efficiency of different clinical findings? 

A positive clinical finding with the highest LR+ or a negative 
finding with the lowest LR- will always have the greatest effect 
on posttest probability. Unfortunately, clinicians discover that a 
list of symptoms and signs for an individual patient sometimes 
simultaneously yields outcomes both suggesting (positive 
results) and pointing away from (negative findings) a target dis¬ 
order. There is a way, though, to make sense of this. Rank order¬ 
ing the LR+ associated with each result, along with the 
reciprocal of the LR- (1/LR-), reveals the single “best” clinical 
finding for a target condition. The value with the highest LR+ or 
1/LR- is the single best symptom or sign result. A single symp¬ 
tom or sign may be useful when present (high LR+) or absent 
(small LR-). Unfortunately, most symptoms and signs will not 
be both the best finding when positive and the best finding 
when negative. For example, a clinical sign may have a low 
LR- when negative, whereas a positive result may have an LR+ 
that approaches 1. Creating a mental list of LR+ and 1/LR- for a 
variety of symptoms and signs is not easy. Some clinicians want 
to identify the single finding that overall is the most likely to give 
them the right answer (ie, positive when the patient has disease 
and negative when the patient is not affected). 

The diagnostic odds ratio (DOR) creates a single measure 
of accuracy that tells us which symptom or sign is most likely 
to correctly classify a patient as having the target disorder or 
not. 1 The DOR is not difficult to calculate, as the DOR = 
LR+/LR-. The more accurate the symptom or sign, the 
higher the DOR. So when faced with a table of data on many 
clinical findings in which none distinguishes itself as the 
overwhelming favorite, the clinician should choose the find¬ 
ing with the highest DOR. Unfortunately, the DOR cannot 
be used like the LR for estimating the probability of a diagno¬ 
sis, but it can help us choose the symptoms and signs of 
higher utility so that we can ignore those of lesser value. At 
this point, the skeptical reader might accept that there is a 
method for identifying better symptoms and signs in terms 
of their overall measurement properties (through the DOR) 
and better results applicable to individual patients (through 
the LR). However, a remaining question might be, How con¬ 
fident can I be that the symptoms and signs I think are the 
best really are the best? 
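
A minimal Python sketch of the DOR comparison follows. The three findings and their LRs are hypothetical, and the function names are our own. The second helper uses the standard identity that LR+/LR- equals (true positives × true negatives)/(false positives × false negatives) in a 2 × 2 table, which is not stated in the text above but follows from the definitions of the LRs.

```python
def diagnostic_odds_ratio(lr_pos: float, lr_neg: float) -> float:
    """DOR = LR+ / LR-."""
    return lr_pos / lr_neg


def dor_from_counts(tp: int, fn: int, fp: int, tn: int) -> float:
    """Equivalent form from 2 x 2 counts: (TP x TN) / (FP x FN)."""
    return (tp * tn) / (fp * fn)


# Hypothetical findings: name -> (LR+, LR-).
findings = {
    "finding X": (4.0, 0.50),
    "finding Y": (2.5, 0.25),
    "finding Z": (6.0, 0.90),
}

# Rank the findings from highest to lowest DOR.
ranked = sorted(
    findings,
    key=lambda name: diagnostic_odds_ratio(*findings[name]),
    reverse=True,
)
print(ranked)  # ['finding Y', 'finding X', 'finding Z']
```

Note that finding Z has the highest LR+ but the lowest DOR, because its negative result carries almost no information.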

The Confidence Interval 

When The Rational Clinical Examination series began, we 
presented likelihood results as single point values as if they 
completely described a clinical finding. They do not. Like all 
statistical parameters, an LR has an associated confidence 
interval (CI) that helps us decide whether the data are sufficient 
for us to infer usefulness. These CIs are important 
because they provide transparency. An optimistic LR suggests 
a promising clinical finding, but a broad CI dampens 
the enthusiasm by implying that a small sample size accounts 
for some uncertainty. We are particularly cautious when the 
95% CI includes 1 because LR values of 1 add no information 
to the pretest probability. Broad CIs around LR-, even 
when they do not include 1, are a particular problem. 
Because the LR- values are constrained between 0 and 1, a 
broad CI seems less of a problem than the broad CI around a 
high LR+. To compare the relative findings, the clinical 
reader can use the technique we described above (ie, taking 
the value 1/LR-) for comparing the breadth of the CIs of 
negative to positive LRs. 

Some readers will be surprised that there are different 
methods that yield slight (but clinically unimportant) differ¬ 
ences in CIs. We prefer the easiest computational method 
that also works well in spreadsheets. 2 One situation presents 
problems for researchers and clinical readers alike: what do 
we do when one cell of the 2 × 2 table is 0? When any single 
cell has a 0 value (typically, the cells for false positives or false 
negatives), adding 0.5 to each cell of the 2 × 2 table allows 
calculation of useful CIs. 3 A sensitivity of 100% yields an LR- 
of 0, with the upper limit of the 95% CI obtained after adding 0.5 to 
each cell. A specificity of 100% yields an LR+ that is not calculable 
(∞), so we report both the LR+ and CI obtained after 
adding 0.5 to each cell. Although high-quality studies report 
both the sensitivity and specificity of clinical findings, not all 
of them calculate the LRs for us. When researchers provide 
the actual numbers of affected and unaffected patients, 
together with the sensitivity and specificity, we can generate 
the LRs and 95% CIs. Although it is sometimes easy to calcu¬ 
late CIs from individual research reports, meta-analysis 
offers us an even better way of describing the LRs of findings 
evaluated across several studies. 
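
The following Python sketch shows one way to obtain LRs with 95% CIs from the 4 cells of a 2 × 2 table. It assumes the widely used log method, in which the standard error of ln(LR) is computed from the cell counts; the text above identifies its preferred method only by citation, so this choice is an assumption. The function name and the example counts are hypothetical. For simplicity, the sketch applies the 0.5 correction to the point estimates as well as the CIs whenever any cell is 0, which differs slightly from reporting an LR- of exactly 0 for a sensitivity of 100%.

```python
import math


def lr_with_ci(tp: float, fn: float, fp: float, tn: float, z: float = 1.96) -> dict:
    """Return {"LR+": (estimate, lower, upper), "LR-": (estimate, lower, upper)}.

    tp/fn: patients with disease, finding present/absent.
    fp/tn: patients without disease, finding present/absent.
    """
    cells = [tp, fn, fp, tn]
    if any(cell == 0 for cell in cells):
        # Continuity correction: add 0.5 to every cell when any cell is 0.
        cells = [cell + 0.5 for cell in cells]
    tp, fn, fp, tn = cells
    diseased = tp + fn
    healthy = fp + tn

    def ratio_with_ci(n_diseased: float, n_healthy: float) -> tuple:
        lr = (n_diseased / diseased) / (n_healthy / healthy)
        # Standard error of ln(LR) under the log method (assumed here).
        se = math.sqrt(1 / n_diseased - 1 / diseased + 1 / n_healthy - 1 / healthy)
        return lr, lr * math.exp(-z * se), lr * math.exp(z * se)

    return {"LR+": ratio_with_ci(tp, fp), "LR-": ratio_with_ci(fn, tn)}


# Hypothetical study: 40 true positives, 10 false negatives,
# 20 false positives, 80 true negatives.
for name, (lr, lower, upper) in lr_with_ci(40, 10, 20, 80).items():
    print(f"{name}: {lr:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

With a sensitivity of 100% (for example, counts of 20, 0, 5, and 75), the correction keeps both the estimate and the upper limit of the LR- finite.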

Meta-analysis 

Meta-analysis of symptoms and signs combines the results 
described across several studies and summarizes them to get 
a single estimate and CI. Although some statisticians have a 
high degree of skepticism about the appropriateness of com¬ 
bining LRs, we take the position that summarizing results 
provides clarity for clinicians that at the very least allows 
them to assimilate data and decide whether a symptom or 
sign is useful, useless, or uncertain. 

An important part of meta-analysis requires the investiga¬ 
tor to make decisions about the appropriateness of combin¬ 
ing data. Although statisticians often suggest a purely 
statistical approach (ie, studies that have statistically hetero¬ 
geneous results should not be combined), we take a more 
pragmatic approach similar to that espoused by other clinical 
diagnosticians. 4 First, we evaluate whether the universe of 
published studies represents the universe of patients for 
whom the target condition might be considered. When the 
studies reflect the population of patients for whom the 
symptoms and signs apply, we prefer to try combining the 
LRs. On the other hand, when studies use various definitions 
of disease or different thresholds for the symptoms and signs, 
we cannot combine the results in a meaningful way. When 
we cannot combine the results, we present ranges for the LRs. 
Second, we consider our target audience to be clinical read¬ 
ers. For a condition that might have a very different LR 
among different populations of patients (eg, findings for 
appendicitis among children vs geriatric patients), we avoid 
combining results or we at least show how they vary. Part of 
this approach requires common sense, and part of this is sta¬ 
tistical, in which we examine the outlier results to deduce 
whether there is anything recognizable that accounts for the 
variant LR findings. Third, we examine the actual results 
with their CIs after we combine the data. We always use ran¬ 
dom-effects measures for generating the LR and CIs, rather 
than the fixed-effects approach. Random-effects measures 
generate broader CIs than the fixed effects, providing at least 
some assurance that we are not overstating the importance 
and confidence in our findings. If a study is a statistical LR 
outlier, we still include it in the combined data if it does not 
make a large clinical difference in the LRs. We suggest that 
the clinician use clinical judgment when deciding whether 2 
LRs yield clinically important differences in the posttest 
probability. For example, for a pretest probability of 30%, an 
LR of 5.4 produces a posttest probability of 70%, whereas an 
LR of 3.5 produces a posttest probability of 60%. These LRs 
“look” different, but a clinician might take a similar action 
for a posttest probability of 70% vs 60%. Thus, the 2 LRs 
could be statistically different but provide clinically similar 
results. We always provide the results from each study, and 
astute readers can decide from the point estimates and CIs 
whether they believe a finding is useful or useless. 
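
To illustrate why random-effects CIs are never narrower than fixed-effects CIs, the sketch below pools the LR+ from 3 hypothetical studies on the log scale. It assumes the DerSimonian-Laird estimator of between-study variance; the text above does not name the random-effects method used in the series, so this is only one plausible choice. All counts and function names are invented for illustration.

```python
import math


def log_lr_pos(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (ln LR+, variance of ln LR+) for one study's 2 x 2 counts."""
    lr = (tp / (tp + fn)) / (fp / (fp + tn))
    variance = 1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn)
    return math.log(lr), variance


def pool_lr_pos(studies: list[tuple[int, int, int, int]], z: float = 1.96) -> dict:
    """Pool LR+ across studies; returns fixed- and random-effects (LR, lower, upper)."""
    if len(studies) < 2:
        raise ValueError("pooling needs at least 2 studies")
    effects, variances = zip(*(log_lr_pos(*study) for study in studies))

    # Fixed-effects (inverse-variance) estimate, needed for the Q statistic.
    weights = [1 / v for v in variances]
    fixed_mean = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # DerSimonian-Laird between-study variance (assumed method).
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau_squared = max(0.0, (q - (len(effects) - 1)) / c)

    results = {}
    for label, extra_variance in (("fixed", 0.0), ("random", tau_squared)):
        w_adj = [1 / (v + extra_variance) for v in variances]
        mean = sum(w * y for w, y in zip(w_adj, effects)) / sum(w_adj)
        se = math.sqrt(1 / sum(w_adj))
        results[label] = (
            math.exp(mean),
            math.exp(mean - z * se),
            math.exp(mean + z * se),
        )
    return results


# Hypothetical studies: (TP, FN, FP, TN).
studies = [(30, 15, 5, 60), (45, 20, 20, 80), (12, 8, 3, 40)]
for label, (lr, lower, upper) in pool_lr_pos(studies).items():
    print(f"{label}: LR+ {lr:.1f} (95% CI, {lower:.1f}-{upper:.1f})")
```

Because the between-study variance is added to every study's variance, the random-effects weights are smaller and the resulting CI is at least as wide as the fixed-effects CI.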

More statistically experienced readers may recognize that 
meta-analysis of LRs differs from what they expect. Statisti¬ 
cians, when they accept meta-analysis of diagnostic tests at 
all, prefer summarizing the DOR as a global measure of test 
performance. We take a different approach because summa¬ 
rizing the DOR gives clinicians a value that they cannot use 
for individual patients. Although we do sometimes provide 
summary measures of the DOR, the summary measures of 
the prevalence of disease (pretest probability) and the LR are 
the values needed for solving the equation for posttest proba¬ 
bility. Sometimes, we encounter studies that only provide 
sensitivity data. What do we do with studies that are case 
series of patients with disease and that do not have specificity 
values? 

“Sensitivity-Only” Studies 

When conditions are less common, investigators recognize 
that enrolling consecutive patients at risk for the target disor¬ 
der creates a study population overwhelmed by those with¬ 
out disease. This approach is costly and takes time, and the 
small number of patients with disease leads to broad CIs 
around the sensitivity and LR-. The alternate approach of 
studying only patients with disease so that sensitivity can be 
defined is pragmatic, and it may be the best the investigator 
can do. These studies typically come from a narrow spectrum 
of diseased patients, and often, the clinical finding is 
recorded among patients when the clinician knows that dis¬ 
ease is present. In addition to understanding the potential 
biases in the data, we must understand the inferences made 
from the sensitivity of symptoms and signs without specificity 
values. The goal of sensitivity studies is to identify a group 
of symptoms and signs that would be unlikely to all be negative 
in a patient with the target condition. 

Symptoms and signs with high sensitivity are less likely to be 
negative in patients with disease. When presented with sensi¬ 
tivity data by itself, clinicians will count the number of absent 
findings in their patients and deduce that those with normal 
findings on multiple high-sensitivity symptoms and signs will 
be unlikely to have disease. For example, suppose we identify 2 
symptoms and 1 sign, each of which has a sensitivity of 85% 
for the target condition. That means that each finding would 
be absent in 15% of patients with disease; all 3 would be absent 
in fewer than 1% of patients (0.15 x 0.15 x 0.15). 
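
The arithmetic in this example can be sketched in Python. The function name is our own. Multiplying the individual probabilities assumes that the findings are absent independently of one another among patients with disease, an assumption the example makes implicitly.

```python
import math


def probability_all_absent(sensitivities: list[float]) -> float:
    """Probability that every finding is absent in a patient with disease.

    Assumes the findings are independent among patients with disease.
    """
    return math.prod(1.0 - sensitivity for sensitivity in sensitivities)


# 2 symptoms and 1 sign, each with a sensitivity of 85%.
p_absent = probability_all_absent([0.85, 0.85, 0.85])
print(f"{p_absent:.4f}")  # 0.0034, ie, fewer than 1% of patients with disease
```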

How Do We Use All the Symptoms and Signs? 

Among several reasons for preferring LRs as our common 
statistical parameter, rather than the individual sensitivity 
and specificity values, the ability to multiply likelihood 
results from several findings is the most alluring. Unfortu¬ 
nately, a crucial assumption is not often fully addressed— 
sequentially multiplying LRs requires that the symptoms and 
signs be independent of one another. 

Let us explain the independence concept with a simple 
example. Suppose you conduct a study of chest pain symp¬ 
toms as a predictor of acute ischemia and you categorize words 
as having “physical” or “emotional” connotations. Words that 
describe location and radiation would be physical (eg, “center 
of the chest,” “in the neck”), whereas words that describe the 
interpretation of pain would be emotional (eg, “suffocating,” 
“crushing”). You decide to record whenever a patient refers to 
an “elephant” in describing their discomfort as emotional as 
in, “It felt like an elephant stepped on my chest.” We suspect it 
is obvious that a patient who is “elephant-positive” is experi¬ 
encing crushing pain, but if they report they are having 
“crushing pain that feels like an elephant on my chest,” should 
we report the findings separately for “crushing positive” and 
“elephant positive?” Multiplying the LRs together for “crushing,” 
“elephant-like” discomfort probably overstates the importance, 
producing posttest odds that are too high because 
elephant-like pain is not independent of crushing pain. 
Although common sense might work as an initial judge of 
independence, common sense should not be the only arbiter 
of independence. What should you do when presented with an 
array of findings for many symptoms and signs without any 
assessment of independence? 

To make teaching and performing the medical history and 
physical examination more efficient and accurate, we want 
parsimony. By “parsimony,” we mean the fewest number of 
symptoms and signs that yield the most accurate informa¬ 
tion. Parsimonious examinations force teachers to teach only 
the most relevant parts of the examination, allowing students 
to spend more time learning what is important while eliminating 
wasteful maneuvers. Of course, some of this waste is removed 
by eliminating maneuvers that do not work well. For example, 
a Rinne test is interesting to teach, but it does not add 
useful diagnostic information to the symptom of “decreased 
hearing” reported by the patient. 5 We eliminate additional 
wasted effort when we discard nonindependent findings. 

A parsimonious examination should mathematically make 
us more accurate because a “complete” medical history and 
physical examination almost certainly produces nonindepen¬ 
dent findings. “Positive” nonindependent findings confuse us 
and distort our probability estimates, typically making us 
infer a higher probability of disease than is justified. Most 
authors of The Rational Clinical Examination articles 
emphasize no more than 3 to 4 findings, even when addi¬ 
tional symptoms and signs have useful LRs. Narrowing down 
the number of recommended findings requires “face valid¬ 
ity,” by which we mean using common sense to recommend 
the items with the best, seemingly independent LRs. When 
we take this approach, experienced clinicians then use semi- 
quantitative reasoning and deduce that the more findings 
present, the more likely the patient has disease (or vice 
versa). 

When clinicians want to incorporate the results of diagnostic 
studies into their decision making, they can take 3 
approaches to prevent errors created by lack of independence. 6 
The first is to perform the clinical examination and then use 
only a single history or physical examination finding to 
adjust the prior odds, which guarantees there is no problem with 
independence. (Of course, it also guarantees that the clinician 
might be ignoring a lot of useful clinical information!) 
Typically, the clinician will want to use the single finding that 
has the greatest effect on the prior odds, or the “best” finding 
that we described earlier. The approach is not difficult because 
simple math allows you to rank the findings in order from 
most useful to least useful. Suppose you have 3 findings (A, 
B, and C) that can each be positive or negative, with the LRs 
associated with each result shown in Table 1-6. Is the finding 


Table 1-6 The Findings With the Biggest Influence Can Be Found by 
Rank Ordering the LR+ and LR- a 

                         LR for Values > 1 and 
Finding       LR         1/LR for Values < 1 b 

A present     15         15 
C absent      0.1        10 
B present     5.0        5.0 
C present     2.0        2.0 
B absent      0.6        1.7 
A absent      0.9        1.1 

Abbreviations: LR, likelihood ratio; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 

a Adapted from Holleman and Simel. 6 

b For LRs < 1.0 (usually the LR-), the reciprocal (1/LR) is used. 


that “A” is present more diagnostically useful than “C’s” 
absence? To determine this, you can rank order these by 
comparing the LR for the positive results to 1/LR for the negative 
results. Table 1-6 shows the relative value of each of the 
findings. If your patient had “A” absent, “C” present, and “B” 
present, then you would multiply the prior odds by the LR 
associated with the outcome for test “B” (LR = 5.0) because it 
had the most useful outcome for that individual. 
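
The ranking rule can be sketched in a few lines of code. The following is a minimal illustration, not part of the original analysis: the LR values are those of Table 1-6, whereas the names `LRS`, `strength`, and `best_finding` are hypothetical and chosen only for this example.

```python
# Likelihood ratios from Table 1-6, keyed by (finding, result).
LRS = {
    ("A", "present"): 15.0, ("A", "absent"): 0.9,
    ("B", "present"): 5.0,  ("B", "absent"): 0.6,
    ("C", "present"): 2.0,  ("C", "absent"): 0.1,
}


def strength(lr):
    """Put every LR on a common scale: LR itself if >= 1, otherwise 1/LR."""
    return lr if lr >= 1 else 1 / lr


def best_finding(observed):
    """Return the (finding, result) pair with the most influential LR.

    `observed` maps each finding name to "present" or "absent".
    """
    return max(observed.items(), key=lambda item: strength(LRS[item]))


# Rank every possible result from most useful to least useful.
ranking = sorted(LRS, key=lambda key: strength(LRS[key]), reverse=True)

# The patient described in the text: A absent, C present, B present.
patient = {"A": "absent", "B": "present", "C": "present"}
chosen = best_finding(patient)
chosen_lr = LRS[chosen]
```

For this patient, `chosen` is the pair for “B” present and `chosen_lr` is 5.0, which is the single LR that would be applied to the prior odds.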

Although the above approach removes any concerns with 
independence, the clinician must collect many data that ultimately 
are discarded. At the very least, it is not efficient, and 
at the worst, important information could be ignored. Not 
surprisingly, this approach lacks appeal because it ignores the 
way most clinicians incorporate many bits of information 
into their decision making. 

Clinical researchers must analyze their data in a multivariate 
way to help clinicians. By “multivariate,” we mean that they 
must analyze combinations of findings so that there is less concern 
about independence. This can involve one of 2 general 
approaches. The easiest approach is to take the medical history 
and physical examination findings and perform logistic regression. 
Logistic regression takes a number of individual variables 
and determines their importance in predicting whether disease 
is present or absent. In the first strategy for assessing independence, 
logistic regression identifies variables that lack independence 
and that can be eliminated as redundant. In our example 
above, if all patients with wheezing were also dyspneic, then the 
finding on the “variable” dyspnea might be unimportant once 
we know the wheezing status. The logistic regression approach 
would identify dyspnea as being nonsignificant, and the investigator 
would suggest we concentrate our efforts on assessing for 
wheezing. When logistic regression is used as a “data-reduction” 
step to achieve parsimony, the clinician uses the simple, univariate 
LRs for any finding identified as being independently useful in the 
logistic model. This approach has a lot of appeal because it 
identifies the important and useful variables for the clinician, 
and it does not require that the clinician understand the logistic 
model itself, because the univariate LRs are used. However, in using 
the simple, unadjusted LRs, we ignore the relationship between 
the various clinical findings in favor of simplicity. 

The β parameters of a multivariate logistic analysis 
describe the relative importance of symptoms and signs. 
From algebra, you might remember the equation for a 
straight line is y = mx + b. The m in the equation is the slope, 
and it quantifies how a change in x affects y.* A logistic 
model works similarly, except that now, rather than having 1 
x, we have several symptoms and signs that we evaluate all at 
once. The equivalent of m in the logistic model is the 
β parameter, which is the natural logarithm of the odds ratio 
associated with each symptom or sign; the higher the β parameter, 
the more important the finding. When investigators provide us the 
actual multivariate models, we can put the results of our own 
patient’s clinical examination into the model, and the outcome 
is the individual patient’s actual probability of disease. 
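
To make the idea concrete, here is a minimal sketch of how a published logistic model could be applied to one patient. The intercept, the coefficients, and the variable names are hypothetical values invented for illustration; they do not come from any study discussed here. Exponentiating each coefficient gives the corresponding odds ratio.

```python
import math

# Hypothetical model: an intercept plus one coefficient per finding.
# These numbers are illustrative assumptions, not published estimates.
INTERCEPT = -2.0
BETAS = {"wheezing": 1.5, "smoking_history": 0.8}


def predicted_probability(findings):
    """Probability of disease for one patient.

    `findings` maps each variable name to 1 (present) or 0 (absent).
    """
    log_odds = INTERCEPT + sum(BETAS[name] * value for name, value in findings.items())
    return 1 / (1 + math.exp(-log_odds))


# Odds ratio for each finding is exp(coefficient).
odds_ratios = {name: math.exp(beta) for name, beta in BETAS.items()}

# A patient with wheezing but no smoking history.
probability = predicted_probability({"wheezing": 1, "smoking_history": 0})
```

With these assumed coefficients, the patient's log odds are -0.5, which corresponds to a probability of about 0.38.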


*For those who just cannot remember b, it is the intercept where the 
line crosses the y-axis. 
















CHAPTER 1 Primer on Precision and Accuracy 


The Fuss About Precision 

The Primer states, “for an item of the clinical history or physical 
examination to be accurate, it first must be precise.” By 
precision, we imply that 2 or more observers agree on the 
presence or absence of a finding in a patient who experienced 
no clinical changes.* 

When we measure precision, describing the percentage of 
time that 2 observers agree on a symptom or sign fails to 
consider simple luck. Instead of reporting simple agreement, 
investigators report precision as the agreement beyond that 
attributable to chance. For dichotomous findings (“yes” vs 
“no” or “present” vs “absent”) compared between 2 observers, 
we quantify this agreement beyond chance with the κ 
statistic.† The κ statistic varies from -1 (perfect disagreement) 
to 0 (chance agreement) to +1 (perfect agreement). 
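
As an illustration of agreement beyond chance, the sketch below computes the unweighted statistic for 2 observers and a dichotomous finding. The function name `cohen_kappa` and the counts are hypothetical and serve only as an example.

```python
def cohen_kappa(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Agreement beyond chance for 2 observers rating a yes/no finding."""
    total = both_yes + a_yes_b_no + a_no_b_yes + both_no
    observed = (both_yes + both_no) / total

    # Chance agreement follows from how often each observer says "yes".
    a_yes = (both_yes + a_yes_b_no) / total
    b_yes = (both_yes + a_no_b_yes) / total
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)

    return (observed - expected) / (1 - expected)


# Hypothetical data: 100 patients, observers agree on 80 of them.
kappa = cohen_kappa(both_yes=40, a_yes_b_no=10, a_no_b_yes=10, both_no=40)
```

Here the observers agree 80% of the time, but half of that agreement is expected by chance alone, so the statistic is 0.6 rather than 0.8.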

Suppose we are interested in whether a third heart sound 
identifies patients with LV systolic dysfunction. It is easy to 
imagine that a cardiologist might be better at identifying this 
correctly than a generalist internist, suggesting that a κ statistic 
might show lower agreement beyond chance than if we 
were comparing 2 general physicians. Should we conclude 
that a third heart sound is not a good test from the precision 
between a cardiologist and a general internist? The answer, of 
course, is no because test accuracy depends on the quality of 
the observation: the cardiologist might be a better observer 
than a less experienced clinician. These seemingly imprecise 
symptoms and signs are potentially useful when certain providers 
get consistently good results because they represent 
opportunities for improved performance and accuracy. 

A second type of precision is more important for identifying 
inaccurate findings. Although a low κ between observers 
points to opportunities for improving, poor intraobserver 
agreement precludes high accuracy unless the problem can be 
eliminated. Intraobserver agreement describes whether a clinician 
gets the same result when assessing a symptom or sign on 
a patient who is clinically unchanged. For example, when a clinician 
inquires about unilateral headaches as a symptom for 
migraines but the patient changes his or her answer, the finding 
can never be accurate or precise. Although the natural 
assumption might be to blame the patient for inconsistency, 
part of poor intraobserver agreement may be attributable to 
poor technique that can be improved. This is true even when 
applied to symptoms as reported by the patient because different 
answers follow when the information is solicited differently 
(eg, asking the patient a leading question about unilateral 
headaches vs an open-ended question). But if clinicians cannot 
assure reliability on their own findings, they will never use 
the symptoms and signs accurately. If you cannot agree with 
yourself, the LR results will be random. 


*To clarify further, some researchers use the word reliability or the 
term observer variability instead of precision. These are all terms that 
imply the same concept of similar results on repeated examinations, 
so we use them interchangeably. 

†We use the weighted κ when we have findings that are not dichotomous. 
For example, a sign graded as 0, 1, or 2+ would have a disagreement 
between observers of “grade 1 and 2” weighted as less 
than a discrepancy between “grade 0 and 2.” When we have multiple 
observers, we use regression techniques to generate the intraclass 
correlation coefficient for describing the interobserver variability. 


A Brief Word About Quality 

Every article in The Rational Clinical Examination series and 
the updates in this book use a standard process for assessing 
the quality of data. Although the Primer focuses mostly on the 
sensitivity, specificity, and LR results, it should be clear that 
narrow CIs around the results do not assure methodologic 
rigor of the studies that generated the results. At the inception 
of The Rational Clinical Examination series, the evidence-based 
medicine movement was in its infancy. An early article 
in the series heralded its entry into the mainstream thoughts of 
clinical educators and investigators. 7 Because standardized 
approaches had not been developed for assessing the quality of 
the medical history and physical examination, David L. Sackett, 
MD, and Charles H. Goldsmith, PhD, agreed on certain 
characteristics that they asked their reviewers to use when 
judging quality. The criteria were simplified and summarized 
in an early article of the series. 8 Subsequently, several groups 
have published their criteria for the review of diagnostic accuracy 
studies, although none address the particular nuances of 
symptoms and signs. 9-11 Perhaps it is not surprising that many 
clinical investigators and epidemiologists have reported on a 
large number of quality measures that describe what seem like 
innumerable potential biases in diagnostic test studies. Despite 
the increasing complexity of rating systems and quality measures, 
the original criteria for reviewing articles have stood the 
test of time and pragmatism. If anything, we made the process 
easier and reduced the number of quality levels a reviewer 
might assign an article. We reviewed the recommendations for 
diagnostic test studies 9,10 and adapted them specifically for 
studies of the clinical examination. 12 In the early articles 
appearing in The Rational Clinical Examination series, we 
assigned Grades for levels of evidence. However, this blurred 
the distinction between Levels 3, 4, and 5. Because no study 
accepts Level 5 evidence in making recommendations, we 
dropped the Grade designation and now report only the Levels 
as shown in Table 1-7. 13 


Table 1-7 Levels of Evidence a 

Level of 
Evidence   Grade   Definition 

1          A       Independent blinded comparison of sign or symptom 
                   results with a criterion standard of diagnosis among a 
                   large number of consecutive patients suspected of 
                   having the target condition 

2          B       Independent blinded comparison of sign or symptom 
                   with a criterion standard of diagnosis among a small 
                   number of consecutive patients suspected of having 
                   the target condition 

3          C       Independent blinded comparison of sign or symptom with 
                   a criterion standard of diagnosis among nonconsecutive 
                   patients suspected of having the target condition 

4          C       Nonindependent comparison of sign or symptom with 
                   a criterion standard of diagnosis among samples of 
                   patients who obviously have the target condition plus, 
                   perhaps, normal individuals 

5          C       Nonindependent comparison of sign or symptom with 
                   a standard of uncertain validity 


a Modified from Holleman and Simel. 13 

Table 1-8 Hypothetical Data in Which Only the Patients Who Received 
Neuroimaging Appear in the Published Report 

                Target Condition 

Finding         Present    Absent 

Present         90         10         LR+ = 9.0 

Absent          10         90         LR- = 0.11 

Sensitivity = 0.90 
Specificity = 0.90 


Abbreviations: LR+, positive likelihood ratio; LR-, negative likelihood ratio. 


Most of the important biases that compromise a study’s 
results follow from the study population not being consecutive, 
prospective, or independently assessed with an appropriate 
blindly applied reference standard. By consecutive, we mean 
that the authors enrolled all patients for whom the target disorder 
was a reasonable consideration. Independent means that the 
symptom or sign under study was not used to select patients for 
the study. Blind means that the symptoms and signs were 
applied without knowledge of the presence of disease determined 
by the reference standard, but also that the reference 
standard was interpreted without knowledge of the study questions. 
The size of a study (level 1 vs level 2) for quality assessment 
depends on the disease under consideration. The authors 
of The Rational Clinical Examination evaluate sample sizes 
according to their review of the literature because there is no 
uniform number that determines quality; for example, a large 
study of thoracic aortic aneurysms would likely not have as 
many patients as a large study of urinary tract infection in 
women. 

One particular bias, verification bias, deserves special consideration 
because it can be insidious and have a big effect on the 
LR. Verification bias occurs when not all of the potentially eligible 
patients undergo confirmation of their disease status. 
Often, this is done for pragmatic reasons. An example might be 
a study of headache patients that seeks to describe whether 
asymmetric neurologic findings (eg, weakness) indicate serious 
intracranial abnormalities discovered through neuroimaging. 
Because it would be expensive and impractical to 
have every patient with headaches undergo imaging, an investigator 
typically chooses to maximize the chance of finding 
something by including all patients with asymmetric muscle 
strength but only a sample of those who are normal. We can 
highlight the effect of verification bias on the sensitivity, specificity, 
and LRs through examining tables of example data. Suppose 
an investigator reports the findings displayed in Table 1-8. 

In the example, the finding looks excellent, with a sensitivity 
and specificity of 90%. However, because the investigator could 
not justify the reference standard (eg, neuroimaging on every 
patient with a headache), the investigative team referred only a 
sample of those with negative clinical findings (for illustrative 
purposes, 10%). Had the investigator been evaluating every 
patient, the findings might have been as shown in Table 1-9. 

The data demonstrate that verification bias tends to overestimate 
sensitivity while underestimating specificity.* When 


Table 1-9 Hypothetical Data, Adjusted for the Patients Who Did Not 
Receive Neuroimaging 

                Target Condition 

Finding         Present           Absent 

Present         90                10                LR+ = 43 

Absent          10/0.10 = 100     90/0.10 = 900     LR- = 0.53 

Sensitivity = 0.47 
Specificity = 0.99 

Abbreviations: LR+, positive likelihood ratio; LR-, negative likelihood ratio. 

the bias is left unadjusted, the investigator will not recognize 
that the presence of the finding is actually better than suggested 
(the adjusted LR+ should be higher), whereas the 
absence of the finding is not as good as suggested (the 
adjusted LR- should be closer to 1). Astute investigators will 
recognize that if they collect complete data on all the potentially 
eligible patients, the bias is one of the few in diagnostic 
test research that can be mathematically corrected. 
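
The correction can be sketched as follows, using the counts from Tables 1-8 and 1-9. This is a minimal illustration under the stated assumption that a known fraction of patients in each row was referred for the reference standard; the function and parameter names are hypothetical.

```python
def accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity, and both LRs from a 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


def correct_verification_bias(tp, fp, fn, tn,
                              fraction_pos_verified=1.0,
                              fraction_neg_verified=1.0):
    """Scale each row back up by the fraction of patients who were verified."""
    return accuracy(
        tp / fraction_pos_verified, fp / fraction_pos_verified,
        fn / fraction_neg_verified, tn / fraction_neg_verified,
    )


# Table 1-8: the published (verified) patients only.
reported = accuracy(tp=90, fp=10, fn=10, tn=90)

# Table 1-9: only 10% of patients without the finding were imaged.
adjusted = correct_verification_bias(90, 10, 10, 90, fraction_neg_verified=0.10)
```

The adjusted values reproduce Table 1-9: sensitivity falls from 0.90 to about 0.47, specificity rises to about 0.99, the LR+ rises from 9.0 to about 43, and the LR- moves from 0.11 to about 0.53.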

CLINICAL SCENARIOS 


CHAPTER 


Does This Patient Have 

Abdominal Aortic 
Aneurysm? 

Frank A. Lederle, MD 
David L. Simel, MD, MHS 


CASE 1 A 60-year-old man requests a physical examination 
because a friend recently died suddenly from a 
ruptured abdominal aortic aneurysm (AAA). Your examination 
reveals nothing abnormal. After reassuring the 
patient, you are left wondering whether you might have 
missed an AAA large enough to warrant surgical repair. 

CASE 2 A thin 80-year-old woman observes that she can 
feel her abdomen pulsating against her belt. While examining 
her abdomen, you find an easily palpable, strongly 
pulsating aorta that you measure to be about 2 cm wide. 
You wonder whether you should order an ultrasonographic 
examination. 

CASE 3 You are asked to see a 75-year-old man with 12 
hours of right flank and abdominal pain, constipation, urinary 
frequency, urgency, dysuria, and leukocytosis who 
is about to be sent home on treatment for pyelonephritis. 
Deep palpation of the abdomen is difficult, but you faintly 
discern a large pulsatile mass. You order computed tomography, 
which confirms an AAA with bleeding into the retroperitoneum, 
and the patient is taken to the operating room. 


WHY IS PHYSICAL DIAGNOSIS 
OF AAA IMPORTANT? 


Abdominal aortic aneurysms cause more than 10 000 deaths 
each year in the United States, 1 and many of these deaths 
should be preventable through timely diagnosis and treatment. 
AAAs usually remain asymptomatic while slowly 
enlarging during a period of years or even decades. About a 
third will eventually rupture, an event associated with a mortality 
rate of 80%. 2 Important risk factors for AAA include 
age, male sex, and smoking. 3 

Abdominal palpation was the original method of AAA 
detection. When ultrasonography and computed tomography 
became available, it was clear that they were more accurate 
than palpation, and these became the procedures of 
choice for confirming the diagnosis of AAA and for measurement 
of AAA diameter. A variety of studies have shown 
the sensitivity and specificity of ultrasonography and computed 
tomography to be close to 100%. 4-8 Since then, the 
importance of abdominal palpation has been limited to 
identifying patients who should have confirmatory imaging 
studies. In one recent report, 31% of all AAAs diagnosed at a 
university hospital were originally detected by routine physical 
examination. 9 

The first scenario addresses the issues of screening (or case 
finding) to detect AAA and the subsequent management of 
asymptomatic AAA, 2 subjects of considerable debate in 
recent literature. Although most of the discussion of screening 
has focused on the use of ultrasonography, the only study 
to consider both methods found screening with abdominal 
palpation to be more cost-effective. 10 In a review of the periodic 
physical examination, abdominal palpation for AAA 
was one of the few maneuvers recommended for older men. 11 
The Canadian Task Force on the Periodic Health Examination 
observed that abdominal palpation of men older than 60 
years was prudent, 12 but both the Canadian and the US Preventive 
Services Task Forces gave each AAA screening 
method a C rating (poor evidence to include or exclude from 
the periodic health examination), and some authors have 
judged the accuracy of abdominal palpation for AAA to be 
insufficient for screening. 13 

Management is based on observations that the risk of AAA 
rupture (and hence the need for elective repair) increases 
with the diameter of the aneurysm. The diameter of asymptomatic 
AAA above which repair should be offered to good 
surgical candidates is the topic of ongoing clinical trials, 14 
and current recommendations range from 4.0 to 6.0 cm, 
with 5.0 cm as the cutoff point most commonly used. 15 
Patients with AAAs that do not yet warrant repair are followed 
up with ultrasonography once or twice a year to detect 
enlargement that might warrant repair. 

The second scenario represents what has been termed the 
students’ aneurysm. 16 Realization that these symptoms and 
physical findings are normal allows the physician to provide 
immediate reassurance to the patient and makes further testing 
unnecessary. 

In the third scenario, abdominal palpation may have been 
lifesaving. Physical examination should not be relied on to 
rule out the diagnosis of ruptured AAA, and any patient in 
whom the diagnosis is considered should undergo ultrasonography 
or computed tomography. However, there are 
patients whose clinical likelihood of having a ruptured AAA 
lies below the physician’s threshold for obtaining an imaging 
study and for whom physical examination may therefore be 
decisive. Many physicians are unfamiliar with the varied presentations 
of ruptured AAAs, so palpation of a widened aorta 
may be the first suggestion of the diagnosis. 17 

The importance of the physical examination in these settings 
depends largely on its accuracy. In this article, the accuracy 
of physical diagnosis of an AAA is assessed by review and 
analysis of the available literature. In 1905, Osler 18 observed 
that “no pulsation, however forcible, no thrill, however 
intense, no bruit, however loud—singly or together—justify 
[sic] the diagnosis of an aneurysm of the abdominal aorta, 
only the presence of a palpable expansile tumour.” Accordingly, 
most of the literature on physical examination to detect AAA 
has dealt with abdominal palpation to measure the width of 
the pulsatile mass representing the aneurysmal aorta, but several 
other physical signs have been considered. In one study, 
abdominal and femoral bruits and absent femoral pulses had 
no predictive value. 8 Another study found that location of the 
pulsation more than 3.0 cm caudad of the umbilicus was not 
predictive of AAA. 19 In 1975, Guarino 20 stated that the pulsatile 
mass of AAA could be distinguished by its being moveable 
laterally but not cephalad or caudad. This observation was 
not studied, however, and in the current era of readily available 
ultrasonography, there may be little value in further 
increasing the specificity of physical examination once a widened 
aorta is felt. We are aware of no other putative signs of 
AAA for which published information is available, so the 
remainder of this article will be limited to the consideration of 
abdominal palpation in detecting a widened aorta. Attempts 
to measure precisely the AAA diameter by abdominal palpation 
(as opposed to simply differentiating abnormal from 
normal) have also been studied 4,5,21-23 but are of limited importance 
now that AAA measurements are routinely obtained 
more accurately from follow-up imaging studies and so will 
not be considered further. 


METHODS 

We searched MEDLINE for articles from 1966 to August 
1998, using a search strategy previously developed for The 
Rational Clinical Examination series that combined 10 
exploded MeSH headings (“physical examination,” “medical 
history taking,” “professional competence,” “sensitivity and 
specificity,” “reproducibility of results,” “observer variation,” 
“diagnostic tests, routine,” “decision support techniques,” 
“Bayes theorem,” “mass screening”) and 2 text word categories 
(“physical exam$” and “sensitivity and specificity”), and 
then we took the intersection of this set with aortic aneurysm 
(exploded). The resulting set, plus articles in our files, references 
cited by these articles, and references in textbooks, was 
reviewed for information pertinent to the clinical examination 
of AAA. Unpublished information was obtained from 
the authors of some studies. 

Series with fewer than 10 patients and those published 
before 1966 were not considered. No other exclusions (eg, 
language, publication type) were applied. We assigned each 
study to a level of evidence according to a system previously 
developed for this series. 24 Level 1 studies are independent, 
blind comparisons of sign or symptom results with a criterion 
standard among a large number (sufficient to have narrow 
confidence limits on the resulting sensitivity, specificity, 
or likelihood ratio) of consecutive patients suspected of having 
the target condition. Level 2 studies are independent, 
blind comparisons of sign or symptom results with a criterion 
standard among a small number of consecutive patients 
suspected of having the target condition. Level 3 studies are 
independent, blind comparisons of signs and symptoms with 
a criterion standard among nonconsecutive patients suspected 
of having the target condition. Level 4 studies are 
nonindependent comparisons of signs and symptoms with a 
criterion standard among convenience samples of patients 
who obviously have the target condition plus, perhaps, 
healthy individuals. Level 5 studies are nonindependent 
comparisons of signs and symptoms with a standard of 
uncertain validity (which may even incorporate the sign or 
symptom result in its definition) among convenience samples 
of patients and, perhaps, healthy patients. 

To provide consistency in data extraction, abdominal aortic 
aneurysm was defined as an abdominal aortic diameter of 3.0 
cm or greater. There is no widely accepted method of defining 
the cutoff point between a normal aorta and an AAA. Imaging 
studies done in clinical practice are often interpreted according 
to arterial shape (eg, distal widening), but epidemiologic studies 
have generally used the simpler measure of unadjusted 
infrarenal aortic diameter, which has been shown to be associated 
with rupture risk. 25 An infrarenal aortic diameter of 3.0 
cm is a commonly used but somewhat controversial cutoff 
point in published articles, whereas a diameter of 4.0 cm or 
larger is clearly diagnostic of an AAA. Adjustment of the cutoff 
point for such factors as age, sex, and body size has been suggested 
but appears to have little practical value. 26 

An a priori decision was made to consider intermediate 
findings on palpation as negative when the uncertainty was 
due to the aorta’s being impalpable 27-30 and positive when the 
findings were considered suggestive of an AAA (as opposed 
to definite). 8,31 

Sensitivity was calculated as the proportion of affected 
patients with positive findings, specificity as the proportion 
of unaffected patients with negative findings, and positive 
predictive value as the proportion of patients with positive 
findings who were affected. Likelihood ratios were also calculated; 
the positive likelihood ratio (LR+) is defined as 
sensitivity/(1 - specificity) and expresses the increase in the 
odds of having the disease when the finding is positive 
(LR+ values are >1), and the LR- is defined as 
(1 - sensitivity)/specificity and expresses the decrease in the odds of 
having the disease when the finding is negative (LR- values 
are 0-1). Values for true positives, false positives, true negatives, 
and false negatives were increased by 0.5 when likelihood 
ratios were computed to avoid division by 0. 32 CIs for 
likelihood ratios from individual studies were computed 
using the method of Simel et al. 33 
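
These definitions can be sketched in code. The counts below are hypothetical and were chosen so that the uncorrected LR+ would require division by 0. The confidence interval uses a standard log-scale approximation for a ratio of 2 proportions; whether it matches the cited method in every detail is an assumption, and the function names are ours.

```python
import math


def likelihood_ratios(tp, fp, fn, tn, correction=0.5):
    """Return (LR+, LR-) after adding `correction` to every cell."""
    tp, fp, fn, tn = (cell + correction for cell in (tp, fp, fn, tn))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def lr_pos_confidence_interval(tp, fp, fn, tn, correction=0.5, z=1.96):
    """Approximate 95% CI for LR+ computed on the log scale (assumed method)."""
    tp, fp, fn, tn = (cell + correction for cell in (tp, fp, fn, tn))
    lr_pos = (tp / (tp + fn)) / (fp / (fp + tn))
    se_log = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    return lr_pos * math.exp(-z * se_log), lr_pos * math.exp(z * se_log)


# Hypothetical study: 10 patients with AAA (5 palpable), 90 without (none palpable).
lr_pos, lr_neg = likelihood_ratios(tp=5, fp=0, fn=5, tn=90)
lower, upper = lr_pos_confidence_interval(tp=5, fp=0, fn=5, tn=90)
```

Without the 0.5 correction, the specificity here would be exactly 1 and the LR+ undefined; with it, the LR+ is finite but its interval is very wide, which reflects how little information a single small study carries.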

The studies of AAA screening were judged to be of sufficient 
quality and similarity of design to assess for statistical 
similarity. The tests for heterogeneity of the sensitivity 
data were not significant (all P > .10), supporting the decision 
to pool these data. 34 However, assessments of heterogeneity 
of the effectiveness scores (a measure of the effect size 
of a diagnostic test result) were of borderline significance 
(pooled effectiveness, 1.7; P = .04 using a cutoff of 3.0 cm; 
pooled effectiveness, 2.1; P = .06 using a cutoff of 4.0 cm). 32 
Therefore, a random-effects measure was used as a conservative 
method for pooling the results of these studies, and CIs 
for the pooled likelihood ratios were calculated by using the 
method of Eddy and Hasselblad. 34 

RESULTS 

Abdominal Palpation for Ruptured AAA 

Several studies have reported the sensitivity of abdominal 
palpation in patients with ruptured AAA (Table 2-1). 17,35-42 
In these studies, it is not clear how often the physical findings 
suggested the diagnosis of AAA as opposed to being 
elicited after the diagnosis was made by other methods. 
The sensitivities tended to be higher when patient selection 
was limited to those diagnosed antemortem (including 
operative series). Three series included masses that 
were described as not pulsatile, and sensitivities with these 
masses included are reported separately in Table 2-1. 
Compared with asymptomatic AAAs, ruptured AAAs tend 
to be larger, which would be expected to increase sensitivity, 43 
but rupture may also be associated with guarding, 
intestinal distention caused by compromised circulation, 
and loss of integrity of the AAA, which could have the 
opposite effect. 

Abdominal Palpation for Asymptomatic AAA 

Some studies have reported the sensitivity of abdominal palpation 
in patients with known asymptomatic AAAs (range of 
sensitivities, 65%-100%). 4-7,22,23,36,39,44-49 Most of these studies 
involved patients undergoing preoperative evaluation for 
elective repair of large AAAs, and many patients were originally 
identified by physical examination before referral to the 
study group. The lack of blinding and the preponderance of 
large AAAs likely resulted in higher sensitivities than would 
be achieved in most clinical settings. 


Table 2-1 Sensitivity of Abdominal Palpation in Series of Patients With Ruptured Abdominal Aortic Aneurysm a 

                                                Sensitivity of 
Source, y                      No. of AAAs      Palpation (%) b     Patient Selection 

Pryor, 35 1972                 44               45 (82)             All 
Williams et al, 36 1972        79               97                  Operated on 
Ottinger, 37 1975              40               75 (100)            Diagnosed antemortem 
McGregor, 38 1976              41               44 (51)             Unoperated on at autopsy 
Gordon-Smith et al, 39 1978    83               90                  Operated on 
Gaylis and Kessler, 40 1980    105              87                  Diagnosed antemortem 
Donaldson et al, 41 1985       81               91                  Not stated 
Walsh et al, 42 1992           55               64                  All 
Lederle et al, 17 1994         23               52                  Presented to internist 

Abbreviation: AAA, abdominal aortic aneurysm. 

a All studies provide level 4 evidence (see “Methods” section). 

b Numbers in parentheses represent the sensitivity if nonpulsatile masses are included. 

Other studies have reported the positive predictive value of 
clinical suspicion for AAAs in a series of patients referred for 
imaging studies (range of positive predictive values, 
15%-91%). 6,13,21,31,48-53 The wide range of values may reflect possible 
inclusion in some studies of patients with previous diagnostic 
imaging studies before their referral to the study group 
(falsely increasing positive predictive value) and of patients 
referred for ruling out AAA according to indications other 
than palpation of a widened aorta (potentially falsely increasing 
or decreasing positive predictive value). Two studies provide 
results by age and sex, indicating that the highest 
positive predictive values are obtained in men older than 60 
years, with low values (<15%) obtained in women and 
younger men. 13,53 

The best evidence available for assessing the performance 
of abdominal palpation in detecting AAAs comes from series 
of patients not previously suspected of having AAAs who 
were screened by abdominal palpation and ultrasonography 
(Table 2-2). 8,19,27-30,54-62 In all 15 of these studies, screening was 
limited to patients at increased risk for AAAs, usually older 
men with hypertension or vascular disease. Blinding of the 
examiner was ensured when physical examination preceded 
ultrasonography; this was stated to have occurred in 8 of 
these 15 studies 8,19,27-30,55,59 and was implied to have occurred in 
the others. No study stated whether the ultrasonographer 
was blinded to the physical examination findings. 

The low level of disease prevalence in these screening studies 
and the resulting low expectation of disease by the examiner 
have the advantage of reflecting most clinical settings. A 
disadvantage is that the small number of AAAs, particularly 
larger AAAs, limits the precision of the estimates from individual 
studies. We addressed this problem by pooling data 
from all studies. 

In the pooled analysis, the sensitivity of abdominal palpation
increased significantly with the AAA’s diameter (P < .001,
χ² for trend), illustrating the previously described
effect of disease severity on sensitivity. 43 As seen in Table 2-2,
the pooled sensitivity values range from 29% for AAAs of 3.0
to 3.9 cm to 50% for AAAs of 4.0 to 4.9 cm and to 76% for
AAAs of 5.0 cm or greater. As would be expected, these
sensitivities are lower than those observed in the series of
previously known (and presumably larger) AAAs mentioned
above.

The high LR+ indicates that the finding of a widened aorta
greatly increases the odds that an AAA is present, whereas the
LR- indicates that the absence of this finding is only moderately
effective in ruling out an AAA. Not surprisingly, the
likelihood ratios also indicate that abdominal palpation is a
more effective diagnostic test for larger AAAs (LR+, 16; LR-,
0.51, using a cutoff point for AAA of ≥4.0 cm vs LR+, 12;
LR-, 0.72 using a cutoff point for AAA of ≥3.0 cm).
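
As a rough illustration of how these likelihood ratios translate into probabilities, the odds form of Bayes' theorem can be applied to the pooled values for a cutoff of 3.0 cm (LR+, 12; LR-, 0.72). The 5% pretest probability used here is an assumed, illustrative value for an older man at increased risk; it is not a figure from the pooled analysis.

```latex
\begin{align*}
\text{pretest odds} &= \frac{0.05}{1 - 0.05} \approx 0.053 \\[4pt]
\text{posttest odds (widened aorta)} &= 0.053 \times 12 \approx 0.63,
  &\text{probability} &= \frac{0.63}{1 + 0.63} \approx 39\% \\[4pt]
\text{posttest odds (normal aorta)} &= 0.053 \times 0.72 \approx 0.038,
  &\text{probability} &= \frac{0.038}{1 + 0.038} \approx 3.7\%
\end{align*}
```

Under this assumption, a positive finding raises the probability of AAA roughly eightfold, whereas a negative finding lowers it only from 5% to about 3.7%.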

Factors That Affect Abdominal Palpation for AAA 

The sensitivities shown in Table 2-2 apply only to abdominal
palpation directed at AAA detection and not to routine
abdominal palpation (abdominal palpation as it is routinely
done in practice, not necessarily specifically directed at
measuring aortic width). Several studies have compared routine
physical examination with abdominal palpation directed at
AAA detection. In one of the screening studies listed in Table
2-2, all 5 patients with AAAs considered definite at the study’s
physical examination and subsequently confirmed by
ultrasonography had been missed on routine physical examination
of the abdomen in the previous 6 months. 8 Another study
found that 95 of 188 patients with AAAs considered palpable
on physical examination before surgery had been missed on
at least 1 physical examination in the 12 months preceding the
initial diagnosis. 47 In a third study, 19 of 37 patients with
previously undiagnosed but easily palpable ruptured AAAs (all
6-10 cm in diameter) had undergone physical examination in
the previous 24 months, but the diagnosis had been missed. 63
Abdominal aortic aneurysms enlarge at a mean rate of 0.2 to
0.5 cm/y, 25,64 so the interval was unlikely to have been an
important confounder in these studies.

Obesity also appears to limit the effectiveness of abdominal
palpation. In one study, patients with AAAs missed at
palpation had significantly greater mean abdominal girth
than patients with AAAs detected at palpation (111 vs 96 cm;
P < .01), and when abdominal girth was less than 100 cm, 6
of 6 AAAs were detected at palpation compared with 3 of 12
AAAs that were detected when abdominal girth was 100 cm
or more (P < .01). 8 Another study observed that 23% of the
patients “were too obese for us to feel the aortic pulse.” 30 We
are aware of no reports discussing whether AAA is ruled out
more reliably when the aorta is palpable and considered to be
normal than when the aorta cannot be adequately palpated.
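
The girth comparison above (6 of 6 AAAs detected when girth was less than 100 cm vs 3 of 12 when it was 100 cm or more) can be checked with a small calculation. The text does not say which test produced P < .01, so the two-sided Fisher exact test below is an assumption, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b
    col1 = a + c
    n = a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: girth < 100 cm, girth >= 100 cm. Columns: detected, missed.
p_value = fisher_exact_two_sided(6, 0, 3, 9)
print(f"P = {p_value:.4f}")
```

Under this assumed test the result is consistent with the reported P < .01, although only 18 AAAs contribute to the comparison.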

How to Perform Abdominal Palpation to Detect AAA 

Abdominal palpation should be conducted while the patient
is supine, with his or her knees raised while the abdomen
relaxes. The examiner first feels deeply for the aortic
pulsation, usually found a few centimeters cephalad of the
umbilicus (the umbilicus marks the level of the aortic bifurcation)
and slightly to the left of midline. The examiner then
positions both hands on the abdomen with palms down, placing
an index finger on either side of the pulsating area to confirm
that it is the aorta (each systole should move the 2 fingers
apart) and to measure the aortic width. A generous amount
of abdominal skin should be included between the 2 index
fingers, and it is often easier, initially, to probe for one side of
the aorta at a time.

It is the width, and not the intensity, of the aortic pulsation 
that determines the diagnosis of an AAA; a normal aorta is 
often readily palpable in thin patients or those with loose 
abdominal muscles. The aorta is normally less than 2.5 cm 
(1 in) in diameter, and aortas larger than this (after allowing 
for skin thickness) warrant further investigation, usually 
with ultrasonography. Physical examination to detect AAAs 
is rarely warranted in persons younger than 50 years because 
of the low frequency of disease in this group. 

There are no known risks associated with palpation of the
abdominal aorta. We found no reports of AAA rupture
attributed to physical examination, and a textbook author
observed that he was “unaware of rupture during examination
of any of several thousand AAAs seen over four
decades.” 65


Table 2-2 Abdominal Palpation in Populations Screened for Asymptomatic Abdominal Aortic Aneurysm a

No. of AAAs Diagnosed by Ultrasonography and Sensitivity of Abdominal Palpation

| Source, y | Range of Patient Age, y | Women, % | No. Screened | ≥3.0 cm (All): AAA | Sensitivity, % | 3.0-3.9 cm: AAA | Sensitivity, % | 4.0-4.9 cm: AAA | Sensitivity, % | ≥5.0 cm: AAA | Sensitivity, % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cabellon et al, 27 1983 | 43-79 | 33 | 73 | 9 b | 22 | NA | NA | NA | ... c | NA | ... |
| Ohman et al, 54 1985 | 50-88 | 0 | 50 | 3 | 0 | 2 | 0 | 1 | 0 | 0 | ... |
| Twomey et al, 55 1986 | >50 | 0 | 200 | 14 | 64 | 7 | 43 | 3 | 100 | 4 | 75 |
| Allen et al, 56 1987 | >65 | 43 | 168 | 3 | 0 | 2 | 0 | 0 | ... | 1 | 0 |
| Allardice et al, 57 1988 | 39-90 | 25 | 100 | 15 | 33 | 10 | 0 | 3 | 100 | 2 | 100 |
| Lederle et al, 8 1988 | 60-75 | 0 | 201 | 20 | 45 | 10 | 40 | 5 | 20 | 5 | 80 |
| Collin et al, 19 1988 | 65-74 | 0 | 426 | 23 d | 35 | NA | NA | NA | ... | NA | ... |
| Shapira et al, 58 1990 | 31-83 | 36 | 101 | 4 | 0 | 2 | 0 | 0 | ... | 2 | 0 |
| Andersson et al, 59 1991 | 38-86 | 42 | 288 | 14 | 29 | NA | NA | NA | ... | NA | ... |
| Spiridonov and Omirov, 60 1992 | 17-67 | 13 | 163 | 10 | 70 | 3 | 0 | 4 | 100 | 3 | 100 |
| MacSweeney et al, 28 1993 | NA | 36 | 200 | 55 | 24 | 33 | 0 | 16 | 44 | 6 | 100 |
| Karanjia et al, 61 1994 | 55-82 | 41 | 89 | 9 | 100 | 2 | 100 | 5 | 100 | 2 | 100 |
| Molnar et al, 62 1995 | 65-83 | 53 | 411 | 7 | 43 | 2 | 50 | 3 | 33 | 2 | 50 |
| al Zahrani et al, 29 1996 | 60-80 | 29 | 392 | 7 | 57 | 1 | 0 | 4 | 50 | 2 | 100 |
| Arnell et al, 30 1996 | 55-81 | 0 | 96 | 1 | 100 | 1 | 100 | 0 | ... | 0 | ... |
| Pooled results | ... | 26 | 2955 | 194 | 39 | 75 | 29 | 44 | 50 | 29 | 76 |

Positive Predictive Value of Palpation and Likelihood Ratios

| Source, y | Positive Predictive Value of Palpation, % | Cutoff Point AAA ≥3.0 cm: LR+ (95% CI) | LR- (95% CI) | Cutoff Point AAA ≥4.0 cm: LR+ (95% CI) | LR- (95% CI) |
| --- | --- | --- | --- | --- | --- |
| Cabellon et al, 27 1983 | 67 | 11 (1.6-73) | 0.77 (0.54-1.1) | ... | ... |
| Ohman et al, 54 1985 | ... | 12 (0.3-528) | 0.88 (0.61-1.3) | 25 (0.6-968) | 0.76 (0.34-1.7) |
| Twomey et al, 55 1986 | 64 | 21 (8.7-53) | 0.38 (0.19-0.74) | 18 (8.9-39) | 0.20 (0.05-0.83) |
| Allen et al, 56 1987 | 0 | 1.6 (0.1-23) | 0.95 (0.65-1.4) | 3.3 (0.3-39) | 0.81 (0.36-1.8) |
| Allardice et al, 57 1988 | 100 | 59 (3.4-1018) | 0.66 (0.46-0.94) | 176 (11-2823) | 0.08 (0.01-1.2) |
| Lederle et al, 8 1988 | 35 | 4.7 (2.5-9.0) | 0.61 (0.41-0.90) | 4.5 (2.2-9.1) | 0.56 (0.31-1.0) |
| Collin et al, 19 1988 | 36 | 9.9 (4.7-21) | 0.67 (0.50-0.90) | ... | ... |
| Shapira et al, 58 1990 | ... | 20 (0.4-890) | 0.90 (0.68-1.2) | 33 (0.8-1415) | 0.84 (0.50-1.4) |
| Andersson et al, 59 1991 | 31 | 8.7 (3.2-23) | 0.73 (0.52-1.0) | ... | ... |
| Spiridonov and Omirov, 60 1992 | 26 | 5.1 (2.9-9.1) | 0.37 (0.15-0.87) | 7.2 (4.6-11) | 0.07 (0-1.0) |
| MacSweeney et al, 28 1993 | 72 | 6.4 (2.5-16) | 0.79 (0.68-0.92) | 19 (7.8-47) | 0.43 (0.26-0.69) |
| Karanjia et al, 61 1994 | 82 | 31 (9.0-105) | 0.05 (0-0.77) | 17 (6.9-43) | 0.07 (0-0.97) |
| Molnar et al, 62 1995 | 33 | 27 (9.1-81) | 0.57 (0.31-1.0) | 23 (6.9-74) | 0.59 (0.30-1.2) |
| al Zahrani et al, 29 1996 | 57 | 62 (18-208) | 0.44 (0.20-3.0) | 71 (22-231) | 0.36 (0.13-0.97) |
| Arnell et al, 30 1996 | 14 | 11 (3.7-33) | 0.27 (0.02-3.0) | 6.5 (0.8-52) | 0.54 (0.08-3.8) |
| Pooled results | 43 | 12 (7.4-19) | 0.72 (0.65-0.81) | 16 (8.6-28) | 0.51 (0.38-0.67) |

Abbreviations: AAA, abdominal aortic aneurysm; CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; NA, data not available.

a Includes unpublished information received from authors. All studies used ultrasonography and provide level 2 evidence. The pooled results for numbers are sums and for functions are from a random-effects measure and provide level 1 evidence (see “Methods” section). Abdominal aortic aneurysm is defined as at least 3.0 cm by ultrasonography.

b No information was given on AAA diameter.

c Ellipses indicate values cannot be calculated.

d Abdominal aneurysms less than 3 cm are included.

We are aware of no educational studies examining methods
of learning AAA palpation. In our experience, however,
accurate palpation is readily learned through practice and
feedback. We have found that physicians can become proficient
after comparing their findings with ultrasonographic
measurements in a few patients with AAAs and a few controls.

Bottom Line 

The only physical examination maneuver of demonstrated 
value for the diagnosis of an AAA is abdominal palpation to 
detect a widened aorta. Palpation of AAA appears to be safe 
and has not been reported to precipitate rupture. 

Positive findings on abdominal palpation greatly increase 
the likelihood that an AAA, particularly a large AAA, is 
present. Even so, the positive predictive value of 43% (Table 
2-2) indicates that less than half of all high-risk patients (and 
fewer low-risk patients, such as most women and young 
men) suspected of having an enlarged aorta on abdominal 
palpation will be found to have an AAA. However, this may 
not be of great concern because ultrasonography provides a 
safe and relatively inexpensive confirmatory test. 

Abdominal palpation will detect most AAAs large enough
to warrant surgery, but it cannot be relied on to rule out the
diagnosis. The sensitivity of palpation appears to be reduced
by abdominal obesity and by routine abdominal examination
not specifically directed at measuring aortic width.
When a ruptured AAA is suspected, imaging studies such as
ultrasonography or computed tomography should be
performed regardless of physical findings.
UPDATE: Abdominal Aortic Aneurysm



Prepared by Frank Lederle, MD 
Reviewed by Ed Etchells, MD 


CLINICAL SCENARIO 


You are performing a physical examination on an obese
65-year-old man. You have been thorough with abdominal
palpation and allowed the abdominal muscles to relax
enough so that you can feel the aortic pulsation. You
estimate it to be 2 cm wide, which is normal. Because you
have heard that abdominal palpation is less accurate in
obese patients, you wonder whether the examination
findings exclude abdominal aortic aneurysm (AAA).

UPDATED SUMMARY ON 
ABDOMINAL AORTIC ANEURYSM 

Original Review 

Lederle FA, Simel DL. Does this patient have abdominal aortic
aneurysm? JAMA. 1999;281(1):77-82.

UPDATED LITERATURE SEARCH 

We reviewed all citations listed under “exp aortic aneurysm”
in MEDLINE, from 1998 to July 2004. The search yielded
7590 titles. We also searched personal files maintained on the
topic since the original publication. We reviewed titles and
abstracts to identify new studies that met the original
inclusion and exclusion criteria, focusing on large studies that
included information on the sensitivity or specificity of the
physical examination for abdominal aneurysms in the general
population. The review identified only 1 article that met
our inclusion criteria.


NEW FINDINGS 

• The interobserver variability for detecting aneurysms is good. 

• The sensitivity of the examination is better for smaller
patients than for larger patients. However, the sensitivity
in larger patients is still good when the aorta can be
palpated.

• When the patient cannot “relax” the abdomen, clinicians 
should be aware that they are more likely to “miss” an 
aneurysm. 


Details of the Update 

Abdominal palpation continues to be an important method
for diagnosing AAA. In a recent study from a UK district general
hospital, 48% of all AAAs were diagnosed by physical
examination 1 compared with 31% in reference 9 of the original
Rational Clinical Examination article.

A study published after the original review evaluated patient
factors such as abdominal obesity, girth, and tightness and the
effect of a palpable aorta, which might have an effect on the
accuracy of the clinical evaluation. In addition, the investigators
provided information on interobserver variability in abdominal
palpation for AAA. 2 The only pragmatic way to conduct such an
evaluation is through the evaluation of patients with and without
an aneurysm. In this study of 200 subjects, 99 with and 101
without AAA, the interobserver pair agreement for AAA vs no
AAA between the first and second examination was 77% (κ =
0.53). The sensitivity of the examination improves with increasing
size of the aneurysm. For aneurysms 5 cm or larger, the
sensitivity was 82%. Not surprisingly, the examiners also had better
sensitivity in thinner subjects (abdominal girth less than 100 cm
[40-in waistline]) than in more obese subjects (sensitivity, 91%
vs 53% for girth of 100 cm or more). Even when girth was 100
cm or more, if the aorta was palpable, sensitivity was 82%.
Physicians sometimes have trouble palpating the abdominal aorta
when patients cannot “relax” their abdomen. This study
confirmed that the examiners’ assessment that the abdomen was not
tight improved their accuracy in detecting aneurysms (odds
ratio, 2.7; 95% confidence interval, 1.2-6.1).

In another study, 125 subjects with AAA and 39 without
AAA underwent abdominal palpation by a vascular surgeon,
by a nurse, and by the patient. 3 The vascular surgeon and
nurse knew of the high prevalence of AAA in the sample, but
they did not know an individual patient’s diagnosis. For
vascular surgeons, sensitivity was 57% for AAAs less than 4.0 cm
but more than 97% for AAAs larger than 4.0 cm. The accuracy
of nurses and patients was similar to that of the surgeons,
which is surprising because the patients used palpable
pulsation as the only criterion for diagnosing AAA. The κ
value for agreement between surgeons and nurses was high,
at 0.92, and agreement of either with the patient was nearly as
high. Factors independently associated with false negatives
were smaller AAA diameter and higher body mass index. The
extremely high sensitivities, presumably related to the
examiners’ knowledge of the high prevalence of AAA, raise
questions about the study’s generalizability.


Table 2-3 The More Certain the Examiner Feels About the Findings,
the More Likely They Are Correct

| Clinical Impression | LR+ (95% CI) |
| --- | --- |
| Examination “definite” for aneurysm | 4.8 (2.7-8.8) |
| Examination “suggestive” | 1.4 (0.92-2.1) |
| Examination “normal” | 0.43 (0.35-0.54) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio.

The largest sensitivity study to date was recently reported from
Brazil. 4 The first 3000 subjects to call in response to an
advertising campaign were scheduled for screening. The study group
consisted of the 2756 of these subjects who were older than 50
years, had no previous diagnosis of AAA, and had an adequate
ultrasonographic examination. Each subject underwent abdominal
palpation by a vascular surgeon and ultrasonography. It is unclear
whether palpation was blinded to ultrasonographic findings. There
were 64 AAAs 3.0 cm or larger identified by ultrasonography.
Sensitivity and positive predictive value of a positive abdominal
palpation result were 31% and 33%, respectively. This sensitivity
was somewhat lower than in previous studies, possibly reflecting
reduced examiner vigilance resulting from the size of the study.

Several other studies since the original review added useful
information but did not meet our inclusion criteria. A pulsatile
mass may be present after endovascular repair of AAA,
potentially leading to diagnostic confusion. 5 A cohort study
from the Medical Research Council Thrombosis Prevention
Trial examined the result of abdominal palpation of the aorta
by general practitioners in 4171 men from 1992 to 1994. 6
Abdominal aortic aneurysm was suspected in 60 men and
confirmed in 25 (positive predictive value, 42%). By mid-1996,
6 men died of ruptured AAA who had not been suspected
of having AAA on palpation, suggesting that sensitivity
of palpation to detect clinically important AAA was less
than 81%.
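
The figures quoted for this cohort can be reproduced with simple arithmetic. The sketch below uses hypothetical variable names and treats the 6 rupture deaths as the only known missed aneurysms; because other missed AAAs may never have come to attention, the sensitivity it yields is an upper bound rather than an estimate.

```python
# Counts reported for the Thrombosis Prevention Trial cohort.
suspected_on_palpation = 60
confirmed_aaa = 25
missed_ruptured_aaa = 6  # deaths from ruptured AAA not suspected on palpation

positive_predictive_value = confirmed_aaa / suspected_on_palpation
sensitivity_upper_bound = confirmed_aaa / (confirmed_aaa + missed_ruptured_aaa)

print(f"Positive predictive value: {positive_predictive_value:.0%}")
print(f"Sensitivity is at most: {sensitivity_upper_bound:.1%}")
```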

In an older study addressing predictive value, only 1 of 29 
consecutive patients presenting to the Massachusetts General 
Hospital emergency department in the 1970s with tender 
pulsatile mass without hypovolemia actually had AAA. 7 


IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

A new study allows us to assess the likelihood of an aneurysm
according to clinicians’ confidence in their examination
findings and the accuracy of the examination related to various
patient factors such as obesity (see Table 2-3).

Whereas the original publication observed that 5 cm was
the threshold most commonly used for considering surgery,
2 large randomized trials show no benefit of repair for
aneurysms with a diameter of less than 5.5 cm. 8,9

CHANGES IN THE REFERENCE STANDARD 

There are no changes in the reference standard. 

RESULTS OF LITERATURE REVIEW 

Univariate Findings for AAA 

The efficiency of the examination depends on the confidence 
in your findings. 

EVIDENCE FROM GUIDELINES 

Four trials of screening for abdominal aneurysms with
ultrasonography have been conducted since the original US
Preventive Services Task Force and Canadian Task Force
recommendations. The US Preventive Services Task Force now
recommends one-time screening for AAA by ultrasonography
in men aged 65 to 75 years who have ever smoked. 10


CLINICAL SCENARIO—RESOLUTION 


Although it is true that abdominal palpation is less accurate
in obese patients (roughly those with a waist circumference
of more than 40 in), the fact that you could
palpate the aorta improves the accuracy. The sensitivity
for detecting an AAA 3.0 cm or larger is 82%, and your
finding that the aorta was normal confers a negative
likelihood ratio of 0.30. You are able to reassure the patient
that, given your examination findings, the likelihood that
he has an AAA is low.
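
To make "low" concrete, the negative likelihood ratio of 0.30 can be applied to a prior probability. The sketch below assumes the 4% to 8% prevalence quoted for older men in this chapter as the prior; the function name is hypothetical and the calculation is the standard odds form of Bayes' theorem.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


LR_NORMAL_PALPABLE_AORTA = 0.30  # from the scenario resolution

# Assumed priors: the 4%-8% prevalence range cited for older men.
for prior in (0.04, 0.08):
    posterior = post_test_probability(prior, LR_NORMAL_PALPABLE_AORTA)
    print(f"prior {prior:.0%} -> posterior {posterior:.1%}")
```

Under these assumptions the probability of an AAA falls to roughly 1% to 3%, which supports the reassurance given to the patient.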
ABDOMINAL AORTIC ANEURYSM—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

Abdominal aortic aneurysms occur in 4% to 8% of older 
men. The prevalence in older women is less than 2%. 

POPULATION FOR WHOM AAA 
SHOULD BE CONSIDERED 

• Age older than 50 years 

• History of ever smoking 

• Male sex 

• White race 

• Family history of AAA 

DETECTING AN ABDOMINAL AORTIC ANEURYSM 

The size of an aneurysm affects the clinician’s ability to 
detect it (Table 2-4). 


Table 2-4 Likelihood Ratios Vary With the Size of the Aneurysm

| Ability to Detect an Asymptomatic Aneurysm According to Size | LR+ (95% CI) | LR- (95% CI) |
| --- | --- | --- |
| Aneurysm ≥4.0 cm (n = 12 studies) | 16 (8.6-29) | 0.51 (0.38-0.67) |
| Aneurysm ≥3.0 cm (n = 15 studies) | 12 (7.4-20) | 0.72 (0.65-0.81) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.


Clinicians can detect asymptomatic AAAs. The ability to
detect the aneurysm relates, in part, to patient characteristics.
The examination should focus on the width of the palpated
abdominal aorta. Fortunately, the examination results are just as
good for the obese as for the nonobese patient when the
clinician detects an aneurysm. However, the examination is not as
efficient at ruling out an aneurysm in obese patients or in those
who cannot relax their abdomen to facilitate the examination.

REFERENCE STANDARD TESTS 

Imaging studies (ultrasonography or computed tomography). 
EVIDENCE TO SUPPORT THE UPDATE:
Abdominal Aortic Aneurysm



TITLE The Accuracy of Physical Examination to Detect 
Abdominal Aortic Aneurysm. 

AUTHORS Fink HA, Lederle FA, Roth CS, Bowles CA, 
Nelson DB, Haas MA. 

CITATION Arch Intern Med. 2000;160(6):833-836. 

QUESTION How well do commonly used maneuvers 
work for detecting abdominal aortic aneurysm (AAA)? 

DESIGN Each participant underwent physical examination
of the abdomen by 2 internists.

SETTING Minneapolis Veterans Affairs Medical Center. 

PATIENTS Two hundred participants (aged 51-88 years), 
99 with and 101 without AAA as determined by previous 
ultrasonography. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The internists were blinded to one another’s findings and to 
the ultrasonographic diagnosis. 

MAIN OUTCOME MEASURES 

κ, mean pair agreement, sensitivity, specificity, likelihood
ratios, independent predictors of correct diagnosis. The unit
of analysis was the examination.

MAIN RESULTS 

Interobserver pair agreement for AAA vs no AAA between
the first and second examinations was 77% (κ = 0.53).
Sensitivity increased with AAA diameter, from 61% for AAAs of
3.0 to 3.9 cm, to 69% for AAAs of 4.0 to 4.9 cm, 72% for
AAAs of 4.0 cm or larger, and 82% for AAAs of 5.0 cm or
larger. Sensitivity in subjects with an abdominal girth less
than 100 cm (40-in waistline) was 91% vs 53% for girth of
100 cm or greater (P < .001). When girth was 100 cm or
greater and the aorta was palpable, sensitivity was 82%.
When girth was less than 100 cm and the AAA was 5.0 cm or
larger, sensitivity was 100% (12 examinations). Factors
independently associated with correct examination findings
included AAA diameter (odds ratio [OR], 1.95 per centimeter
increase; 95% confidence interval [CI], 1.1-3.6), abdominal
girth (OR, 0.90 per centimeter increase; 95% CI, 0.87-0.94),
and the examiner’s assessment that the abdomen was
not tight (OR, 2.7; 95% CI, 1.2-6.7).
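
Per-centimeter odds ratios are easier to interpret when scaled to a clinically meaningful difference. The sketch below assumes the effect is log-linear in the predictor, as it would be in a logistic regression model; the summary above does not state the model used, and the function name is hypothetical.

```python
def scaled_odds_ratio(odds_ratio_per_cm, difference_cm):
    """Odds ratio for a difference of `difference_cm`, assuming a
    log-linear (multiplicative per centimeter) effect."""
    return odds_ratio_per_cm ** difference_cm


# Abdominal girth: OR 0.90 per centimeter, scaled to a 10-cm larger girth.
girth_or_10_cm = scaled_odds_ratio(0.90, 10)

# AAA diameter: OR 1.95 per centimeter, scaled to a 2-cm larger aneurysm.
diameter_or_2_cm = scaled_odds_ratio(1.95, 2)

print(f"10 cm more girth: OR {girth_or_10_cm:.2f}")
print(f"2 cm larger AAA: OR {diameter_or_2_cm:.2f}")
```

Under that assumption, a 10-cm greater girth cuts the odds of a correct examination to about one-third, whereas a 2-cm larger aneurysm nearly quadruples them.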

The authors provided us data for each examiner according
to their degree of confidence in their examination. As
expected, these data indicate that an examination “suggestive”
of aneurysm conveys considerably less certainty than an
examination “definite” for aneurysm (see Table 2-5).


Table 2-5 The Efficiency of the Examination Depends on the
Confidence in Your Findings (n = 3 Examiners)

| Level of Certainty in Findings | LR+ (95% CI) |
| --- | --- |
| Examination “definite” for aneurysm | 4.8 (2.7-8.8) |
| Examination “suggestive” | 1.4 (0.92-2.1) |
| Examination “normal” | 0.43 (0.35-0.54) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio.


CONCLUSION 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS This study was the first to involve sufficient
numbers of AAA to examine the effect of patient factors such
as obesity, girth, and abdominal tightness and the effect of a
palpable aorta. Because previous work indicated that abdominal
palpation was insensitive when girth was 100 cm or
greater, the authors sought to determine whether subgroups
of patients with large girth could be identified in whom
abdominal palpation might be reliable. Those with a palpable
aorta and large girth had sensitivity of 82%.

LIMITATIONS One likely reason for the increased sensitivities
was increased diagnostic vigilance owing to the high
prevalence of AAA.

Unlike previous studies that used consecutive patients with
relatively low prevalence of AAA, this study included a large
number of patients with AAA to provide power to look at the
value of various patient and examination factors. It was also
the first study to look at interobserver variability in abdominal
palpation for AAA. The mean pair agreement (77%) and
κ (0.53) for AAA vs no AAA are considered moderate.
Abdominal palpation has only moderate overall sensitivity
for detecting AAA but appears to be sensitive for diagnosis of
AAAs large enough to warrant elective intervention in
patients who do not have a large girth. Abdominal palpation
has good sensitivity, even in patients with a large girth, when
the aorta is palpable.
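
The relationship between raw agreement and κ can be illustrated with Cohen's formula, which discounts the agreement expected by chance. The 2x2 counts below are hypothetical, chosen only to give 77% raw agreement in a sample with about half the examinations positive; they are not the study's data, and the function name is an assumption.

```python
def cohen_kappa(both_positive, first_only, second_only, both_negative):
    """Return (observed agreement, Cohen's kappa) for two examiners."""
    n = both_positive + first_only + second_only + both_negative
    observed = (both_positive + both_negative) / n
    first_positive = (both_positive + first_only) / n
    second_positive = (both_positive + second_only) / n
    # Agreement expected if the two examiners rated independently.
    expected = (first_positive * second_positive
                + (1 - first_positive) * (1 - second_positive))
    return observed, (observed - expected) / (1 - expected)


# Hypothetical counts for 100 paired examinations (illustration only).
observed_agreement, kappa = cohen_kappa(40, 12, 11, 37)
print(f"observed agreement {observed_agreement:.0%}, kappa {kappa:.2f}")
```

With roughly balanced positive and negative findings, about half of all agreement is expected by chance, which is why 77% raw agreement corresponds to a κ of only about 0.5.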

Reviewed by Frank A. Lederle, MD 


TITLE Prevalence of Abdominal Aortic Aneurysms: A 
Screening Program in Sao Paulo, Brazil. 

AUTHORS Puech-Leao P, Molnar LJ, Oliveira IR, Cerri 
GG. 

CITATION Sao Paulo Med J. 2004;122(4):158-160. 

QUESTION How accurate is abdominal palpation for 
detecting abdominal aortic aneurysm (AAA)? 

DESIGN Each subject underwent abdominal palpation 
with a vascular surgeon and ultrasonography. 

SETTING University Hospital, Sao Paulo, Brazil. 

PATIENTS The first 3000 subjects to call in response to 
an advertising campaign were scheduled for the screening 
clinic. The study group consisted of 2756 subjects who 
were older than 50 years, without previous diagnosis of 
AAA, and for whom ultrasonography was adequate. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD

The description of palpation precedes that of ultrasonography
in the “Methods,” but we are not told explicitly that palpation
was performed before, or blinded to, ultrasonography.


MAIN OUTCOME MEASURES

Palpation result was recorded as positive, negative, or
impossible.

AAA was defined as aortic diameter of 3.0 cm or more by
ultrasonography. See Table 2-6 for the results of palpation for
this study.


MAIN RESULTS


Table 2-6 Results of Palpation in a Large Screening Setting

| Palpation | N | No. of AAAs by Ultrasonography |
| --- | --- | --- |
| Positive | 60 | 20 |
| Negative | 2398 | 41 |
| Impossible | 298 | 3 |

Abbreviation: AAA, abdominal aortic aneurysm.

Sensitivity: 20/64 = 31%. Specificity: 2652/2692 = 98%. Positive predictive value
of positive examination result: 20/60 = 33%.


CONCLUSION

LEVEL OF EVIDENCE Level 3.

STRENGTHS This is by far the largest study of the sensitivity
of palpation to date, comprising nearly as many patients
as all previous studies combined. The sensitivity of 31% is
somewhat lower than the pooled sensitivity of 39% reported
in our original Rational Clinical Examination article, which
could result from a greater attenuation of any increased
examiner vigilance resulting from study participation.

LIMITATIONS It is not clear from the article that examiners
were blinded to the ultrasonographic results, though the low
sensitivity would suggest that they were. Although the
authors have information on age, sex, and AAA diameter, the
effect of these factors on palpation is not described.

Reviewed by Frank A. Lederle, MD
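
The summary statistics given with Table 2-6 can be recomputed from the palpation counts. The sketch below follows the table's own convention of counting an "impossible" palpation as not positive; the likelihood ratios it prints are derived here for illustration and are not reported in the study summary. Variable names are hypothetical.

```python
# (number examined, number with AAA on ultrasonography) per palpation result
palpation_results = {
    "positive": (60, 20),
    "negative": (2398, 41),
    "impossible": (298, 3),
}

total_subjects = sum(n for n, _ in palpation_results.values())
total_aaa = sum(aaa for _, aaa in palpation_results.values())
total_without_aaa = total_subjects - total_aaa

true_positives = palpation_results["positive"][1]
false_positives = palpation_results["positive"][0] - true_positives
true_negatives = total_without_aaa - false_positives

sensitivity = true_positives / total_aaa
specificity = true_negatives / total_without_aaa
positive_predictive_value = true_positives / palpation_results["positive"][0]

# Derived values, not stated in the source summary.
lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity

print(f"sensitivity {sensitivity:.0%}, specificity {specificity:.1%}")
print(f"PPV {positive_predictive_value:.0%}")
print(f"LR+ {lr_positive:.1f}, LR- {lr_negative:.2f}")
```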
CHAPTER


Is Listening for Abdominal Bruits Useful in the Evaluation
of Hypertension?

Jeffrey M. Turnbull, MD, FRCP 


Toward the end of an unusually busy clinic, a clinical clerk
greets the final patient of the day, a man with a recently
documented increase of blood pressure. With all the enthusiasm
that remains after 4 years of medical training, she
compulsively listens for abdominal bruits. Almost surprised, she
hears a soft systolic-diastolic epigastric bruit and is faced
with the inevitable question: so what?


WHY IS THIS AN IMPORTANT QUESTION TO 
ANSWER WITH A CLINICAL EXAMINATION? 


As we have gained insight into the origin and meaning of vascular
bruits, detailed auscultation of the abdomen has become
more common. Once detected, an abdominal bruit often is
characterized according to pitch, timing, amplitude, and location
in an effort to detect and document pathologic states,
such as renovascular disease, splenic enlargement, hepatic
cirrhosis, carcinoma of the pancreas and liver, splenic and hepatic
vascular abnormalities, intestinal vascular insufficiency, and
aortic disease. More recently, abdominal bruits have been
documented in a substantial percentage of healthy individuals.

Although the search for an abdominal bruit has become
part of the general physical examination, it also has been
recommended as a key element of the examination of the
hypertensive patient, in whom the presence of an abdominal bruit
is considered to be an important feature of renovascular
hypertension. 1-3

To be of value, a diagnostic investigation (such as eliciting
an abdominal bruit in the setting of hypertension) must reliably
predict the presence or absence of disease (in this case,
renovascular hypertension). This process should influence
the course of management or prognosis. With this in mind,
the reliability and accuracy of auscultating for an abdominal
bruit in a patient with hypertension will be examined.


THE ANATOMIC AND PHYSIOLOGIC 
ORIGIN OF THE ABDOMINAL BRUIT 

Whereas turbulent flow within a vessel is the physiologic basis 
for a bruit, the pitch and radiation are a function of the flow 
and direction of the turbulent stream. Intrinsic or extrinsic 
abnormalities can produce turbulence, and although these 
abnormalities usually arise from within the abdomen, they can 
also arise from the inguinal area, retroperitoneum, or thorax. 


PREVALENCE OF ABDOMINAL BRUITS 

The prevalence of bruits in different groups is summarized in
Table 3-1. In “normal” populations (individuals without
hypertension), the presence of any abdominal bruit has been
detected in 6.5% to 31% of patients, with a predilection for
the younger age groups (Figure 3-1). Among normal individuals
older than 55 years, the prevalence was 4.9%. It is generally
believed that the short, faint, and midsystolic bruit heard
in these asymptomatic patients is “innocent.” 7

In patients with angiographically proven renal artery stenosis,
bruits have been documented in 77.7% to 86.9% of
cases, with higher prevalence than the 28% observed among
unselected patients referred for hypertension. 5,8,9 In a study
by Grim et al, 10 the systolic-diastolic bruit was never detected
in 379 normal subjects and was found in 1 of 199 patients
with essential hypertension.

Eppier et al 11 distinguished the presence of abdominal bruits
in fibromuscular hyperplasia of the renal artery from that in
atherosclerotic lesions. Their retrospective medical record
review of 87 patients with surgically treated renal artery stenosis
revealed a bruit in 77% of patients with fibromuscular disease
and in 35% of patients with atherosclerotic disease.


Table 3-1 The Prevalence of Abdominal Bruits

| Reference, y | Age, y | No. and Study Group | Prevalence, % |
| --- | --- | --- | --- |
| General Population | | | |
| Edwards et al, 4 1970 | 17-30 | 200 healthy volunteers | 6.5 |
| Julius and Stewart, 5 1967 | Unknown | 170 volunteers | 16 |
| Rivin, 6 1972 | 16-85 | 426 patients without cardiovascular or intra-abdominal disease | 18 |
| Watson and Williams, 7 1973 | 13-71 | 161 psychiatric patients | 31 |
| | 13-78 | 200 patients referred with gastrointestinal complaints | 27 |
| Patients With Hypertension | | | |
| Julius and Stewart, 5 1967 | | 155 patients referred with hypertension | 28 |
| Patients With Angiographically Proven Renal Stenosis | | | |
| Hunt et al, 8 1974 | 6-63 | 100 patients referred for investigation of hypertension | 87 |
| Perloff et al, 9 1961 | 17-72 | 54 patients referred with sustained hypertension | 78 |


Figure 3-1 The Prevalence of Bruits Varies
With Age in Normal Populations
(Figure not reproduced; data series: Julius and Stewart, 5 1967, and Rivin, 6 1972.)


HOW TO EXAMINE FOR ABDOMINAL BRUITS 

The patient should be relaxed in a supine position, with
the room quiet and with the examiner initially auscultating
in the epigastrium, with moderate pressure applied to
the diaphragm of the stethoscope. All 4 quadrants should
be auscultated anteriorly. The auscultation should continue
over the spine and flanks in the areas between T12
and L2 to rule out bruits that may be heard best posteriorly.
However, no data exist that would support the routine
auscultation of the back for abdominal or retroperitoneal
bruits. Once detected, bruits can be correlated to the cardiac
cycle by palpation of the carotid upstroke, with the
systolic-diastolic bruit being more prolonged and extending
into diastole.

Because the kidneys lie retroperitoneally and the renal arteries leave
the aorta in the area cephalad to the umbilicus, attention should be
given to auscultation in the epigastric area for the bruit of
renovascular disease, a pancreatic neoplasm, or an innocent bruit
(Figure 3-2). The bruit of a hepatic carcinoma has been heard in the
right upper quadrant, whereas that of a splenic arteriovenous fistula
has been described in the left upper quadrant. Periumbilical bruits are
at times heard in the setting of mesenteric ischemia, and venous hums
are from portosystemic hypertension. Finally, in the older population,
an abdominal bruit may be associated with an abdominal aortic aneurysm.
Estes, 12 in a study of 102 patients with abdominal aortic aneurysms,
demonstrated the presence of an associated bruit in 28% of cases.
THE PRECISION OF ABDOMINAL
AUSCULTATION FOR BRUITS

Neither intraobserver nor interobserver variations in the way we 
elicit this sign have been evaluated in detail. However, Watson 
and Williams 7 reported 92% (149/161) agreement when 
patients with celiac artery compression were prospectively 
examined by 2 examiners for the presence of an abdominal 
bruit. With standardization, auscultation of the abdomen can be 
performed with the appropriate degree of precision. 

THE ACCURACY OF ABDOMINAL AUSCULTATION 
IN RENOVASCULAR HYPERTENSION 

This discussion will concentrate on abdominal bruits in fibromuscular
and atherosclerotic renovascular disease. Because abdominal bruits occur
in healthy individuals and in those with the nonrenovascular conditions
listed in Table 3-2, they may occasionally yield false-positive findings
in hypertensive patients.

Many studies describe the accuracy of the abdominal bruit in detecting
renovascular disease in patients referred for hypertension, but only 3
demonstrate sufficient methodologic rigor (Table 3-3). These reports
were of sufficient size and uniform clinical assessment, and the
angiogram was the criterion standard. A further study by Julius and
Stewart 5 reported a sensitivity of 20%; however, specificity could not
be estimated.


PRESENCE OF ABDOMINAL BRUITS 

The most useful study 10 of the accuracy of abdominal auscultation
assembled a consecutive series of patients referred to a university
medical center for hypertension. All patients healthy enough for surgery
underwent careful abdominal auscultation, with positive findings
confirmed by a second examiner, plus other tests for renovascular
hypertension, including arteriography. Of 64 patients with renovascular
hypertension (an abnormal angiogram result and a renal vein renin ratio
>1.5), 25 had combined systolic-diastolic abdominal bruits, for a
sensitivity of 39% (95% confidence interval [CI], 27%-51%). Of 199
hypertensive patients with normal arteriogram results, 2 had
systolic-diastolic bruits, for a specificity of 99% (95% CI, 98%-100%).
Thus, although the absence of a systolic-diastolic bruit did not rule
out renovascular hypertension, the presence of a systolic-diastolic
bruit helped to rule it in, with a likelihood ratio (LR) of 39 (95% CI,
9.4-160).
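
The arithmetic behind these figures can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
function name diagnostic_accuracy and its return keys are hypothetical,
the 2x2 counts are those quoted above (25 of 64 with disease had a
systolic-diastolic bruit, 2 of 199 without disease had one), and the
interval for the LR uses the common log method, which may not be the
exact method the original authors used.

```python
import math


def diagnostic_accuracy(tp, fn, fp, tn, z=1.96):
    """Sensitivity, specificity, and likelihood ratios from 2x2 counts.

    tp/fn: bruit present/absent in patients with disease.
    fp/tn: bruit present/absent in patients without disease.
    The LR+ interval uses the log method (an assumption for this sketch).
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    # Standard error of ln(LR+) computed from the cell counts.
    se_log = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    lr_pos_low = math.exp(math.log(lr_pos) - z * se_log)
    lr_pos_high = math.exp(math.log(lr_pos) + z * se_log)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "lr_pos_ci": (lr_pos_low, lr_pos_high),
    }


# Counts quoted in the text for the systolic-diastolic bruit.
result = diagnostic_accuracy(tp=25, fn=39, fp=2, tn=197)
for key, value in result.items():
    print(key, value)
```

With these counts the sketch gives a sensitivity near 39%, a specificity
near 99%, and an LR of about 39 when the bruit is present, with a wide
interval of the same order as the one reported.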

A second study recorded any epigastric or flank bruits in a series of
hypertensive patients undergoing arteriography. 24 Not surprisingly, the
sensitivity of 63% (95% CI, 45%-81%) for any bruit was higher than in
the previous study, whereas the specificity for any bruit was somewhat
lower, at 90% (95% CI, 84%-96%). Consequently, the presence of any
systolic bruit confers a lower LR for renovascular hypertension
(LR = 6.4; 95% CI, 3.2-13). Thus, the systolic-diastolic abdominal bruit
is less sensitive (P = .04; $\chi^2_1 = 4.36$) and more specific
(P < .01; $\chi^2_1 = 13.5$) than the combination of both isolated
systolic and combined systolic-diastolic bruits.
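
The two chi-square comparisons can be checked with a short sketch. It
assumes an uncorrected Pearson chi-square on the counts quoted in this
chapter (25/64 vs 17/27 for sensitivity; 197/199 vs 82/91 for
specificity); the function name chi_square_2x2 is hypothetical, and the
P value relies on the identity that, for 1 degree of freedom, the upper
tail equals erfc(sqrt(statistic / 2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Sensitivity: bruit present vs absent among patients with disease.
# Row 1: systolic-diastolic bruit (25 of 64); row 2: any bruit (17 of 27).
sens_stat, sens_p = chi_square_2x2(25, 39, 17, 10)

# Specificity: bruit absent vs present among patients without disease.
# Row 1: systolic-diastolic bruit (197 of 199); row 2: any bruit (82 of 91).
spec_stat, spec_p = chi_square_2x2(197, 2, 82, 9)

print(round(sens_stat, 2), round(sens_p, 3))
print(round(spec_stat, 1), spec_p)
```

Under these assumptions the statistics come out close to the 4.36 and
13.5 reported in the text.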

Other than these studies and that by Perloff et al, 9 additional studies
of the accuracy of abdominal bruits in patients with hypertension are
less rigorous and are not reported.

In summary, there is a substantial prevalence of systolic bruits in
young, healthy patients, which increases in hypertensive patients,
especially those with documented renovascular disease. In instances when
the accuracy of the abdominal bruit has been rigorously assessed in
evaluating patients with renovascular disease, the sensitivity has been
reported to be between 20% and 78%, whereas the specificity has been
between 64% and 90%. Systolic-diastolic bruits are seldom


Table 3-2 Reported Nonrenovascular Causes of an Abdominal Bruit a

Reference, y | Condition
Arida, 13 1977 | Splenic arteriovenous fistula
Bloom, 14 1950 | Hepatic cirrhosis
Clain et al, 15 1966 | Alcoholic hepatitis, hepatoma
Estes, 12 1950 | Abdominal aortic aneurysm
Goldstein, 16 1968 | Celiac artery compression syndrome
Lee, 17 1967 | Bacterial gastroenteritis
Matz and Spear, 18 1969 | Unilateral renal hypertrophy
McLoughlin et al, 19 1975 | Celiac artery stenosis
Sarr et al, 20 1980 | Chronic intestinal ischemia
Serebro and W'srand, 21 1965 | Pancreatic neoplasia
Shumaker and Waldhausen, 22 1961 | Hepatic arteriovenous fistula
Smythe and Gibson, 23 1963 | Tortuous splenic arteries

a No data exist that would permit the listing of these disorders by prevalence.


Table 3-3 Accuracy of the Abdominal Bruit in Renovascular Hypertension

Reference, y | Type of Bruit | Sensitivity, % (95% CI a) | Specificity, % | LR If Present | LR If Absent
Grim et al, 10 1979 | Systolic and diastolic abdominal bruit | 25/64 = 39 (27-51) | 197/199 = 99 (98-100) | 39 | 0.6
Fenton et al, 24 1966 | Any epigastric or flank bruit, including isolated systolic bruit | 17/27 = 63 (45-81) | 82/91 = 90 (84-96) | 6.4 | 0.4
Perloff et al, 9 1961 | Systolic bruit | 78 | 64 | 2.1 | 0.35

Abbreviations: CI, confidence interval; LR, likelihood ratio.
a CI obtained with the use of normal approximation method.
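
The normal approximation named in the table footnote can be sketched as
follows. The function name wald_interval is hypothetical, the multiplier
1.96 is the usual choice for a 95% interval, and the bounds are clipped
to the range 0 to 1, which is an assumption made here so that the upper
limit for 197/199 does not exceed 100%.

```python
import math


def wald_interval(successes, total, z=1.96):
    """Normal-approximation (Wald) interval for a proportion, clipped to [0, 1]."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Proportions taken from Table 3-3.
for label, x, n in [
    ("Grim sensitivity", 25, 64),
    ("Grim specificity", 197, 199),
    ("Fenton sensitivity", 17, 27),
    ("Fenton specificity", 82, 91),
]:
    low, high = wald_interval(x, n)
    print(label, round(100 * low), round(100 * high))
```

Rounded to whole percentages, the intervals from this sketch agree with
those listed in the table.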
heard in healthy people or in patients with essential hypertension, but
they are more common in individuals with renovascular disease. In
patients with fibromuscular disease, there is an increased prevalence
for all types of bruits.


AUSCULTATORY CHARACTERISTICS OF BRUITS 

Although many bruits have been characteristically described as having a
certain pitch, intensity, and location, the data to support this have
been questioned. 11,19 Moser and Caldwell 25 demonstrated a slightly
increased prevalence of high-pitched bruits in association with renal
artery disease (87%) when compared with the prevalence of medium-pitched
or low-pitched bruits (57%). This finding supports the results of Julius
and Stewart, 5 who reported an increased prevalence (64%) of
high-pitched bruits in these patients.

In the study by Moser and Caldwell, 25 the intensity of the bruit
described in patients with renovascular disease was less discriminatory,
with 80% (17/21) of cases having loud bruits and 55% (16/29) having
quiet bruits. These same authors described their results in predicting
the localization of the stenosis. In their study, of the 13 patients in
whom renovascular disease was isolated to 1 vessel, stenosis was
correctly localized beforehand in 6 (46%). Eppier et al 11 reported
slightly better results because the site of the renovascular lesion was
correctly localized in 70% of patients with fibromuscular disease and
43% of patients with atherosclerotic renovascular disease. Julius and
Stewart 5 directly auscultated the renal artery by using a sterile
stethoscope at the time of renovascular surgery, demonstrating that, of
18 patients with bruits, in 9 the bruits were confined to the correct
renal artery and in 7 the renal artery bruits were combined with
additional vascular bruits. In 2 patients (11%), the bruits heard before
surgery were secondary to other vascular abnormalities, and there were
no bruits associated with the renal artery.


PROGNOSIS OF PATIENTS WITH 
HYPERTENSION AND BRUITS 

Finally, the importance of identifying the location, pitch, and
intensity of a bruit is questionable, and this issue awaits further
clarification with larger prospective studies. Two reports have linked
the presence of bruits to the outcome of renovascular surgery but with
conflicting results. Eppier et al 11 found that 84% of patients with
systolic-diastolic bruits had favorable surgical results, compared with
55% of patients with only systolic bruits or no bruits. This result was
replicated in patients whose renal artery stenoses were due to
atherosclerosis, but the presence of diastolic bruits and the recent
onset of hypertension correlated with favorable surgical outcomes in
patients with both fibromuscular and atherosclerotic vascular disease.
In contrast, Simon et al 26 were unable to attach prognostic importance
to abdominal bruits in patients with fibromuscular or atherosclerotic
renovascular disease.


THE BOTTOM LINE 

In view of the high prevalence (7%-31%) of innocent abdominal bruits in
the younger age groups, if a systolic abdominal bruit is detected in a
young, normotensive, asymptomatic individual, no further investigations
are warranted. In view of the low sensitivity, the absence of a systolic
bruit is not sufficient to rule out the diagnosis of renovascular
hypertension. In view of the high specificity, the presence of a
systolic bruit (in particular a systolic-diastolic bruit) in a
hypertensive patient is suggestive of renovascular hypertension.
Subsequent investigation should take into consideration the pretest
likelihood of renovascular disease and full cost and potential benefits
of any management decision. In view of the lack of evidence to support
characterizing bruits as to pitch, intensity, and location, bruits
should be reported only as systolic or systolic/diastolic. Existing
information does not permit a definitive statement pertaining to the
prognostic implication of a renal bruit.

In summary, the critical review of the literature pertaining to the
abdominal bruit would suggest that the routine auscultation of the
abdomen for the presence or absence of an abdominal bruit in the healthy
asymptomatic population is of little value in view of the high
prevalence of benign bruits. However, for our troubled clinical clerk,
the presence of a systolic-diastolic bruit would provide supportive
evidence of an underlying diagnosis of renovascular disease and should
lead her to more aggressive investigation for this disorder.
UPDATE: Abdominal Bruits



Prepared by David L. Simel, MD, MHS 
Reviewed by Lori Orlando, MD 


CLINICAL SCENARIO 


A 55-year-old, white, male smoker has had hypertension for 10 years. It
has always been well controlled, with systolic measures of lower than
135 mm Hg. He is receiving a diuretic and a β-blocker. Recently, the
systolic pressure has typically been 140 to 150 mm Hg. He is a bit
overweight (body mass index, 26.5). There has been no evidence for
atherosclerotic disease. His serum creatinine level is unchanged, at
0.11 µmol/L. The serum cholesterol level is 5.95 mmol/L. Your suspicion
is that the increased blood pressure is a manifestation of essential
hypertension, but you decide to auscultate for an abdominal bruit. You
hear none. You would like to add an angiotensin-converting enzyme
inhibitor, but you wonder whether you have ruled out renal artery
stenosis as a cause of the recent upward trend in his pressure.

UPDATED SUMMARY ON ABDOMINAL BRUITS 

Original Review 

Turnbull JM. Is listening for abdominal bruits useful in the
evaluation of hypertension? JAMA. 1995;274(16):1299-1301.

UPDATED LITERATURE SEARCH 

Our literature search crossed the text words “renal artery,”
“auscultation,” “bruit,” and “hypertension,” published in English from
1994 to 2004. We also searched on the subject heading “renal artery
obstruction/di.” The search yielded 86 articles for which the titles and
abstracts were reviewed. One article that included sensitivity and
specificity data on the abdominal bruit as a sign for renal artery
stenosis was retrieved.

NEW FINDINGS 

• A large study of patients with hypertension that is difficult to
control confirmed the usefulness of finding an abdominal bruit, even one
heard only during systole.

• Available data do not allow us to make conclusions about the
prevalence or importance of finding an abdominal bruit in black
patients.


Details of the Update 

Many normal individuals have abdominal bruits. The presence of an
abdominal bruit becomes potentially important in hypertensive patients,
especially those with certain characteristics. Abdominal bruits may be
the harbinger of renal artery stenosis, and the diagnosis should be
suspected in hypertensive patients who had their disease onset at a
young age or who have blood pressures that are seemingly resistant to
medical treatment. It may be therapeutically useful to identify patients
with renal artery stenosis because balloon angioplasty may be a useful
treatment intervention for controlling blood pressure, especially when
medications fail. 1

One study, identified in the original Rational Clinical Examination
article, found the highest diagnostic utility for an abdominal bruit
that had both a systolic and diastolic component. The effect of an
abdominal bruit with both components compared with an abdominal bruit
with only a single systolic component has not been evaluated. In our
updated literature review, we found 1 large, prospective cohort study of
patients with hypertension that is difficult to control who were
systematically evaluated for renal artery stenosis. The importance of a
systolic bruit in this population of patients (predominantly white) was
similar to that found in previous work that we reviewed in the original
publication. 2

A study of 85 consecutive patients with hypertension, diabetes, and
normal renal function provides useful information about ethnicity and
renal artery stenosis as it includes a higher proportion of black
patients than previous studies. 3 The odds ratio for Afro-Caribbean
patients vs other patients (white or Asian) was 0.70 (95% confidence
interval [CI], 0.19-2.5). We can combine the data with those from
Krijnen et al 2 to find a summary odds ratio of 0.37 (95% CI, 0.12-1.1)
for black ethnicity, suggesting that perhaps black patients are less
likely than other patients to get renal artery stenosis. However, the
broad CIs suggest that the currently available data do not allow us to
conclude this with certainty. Unfortunately, data were not provided on
the frequency of abdominal bruits, so we do not know whether the finding
of an abdominal bruit in black patients has the same significance as in
other patients.



CHAPTER 3 Update 


IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

CIs were not provided in the original publication. A typographic error
in the negative likelihood ratio (LR-) for a bruit was found for Table
3-3. The LR- for the study by Perloff et al 4 should have been 0.35, as
is now shown. We reconfigured Table 3-3 from the original publication,
providing the CIs and summary estimates for the presence of a bruit
(Table 3-4).

CHANGES IN THE REFERENCE STANDARD 

The reference standard remains arteriography. However, noninvasive tests
have replaced arteriography in offering a less risky screening approach
for appropriate patients. 5 When treatment (ie, balloon angioplasty) is
undertaken, all patients undergo arteriography to ensure proper
technique.

RESULTS OF LITERATURE REVIEW 

Multivariate Findings for Renal Artery Stenosis 

A clinical prediction model can be used in white patients with
hypertension that is difficult to control. 2 The model can be downloaded
to a computer (the DRASTIC [Dutch Renal Artery Stenosis Intervention
Cooperative] spreadsheet;
http://www2.eur.nl/fgg/mgz/software.html, accessed May 16, 2008).
The model has not been validated prospectively or in a population of
black patients.


Table 3-4 Univariate Findings for Renal Artery Stenosis

Finding | LR+ (95% CI) | LR- (95% CI)
Bruit
  Systolic and diastolic
    Grim et al 6 | 39 (10-145) | 0.62 (0.49-0.73)
  Systolic with/without diastolic component a
    Krijnen et al 2 | 6.7 (3.7-12) | 0.76 (0.66-0.84)
    Fenton et al 7 | 6.4 (3.2-12) | 0.41 (0.24-0.62)
    Perloff et al 4 | 2.2 (1.5-3.2) | 0.35 (0.20-0.57)
  Summary systolic bruit | 4.3 (2.3-8.0) | 0.52 (0.34-0.78)
History of atherosclerotic disease 2 | 2.2 (1.8-2.8) | 0.52 (0.40-0.66)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio;
LR-, negative likelihood ratio.

a Did not distinguish between individuals with systolic-only bruits vs
systolic and diastolic.


EVIDENCE FROM GUIDELINES 

The Joint National Committee on Prevention, Detection, Evaluation, and
Treatment of High Blood Pressure (JNC 7) suggests that physicians
auscultate for abdominal bruits in patients with hypertension. 8 The
suggestion is not accompanied by data but is an expert’s opinion. The
report specifically recommends considering renal artery stenosis for
certain hypertensive patients.


CLINICAL SCENARIO—RESOLUTION 


Patients with hypertension frequently need treatment with additional
medications as they get older. The patient has none of the more obvious
findings to suggest renovascular hypertension from renal artery
stenosis. According to expert recommendations, you listened for
abdominal bruits and heard none. The proper technique must be used, and
you must be listening in a quiet room. Often, physicians do not apply
enough pressure with the diaphragm of the stethoscope. Had you heard a
bruit, you would have attempted to see whether the bruit extends into
diastole. This can be done by palpating the carotid while listening to
see whether the bruit prolongs beyond the carotid upstroke.

The LR data for the presence or absence of systolic bruits apply only to
patients with resistant hypertension. With just 2 medications, you
should not assume that he has resistant hypertension. Thus, the LR for
the absence of bruit cannot be applied to this patient. You might resort
to a clinical decision model (referenced above). Given his age, smoking
status, sex, body weight, absence of a bruit, long history of
hypertension, and normal creatinine and cholesterol levels, you would
find that his predicted probability of renovascular stenosis is 10%. Two
caveats apply to this model. First, it was also developed with data from
patients with resistant hypertension, so his probability of renal artery
stenosis is probably even lower. Second, had your patient been black,
you would have needed to recognize that the accuracy of the model would
be unknown.





















CHAPTER 3 Abdominal Bruits 


RENAL ARTERY STENOSIS— MAKE THE DIAGNOSIS 


Patients without hypertension should not have auscultation for
asymptomatic renal artery bruits because bruits frequently are a normal
finding. The search for renal artery stenosis should be confined to
certain patient populations (see below). When present in these
populations, an abdominal bruit is the most useful physical examination
finding for assessment of renal artery stenosis.

PRIOR PROBABILITY OF RENOVASCULAR DISEASE 

Approximately 1% to 5% of the general population has 
renovascular disease. Approximately 20% of white patients 
with medically refractory hypertension have renal artery 
stenosis. 


DETECTING THE LIKELIHOOD OF RENAL ARTERY STENOSIS 
IN PATIENTS WITH REFRACTORY HYPERTENSION 

See Table 3-5. 


Table 3-5 Clinical Examination Findings for Renal Artery Stenosis

Finding (No. of Studies) | LR+ (95% CI) | LR- (95% CI)
Systolic-diastolic bruit (n = 1) | 39 (10-145) | 0.62 (0.49-0.73)
Systolic bruit (n = 3) | 4.3 (2.3-8.0) | 0.52 (0.34-0.78)
History of atherosclerotic disease (n = 1) | 2.2 (1.8-2.8) | 0.52 (0.40-0.66)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio;
LR-, negative likelihood ratio.
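
As a rough illustration of how these LRs act on the prior probability,
the sketch below converts a pretest probability to a posttest
probability through odds. The function name posttest_probability is
hypothetical, and the 20% pretest value is simply the approximate
prevalence quoted above for white patients with medically refractory
hypertension; it would not apply to other populations.

```python
def posttest_probability(pretest, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability via odds."""
    pretest_odds = pretest / (1 - pretest)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Pretest probability assumed for illustration (refractory hypertension).
pretest = 0.20

# Point estimates of the LRs taken from Table 3-5.
findings = {
    "systolic-diastolic bruit present": 39,
    "systolic bruit present": 4.3,
    "systolic bruit absent": 0.52,
}

for finding, lr in findings.items():
    print(finding, round(posttest_probability(pretest, lr), 2))
```

At this assumed pretest probability, a systolic-diastolic bruit raises
the probability of renal artery stenosis to roughly 90%, a systolic
bruit to about 50%, and the absence of a systolic bruit lowers it only
to a little more than 10%, which is why a silent abdomen does not rule
out the diagnosis.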


POPULATION FOR WHOM RENAL ARTERY
STENOSIS SHOULD BE CONSIDERED

• Onset of hypertension before 30 years of age

• Patients with an arterial bruit and hypertension, especially if there
is a diastolic component

• Accelerated hypertension

• Hypertension that becomes resistant to medication

• Flash pulmonary edema

• Renal failure, especially in the absence of proteinuria or an abnormal
urine sediment result

• Acute renal failure precipitated by angiotensin-converting enzyme
inhibitors or angiotensin-receptor blockers


REFERENCE STANDARD TESTS

Moderate-risk and high-risk patients are subjected to a noninvasive
screening test (ultrasonography, computed tomography, magnetic resonance
imaging). The type of imaging modality for screening (eg,
contrast-enhanced ultrasonography vs gadolinium-enhanced computed
tomography or magnetic resonance angiography) may be operator dependent,
and physicians will need to rely on their local radiologists’ expertise.
All patients have their disease status confirmed with arteriography as
part of a therapeutic procedure.
EVIDENCE TO SUPPORT THE UPDATE:

Abdominal Bruits 



TITLE A Clinical Prediction Rule for Renal Artery Stenosis. 

AUTHORS Krijnen P, van Jaarsveld BC, Steyerberg EW, 
Man in’t Veld AJ, Schalekamp MA, Habbema JD. 

CITATION Ann Intern Med. 1998;129(9):705-711. 

QUESTION Do clinical data identify patients likely to 
have renal artery stenosis? 

DESIGN Prospective data collected as part of a cohort study. 

SETTING Multiple internal medicine departments in 
the Netherlands. 

PATIENTS One thousand one hundred thirty-three 
patients, aged 18 to 75 years, with normal serum creatinine 
levels and referred for hypertension evaluations. Most 
patients had hypertension that was difficult to control. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Patients were assigned to 1 of 2 treatment protocols. Those who had a
mean diastolic blood pressure of 95 mm Hg or higher at follow-up, or
those who experienced an increase in serum creatinine level when treated
with an angiotensin-converting enzyme inhibitor, underwent digital
subtraction angiography as well as noninvasive tests of the renal
arteries.

The clinical data were collected prospectively. The presence of
“abdominal bruit” was recorded before the reference standard tests.

MAIN OUTCOME MEASURES 

Renal artery stenosis (>50%) identified by arteriography. 

MAIN RESULTS 

From a population of 1133 patients, 477 required renal artery 
stenosis evaluation for either blood pressure that is difficult 
to control or an increase in serum creatinine level when 
treated with an angiotensin-converting enzyme inhibitor. 
One hundred seven patients had renal artery stenosis (22%). 


Table 3-6 Likelihood Ratio of Findings for Renal Artery Stenosis

Test | Sensitivity | Specificity | LR+ (95% CI) | LR- (95% CI)
Abdominal bruit | 0.27 | 0.96 | 6.7 (3.7-12) | 0.76 (0.66-0.84)
Atherosclerotic disease | 0.63 | 0.72 | 2.2 (1.8-2.8) | 0.52 (0.40-0.66)

Abbreviations: CI, confidence interval; LR, likelihood ratio.


Abdominal bruit and atherosclerotic disease (femoral or carotid bruit,
angina, claudication, myocardial infarction, cerebrovascular accident,
or vascular surgery) were the variables with the best accuracy (Table
3-6). A clinical prediction model included the additional terms of age,
smoking history, recent onset of hypertension, obesity,
hypercholesterolemia, and the serum creatinine level. The model can be
downloaded via the Internet (the DRASTIC [Dutch Renal Artery Stenosis
Intervention Cooperative] spreadsheet;
http://www2.eur.nl/fgg/mgz/software.html, accessed May 16, 2008). The
model had an area under the receiver operating characteristic curve (a
measure of accuracy) of 0.84 (95% confidence interval, 0.79-0.89).

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Prospective data collection in the relevant 
population of patients with hypertension that is difficult 
to control. The prediction model was subjected to internal 
validation. 

LIMITATIONS “Abdominal bruit” is not defined. The study 
population had almost no patients of black ethnicity. The 
prediction rule was not externally validated in a separate 
population of patients. 

This is a large study in the population of patients for whom
renovascular hypertension and renal artery stenosis might be considered.
The presence of any abdominal bruit was recorded by examiners and showed
excellent specificity with a sufficiently high positive likelihood
ratio. A patient’s history that indicates previous atherosclerotic
vascular disease also has diagnostic utility.










CHAPTER 3 Evidence to Support the Update 


A problem for some clinicians is that the patients were 
almost all whites. 1 Given the low prevalence of renovascular 
hypertension in blacks, US physicians cannot be certain that 
the results will generalize well. 


REFERENCE FOR THE EVIDENCE 

1. Wilcox C. Screening for renal artery stenosis: are scans more accurate 
than clinical criteria? Ann Intern Med. 1998;129(9):738-740. 
CHAPTER 4


Does This Patient Have an Alcohol Problem?

James M. Kitchens, MD, FRCPC


CLINICAL SCENARIO


A 58-year-old man was admitted to the hospital for an elective
cholecystectomy. At the time of admission, he smelled of alcohol,
although he was not obviously intoxicated. On questioning, he said that
he had come from a business lunch where he had “a drink.” When
questioned about his alcohol history, he became angry and defensive. He
said that he was “offended by the implications of these questions.” On
the day after the surgery, he was found to be diaphoretic, tremulous,
and hallucinating and was judged to be in alcohol withdrawal. Could
other interviewing techniques have identified this man as one who was
alcohol dependent and at risk of withdrawal?


WHY IS THIS AN IMPORTANT QUESTION TO 
ANSWER WITH A CLINICAL EXAMINATION? 


It is estimated that more than 100 million Americans drink alcohol and
that about 10% of those who drink have alcohol problems that adversely
affect their lives and the lives of their families. 1 Alcohol is
involved in 10% of all deaths in the United States. The mortality rate
in those who drink 6 or more drinks per day is 50% higher than the rate
in matched controls. 2 Alcohol is a major factor in suicides, homicides,
violent crimes, and fatal motor vehicle crashes. Alcohol abuse and
dependence are common in both partners where spouse and child abuse
occur. 3,4 There is a 4-fold increased risk of alcohol dependence in the
children of alcohol-dependent parents. 5

Alcohol is primarily or secondarily implicated in a large number of
medical problems such as cirrhosis, alcoholic hepatitis, portal
hypertension, gastritis, nutritional deficiencies, cardiomyopathy,
dysrhythmias, cognitive dysfunction, seizures, neuropathies, myopathies,
low birth weight, fetal alcohol syndrome, and a variety of head and neck
cancers. 1

Alcohol abuse and alcohol dependence are common problems. A history of
alcohol abuse has been found in one-fifth to one-third of patients
attending inner-city ambulatory medical clinics, and one-third of these
patients report an active drinking problem. In some of these settings,
the prevalence of abuse has been as high as two-thirds in men. 6-8
Unfortunately, physicians recognize only about half of the problem
drinkers that they encounter, and they are even less likely to identify
problems in women and elderly people. 8-13

DIAGNOSTIC STANDARDS FOR 
ALCOHOL ABUSE AND DEPENDENCY 

Alcohol-related problems provide many diagnostic problems for
clinicians. In our society, drinking is a common and socially complex
behavior. At one end of the drinking spectrum, alcohol is used in
moderation without adverse consequences to the drinkers or those around
them. At the other end of the spectrum are those drinkers who have
adverse effects medically, economically, and psychosocially from
repeated abuse of alcohol. Between those who occasionally use alcohol in
moderation and those who are frankly alcohol dependent lies a continuum
of drinkers with varying consumption patterns and risks of
alcohol-related problems.

CHAPTER 4 The Rational Clinical Examination


Table 4-1 Diagnostic and Statistical Manual of Mental Disorders, Revised
Third Edition (DSM-III-R) and International Statistical Classification
of Diseases, 10th Revision (ICD-10) Diagnostic Criteria for Substance
Abuse, Harmful Use, and Substance Dependence

DSM-III-R Dependence (3 Items Required)

1. Substance often taken in larger amounts or during a longer period
than the person intended

2. Persistent desire or 1 or more unsuccessful efforts to cut down or
control substance use

3. A great deal of time spent in activities necessary to get substance,
taking substance, or recovering from its effects

4. (a) Recurrent use when substance use is physically hazardous (eg,
drives while intoxicated) or (b) frequent intoxication or withdrawal
symptoms when expected to perform major role obligations at work,
school, or home

5. Important social, occupational, or recreational activities given up
or reduced because of substance use

6. Continual substance use despite knowledge of having persistent or
recurrent social, psychological, or physical problem that is caused or
exacerbated by the use of substance

7. Marked tolerance: need for markedly increased amounts of substance
(at least a 50% increase) to achieve intoxication or desired effect or
markedly diminished effect with continued use of the same amount

8. Characteristic withdrawal symptoms

9. Substance often taken to relieve or avoid withdrawal symptoms.

DSM-III-R Abuse

1. Continued use despite knowledge of having persistent or recurrent
social, occupational, psychological, or physical problem that is caused
or exacerbated by the use of substance

2. Recurrent use in situations in which use is physically hazardous

ICD-10 Dependence (3 Items Required)

1. A strong desire or sense of compulsion to use a substance

2. Evidence of impaired capacity to control the use of a substance. This
may relate to difficulties in avoiding initial use, difficulties in
terminating use, or problems controlling levels of use.

3. A withdrawal state or use of the substance to relieve or avoid
withdrawal symptoms and subjective awareness of the effectiveness of
such behavior

4. Evidence of tolerance of the effects of the substance

5. Progressive neglect of alternative pleasures, behaviors, or interests
in favor of substance use

6. Persisting with substance use despite clear evidence of harmful
consequences

ICD-10 Harmful Use

1. Clear evidence that the use of a substance was responsible for
causing actual psychological or physical harm to the user

The rational use of diagnostic tests to identify problem drinking or
alcohol dependence demands a clear understanding of the definitions of
the disorder being diagnosed. It will also become clear that diagnostic
test characteristics, such as sensitivity, specificity, and likelihood
ratios (LRs), vary considerably, depending on the definition of problem
drinking or alcohol dependence.

The International Statistical Classification of Diseases, 10th Revision
(ICD-10) and the Diagnostic and Statistical Manual of Mental Disorders,
Revised Third Edition (DSM-III-R) of the American Psychiatric
Association present guidelines for the diagnosis of substance abuse
disorders. 14,15 The ICD-10 recognizes 2 categories: harmful use and
alcohol dependence. The DSM-III-R recognizes 2 categories: alcohol abuse
and alcohol dependence. The diagnostic criteria for DSM-III-R and ICD-10
are found in Table 4-1. There is another edition of the DSM, the DSM-IV.
It is not significantly different from DSM-III-R with regard to the
diagnosis of alcohol abuse and alcohol dependence. The following
discussion refers to DSM-III-R because it has been used as a diagnostic
standard for comparison with other diagnostic questionnaires.

Alcohol dependence represents a syndrome as diagnosed
by DSM-III-R and ICD-10. The syndrome criteria of the 2
systems overlap considerably, but there are differences
between DSM-III-R and ICD-10. The ICD-10 does not
include items that address the social or legal consequences of
dependence, nor does it have criteria that assess dangerous
use (eg, driving or working while intoxicated). The ICD-10
criteria are restricted to the medical and psychological
consequences of abuse and dependence. Despite these differences,
there is excellent concordance between DSM-III-R
and ICD-10 in the diagnosis of alcohol dependence. 16 This
high degree of concordance illustrates the fact that dependence
most commonly affects medical, psychological, and
social aspects of life. Rarely are the consequences restricted
to one sphere of life.

The ICD-10 and DSM-III-R have separate categories of 
harmful or abusive drinking that do not meet the criteria for 
dependence. However, there is poor concordance between 
the 2 systems for these diagnostic categories. 16 Because it does 
not include criteria for social/legal consequences of drinking, 
ICD-10 makes fewer diagnoses than DSM-III-R does. For 
example, an individual who repeatedly drives while intoxi¬ 
cated would not be assigned a diagnosis under ICD-10 but 
would be assigned a diagnosis as an alcohol abuser under 
DSM-III-R. 

The DSM-III-R is the most widely used diagnostic framework
for alcohol-related disorders, and it has been used as
the diagnostic standard for comparison of other diagnostic
questionnaires. 6,7,15 The DSM-III-R criteria for alcohol abuse
or dependence are structured to detect alcohol problems at
any time in the life of the patient. This lifetime prevalence of
alcohol problems may not represent an individual’s current
drinking status. 8 Most studies that use the DSM-III-R criteria
as the diagnostic standard for the identification of alcohol
abuse or dependence also use a published structured interview,
such as the Structured Clinical Interview for DSM-III-R
(SCID), that asks specific interview questions that relate to
the DSM-III-R diagnostic criteria. 17

Other studies have used alcohol consumption questionnaires
and interviews to define a level of “problem drinking”
and then examined the diagnostic accuracy of screening
questionnaires to separate problem drinkers from nonproblem
drinkers. 7,18-21 However, in the following section, it will
be seen that the sensitivities of screening questionnaires
decrease as the definition of problem drinking is changed to
include a greater proportion of at-risk drinkers.

It is clear that excessive alcohol consumption may be detrimental
to medical and social health. The dangers associated
with alcohol consumption represent a continuum of risk that
makes it difficult to define “safe levels” of alcohol consumption.
Some authors contend that ingestion of 4 or more
drinks per day in men and 2 or more drinks per day in
women constitutes a “hazardous” consumption level that
increases the risk of alcohol dependence and medical problems. 4,22,23
A “drink” is defined as the volume of a beverage
that contains 0.6 oz of ethanol. Twelve ounces of beer,
5 oz of wine, and 1.5 oz of liquor all contain 0.6 oz of ethanol.
However, safe levels of consumption vary considerably,
depending on the clinical or social context of drinking. One
and one-half drinks per day may constitute at-risk drinking
for pregnant women and represent a health threat to the
developing child. 18,24
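
The arithmetic behind the 0.6-oz "drink" can be checked with a short sketch. The text gives only the serving volumes; the beverage strengths used here (5% for beer, 12% for wine, 40% for liquor) are illustrative assumptions, and the function names are hypothetical.

```python
# Ethanol content of one serving, in fluid ounces.
# Serving volumes come from the text; the alcohol-by-volume (ABV)
# figures are ASSUMED typical strengths, not values from the text.
STANDARD_DRINK_ETHANOL_OZ = 0.6

SERVING_OZ = {"beer": 12.0, "wine": 5.0, "liquor": 1.5}
ASSUMED_ABV = {"beer": 0.05, "wine": 0.12, "liquor": 0.40}


def ethanol_oz(volume_oz: float, abv: float) -> float:
    """Volume of pure ethanol in a beverage of the given volume and strength."""
    return volume_oz * abv


def standard_drinks(volume_oz: float, abv: float) -> float:
    """Number of 0.6-oz-ethanol drinks contained in the beverage."""
    return ethanol_oz(volume_oz, abv) / STANDARD_DRINK_ETHANOL_OZ


if __name__ == "__main__":
    for beverage, volume in SERVING_OZ.items():
        abv = ASSUMED_ABV[beverage]
        print(beverage, round(ethanol_oz(volume, abv), 2), "oz ethanol")
```

A stronger or larger serving counts as more than 1 drink under this definition, which matters when a patient's self-reported "drinks" are compared with a consumption threshold.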

The World Health Organization (WHO) has developed a
questionnaire, the Alcohol Use Disorders Identification Test
(AUDIT), to identify persons with “hazardous” and “harmful”
alcohol consumption who may not be captured by DSM-III-R
or ICD-10 diagnostic criteria. 25 WHO recognizes the
following disorders of alcohol use: “Hazardous drinking” is
use that increases the risk of subsequent psychological or
medical harm and is judged to be 4 or more drinks per day in
men and 2 or more drinks per day in women. “Harmful
drinking” occurs in the person who has psychological or
medical complications as defined in ICD-10. The WHO
classification system attempts to identify persons who drink
quantities that will increase their risk of subsequent problems.
This modification is driven by concerns about the cost
and effectiveness of treating alcohol dependence. 25 A review
of alcohol treatment programs and their effectiveness is
beyond the scope of this article, but there is a substantial
body of evidence that brief, ambulatory interventions targeted
to persons with hazardous drinking can decrease levels
of consumption and, it is hoped, decrease the likelihood of
subsequent harm and dependence. 26 However, diagnosis
must precede treatment. It is the diagnosis of alcohol disorders
in the context of the medical history that is the subject of
the remainder of this article.


DIAGNOSTIC TESTS OF ALCOHOL 
ABUSE AND DEPENDENCY 

Several questionnaires have been developed for the detection
of alcohol disorders, including the cut down, annoyed by
criticism, guilty about drinking, eye-opener drinks (CAGE)
questionnaire, the Michigan Alcoholism Screening Test
(MAST), and the AUDIT. The most widely used are the
CAGE questionnaire and the MAST. Of these, the MAST has
been more thoroughly studied in terms of reliability and
accuracy. However, the MAST and its shortened versions are
more complicated than the CAGE questionnaire. The CAGE
questionnaire is short, easily memorized, and reasonably
accurate, making it the screening test of choice for busy
house officers and practitioners.

CAGE Questionnaire 

In 1968, Ewing 27 developed the CAGE questionnaire for the
detection of alcoholism. CAGE is a mnemonic for these 4
questions: (1) Have you ever felt you ought to cut down on
your drinking? (2) Have people annoyed you by criticizing
your drinking? (3) Have you ever felt bad or guilty about
your drinking? (4) Have you ever had a drink first thing in
the morning to steady your nerves or get rid of a hangover
(eye opener)?
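
Scoring the CAGE questionnaire amounts to counting affirmative answers. The sketch below illustrates that count, using the threshold of 2 or more "yes" answers that most of the accuracy studies discussed later adopt. The item keys and function names are hypothetical labels chosen for this example.

```python
# Minimal CAGE scoring sketch. Item keys are hypothetical labels for
# the 4 questions (Cut down, Annoyed, Guilty, Eye opener).
CAGE_ITEMS = ("cut_down", "annoyed", "guilty", "eye_opener")


def cage_score(answers: dict) -> int:
    """Count affirmative answers; unanswered items count as 'no'."""
    return sum(1 for item in CAGE_ITEMS if answers.get(item, False))


def cage_positive(answers: dict, threshold: int = 2) -> bool:
    """Positive test result if the score reaches the chosen threshold."""
    return cage_score(answers) >= threshold


if __name__ == "__main__":
    patient = {"cut_down": True, "annoyed": False,
               "guilty": True, "eye_opener": False}
    print(cage_score(patient), cage_positive(patient))
```

Lowering `threshold` to 1 trades specificity for sensitivity, so the function keeps the cut point as a parameter rather than fixing it.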

Some investigators have reasoned that alcohol abusers are
more likely to give accurate responses to the CAGE questions
if they are part of a series of questions on lifestyle that
include drinking, smoking, diet, and exercise habits. 7,28 The
rationale behind this approach is that it may be less likely to
trigger defensiveness and denial in people who are alcohol
dependent. Other studies do not attempt to disguise the
CAGE questionnaire. No studies that examine differences
between CAGE interviews and written CAGE questionnaires
were identified. There are no comparative studies of reliability
or accuracy for the different modes of administering the
CAGE questions. It seems reasonable to ask these questions
in a frank, nonjudgmental manner as part of the medical history
or review of symptoms.

MAST 

The MAST was originally reported on by Selzer 29 in 1971. 
The MAST consists of 24 yes/no questions, with the “alcohol 
dependent” responses being scored as 1, 2, or 5 points. The 
MAST questions are listed in Table 4-2. The most common 
scoring for the MAST has 0 to 3 points as “non-alcohol 
dependent,” 4 or 5 as “probably alcohol dependent,” and 
greater than 5 as “definitely alcohol dependent.” 
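
The MAST scoring bands described above can be written as a small lookup. This is a sketch only: it assumes the caller has already identified which answers fall in the "alcohol dependent" direction and passes the point value of each (1, 2, or 5, as listed in Table 4-2). The function names are hypothetical.

```python
# Sketch of MAST totalling and the most common interpretation bands.
# Input: the point values (1, 2, or 5) of every item answered in the
# "alcohol dependent" direction. Deciding that direction per item is
# left to the caller and is NOT modelled here.
ALLOWED_POINTS = {1, 2, 5}


def mast_total(points_for_dependent_responses) -> int:
    """Sum the weights of the items answered in the dependent direction."""
    points = list(points_for_dependent_responses)
    for p in points:
        if p not in ALLOWED_POINTS:
            raise ValueError(f"MAST items score 1, 2, or 5 points, got {p}")
    return sum(points)


def mast_category(total: int) -> str:
    """Map a total score to the bands 0-3, 4-5, and greater than 5."""
    if total < 0:
        raise ValueError("total score cannot be negative")
    if total <= 3:
        return "non-alcohol dependent"
    if total <= 5:
        return "probably alcohol dependent"
    return "definitely alcohol dependent"


if __name__ == "__main__":
    total = mast_total([2, 1, 2])
    print(total, mast_category(total))
```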

Two modified, shortened versions of the MAST have been
developed to make it a less time-consuming screening instrument
for alcohol dependence. A 10-question version, the
Brief MAST (BMAST), and a 13-question version, the Short
MAST (SMAST), are available. 30,31

AUDIT 

WHO sponsored a collaborative project to develop a screening
test that would be able to detect persons with hazardous
levels of consumption and those with harmful use and
dependence. The AUDIT questions are listed in Table 4-3.
Answers are scored from 0 to 4, for a maximum score of 40
points, with scores of 8 or more considered diagnostic of an
alcohol use disorder. 25,32
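
As a sketch of the scoring rule just described, the following totals 10 item scores, each from 0 to 4, and applies the cutoff of 8. How each response option maps to an item score is not modelled, and the function names are hypothetical.

```python
# Sketch of AUDIT totalling: 10 items, each scored 0 to 4 (maximum 40),
# with a total of 8 or more treated as a positive result.
AUDIT_ITEM_COUNT = 10
AUDIT_MAX_ITEM_SCORE = 4


def audit_total(item_scores) -> int:
    """Validate and sum the 10 item scores."""
    scores = list(item_scores)
    if len(scores) != AUDIT_ITEM_COUNT:
        raise ValueError("AUDIT has exactly 10 items")
    for s in scores:
        if not 0 <= s <= AUDIT_MAX_ITEM_SCORE:
            raise ValueError(f"item scores range from 0 to 4, got {s}")
    return sum(scores)


def audit_positive(item_scores, cutoff: int = 8) -> bool:
    """True when the total reaches the cutoff (8 in the text)."""
    return audit_total(item_scores) >= cutoff


if __name__ == "__main__":
    answers = [2, 1, 1, 0, 0, 0, 1, 1, 0, 2]
    print(audit_total(answers), audit_positive(answers))
```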

Biochemical and Hematologic Tests 

Increases in liver enzyme concentrations (aspartate aminotransferase,
alanine aminotransferase, and γ-glutamyltransferase)
and mean corpuscular volume have been investigated
as biological markers of alcohol abuse. All of these tests are
insensitive in detecting alcohol abusers. None of these tests,
alone or in combination, perform as well as the MAST or the
CAGE questionnaire in detecting alcohol abuse. 19,22,33-35

Table 4-2 Michigan Alcoholism Screening Test (MAST) 29

Points | Question
2 | 1. Do you feel you are a normal drinker? (a)
2 | 2. Have you ever awakened the morning after some drinking the night before and found that you could not remember a part of the evening before?
1 | 3. Does your spouse or parents ever worry or complain about your drinking? (a)
2 | 4. Can you stop drinking without a struggle after 1 or 2 drinks?
1 | 5. Do you ever feel bad about your drinking? (a)
2 | 6. Do friends or relatives think you are a normal drinker? (a)
2 | 7. Are you always able to stop drinking when you want to? (a)
5 | 8. Have you ever attended a meeting of Alcoholics Anonymous? (a)
1 | 9. Have you gotten into fights when drinking?
2 | 10. Has drinking ever created problems with you and your spouse? (a)
2 | 11. Has your spouse or other family member ever gone to anyone for help about your drinking?
2 | 12. Have you ever lost friends or girlfriends/boyfriends because of your drinking?
2 | 13. Have you ever gotten into trouble at work because of drinking? (a)
2 | 14. Have you ever lost a job because of drinking?
2 | 15. Have you ever neglected your obligations, your family, or your work for 2 or more days in a row because you were drinking? (a)
1 | 16. Do you ever drink before noon?
2 | 17. Have you ever been told you have liver trouble? Cirrhosis?
2 | 18. Have you ever had delirium tremens (DTs), severe shaking, heard voices, or seen things that weren’t there after heavy drinking?
5 | 19. Have you ever gone to anyone for help about your drinking? (a)
5 | 20. Have you ever been in a hospital because of your drinking? (a)
2 | 21. Have you ever been a patient in a psychiatric hospital or on a psychiatric ward of a general hospital when drinking was part of the problem?
2 | 22. Have you ever been treated at a psychiatric or mental health clinic or gone to a doctor, social worker, or clergyman for help with an emotional problem in which drinking had played a part?
2 | 23. Have you ever been arrested, even for a few hours, because of drunk behavior? (a)
2 | 24. Have you ever been arrested for drunk driving or driving after drinking? (a)

(a) Included in the short version of the MAST.

RELIABILITY OF THE MAST, CAGE, AND AUDIT QUESTIONNAIRES

Gibbs 36 reviewed the internal consistency (α) reliability
coefficient of the MAST reported in 6 studies and found it to vary
from .83 to .93. The α values in 6 studies of the SMAST or
BMAST ranged from .75 to .81. Skinner and Sheu 37 reported
the test-retest reliability of the MAST at .84. Reliability
coefficients of 1.0 represent perfect test precision (perfect
interobserver or intraobserver precision), and values close to 1.0 are
highly precise. No reports measuring the reliability of the
CAGE and AUDIT questionnaires were identified.

ACCURACY OF THE MAST, CAGE, AND AUDIT QUESTIONNAIRES

Determining the test accuracy of all questionnaires for alcohol
use disorders presents some methodologic problems. The
questions in the CAGE, MAST, and AUDIT questionnaires
are embodied within the commonly used reference standards,
DSM-III-R and ICD-10, which may result in inflated
estimates of test accuracy. The advantage of the CAGE and
AUDIT questionnaires over the much longer questionnaires
is their brevity, which would allow them to be used as a
screening or case-finding tool by busy clinicians.

The diagnostic accuracy of the MAST and its shorter versions
has been reported, with sensitivities of 71% to 100%
and specificities of 81% to 96%. 8,19,38 The MAST can be
criticized as a screening tool because of its length; it requires
about 20 minutes to administer, making it less likely to be
used by a busy clinician.

In most studies of the diagnostic accuracy of the CAGE
questionnaire, a positive test result has been defined as 2 or
more affirmative answers to the questions. The CAGE
questionnaire has been validated in several environments, including
psychiatric inpatients, medical and orthopedic inpatients,
and ambulatory medical patients in the United States and
Great Britain. 6,7,18-21,28 Table 4-4 lists studies in which the
diagnostic accuracy of the CAGE questionnaire has been reported
and in which the authors specify the “diagnostic standard”
used to define the patient’s alcohol status. In all these studies,
changing the criterion of a positive CAGE test result from a
score of 2 to 1 results in greater test sensitivity but lower
specificity. In other words, the test will identify more problem
drinkers, but it will also misclassify more nonproblem patients
as problem drinkers. Note that as the definition of problem
drinking is lowered, for example, from 16 to 8 drinks per day
or from 2 drinks to 1 drink per day in pregnant women, the
sensitivity of the test decreases and the specificity increases for
the same CAGE threshold.

The CAGE questionnaire is reasonably accurate at identifying
those individuals who are alcohol dependent or heavy
drinkers (>8 drinks/d). However, it is not at all sensitive at
detecting the lower levels of consumption that may be dangerous,
especially in pregnant women. It has not been tested as a
tool to detect hazardous or at-risk drinking on the order of 4
drinks per day. It will be less sensitive in that situation. There is
no difference in the diagnostic accuracy of the CAGE
questionnaire when used in men or women, and it is equally
effective in elderly people. 6,39 However, there is a marked difference
in the prevalence of alcohol disease in men and women. The
prevalence of alcohol dependence in women is about one-third
that in men. The predictive values for CAGE responses
reflect the lower prevalence figures for women, with lower
positive predictive values and higher negative predictive values. 6,39
The AUDIT is a newly developed tool, and only 1 validation
study was identified. When a positive test result is considered
to be a score of 8 or more points, the sensitivity of the AUDIT
in detecting hazardous or harmful use is 92% and the specificity
is 94%. 32 However, as noted above, there are methodologic
reasons to believe that these estimates are inflated and may not
be reliably testable. The 10 AUDIT questions were culled from
a 150-item assessment of alcohol use. The AUDIT has not been

Table 4-3 Alcohol Use Disorders Identification Test (AUDIT) Questions 25,32

1. How Often Do You Have a Drink Containing Alcohol?
Never; Monthly or less; 2 to 4 times a month; 2 or 3 times a week; 4 or more times a week

2. How Many Drinks Containing Alcohol Do You Have on a Typical Day When You Are Drinking?
1 or 2; 3 or 4; 5 or 6; 7 to 9; 10 or more

3. How Often Do You Have 6 or More Drinks on 1 Occasion?
Never; Less than monthly; Monthly; Weekly; Daily or almost daily

4. How Often During the Last Year Have You Found That You Were Not Able to Stop Drinking Once You Had Started?
Never; Less than monthly; Monthly; Weekly; Daily or almost daily

5. How Often During the Last Year Have You Failed to Do What Was Expected From You Because of Drinking?
Never; Less than monthly; Monthly; Weekly; Daily or almost daily

6. How Often During the Last Year Have You Needed a First Drink in the Morning to Get Yourself Going After a Heavy Drinking Session?
Never; Less than monthly; Monthly; Weekly; Daily or almost daily

7. How Often in the Last Year Have You Had a Feeling of Guilt or Remorse After Drinking?
Never; Less than monthly; Monthly; Weekly; Daily or almost daily

8. How Often During the Last Year Have You Been Unable to Remember What Happened the Night Before Because You Had Been Drinking?
Never; Less than monthly; Monthly; Weekly; Daily or almost daily

9. Have You or Someone Else Been Injured as a Result of Your Drinking?
No; Yes, but not in the last year; Yes, during the last year

10. Has a Relative or Friend or a Doctor or Other Health Worker Been Concerned About Your Drinking or Suggested You Cut Down?
No; Yes, but not in the last year; Yes, during the last year

Table 4-4 Diagnostic Standards and Diagnostic Accuracy for the CAGE Questionnaire

Source, y | Patient Type | No. of Patients | Diagnostic Standard | Positive CAGE Result | Sensitivity, % | Specificity, % | Prevalence of Alcohol Disease, %
Bernadt et al, 19 1982 | Psychiatric inpatients | 385 | Ethyl alcohol intake interview plus >16 drinks/d or medical record review diagnosis of alcoholism | ≥2 | 97 | 76 | 17
Buchsbaum et al, 6 1991 | Ambulatory medical patients | 821 | DSM-III-R with SCID | ≥2 | 73 | 91 | 36
 | | | | ≥1 | 89 | 81 |
Bush et al, 7 1987 | Medical and orthopedic inpatients | 521 | DSM-III with MAST, NIAAA intake questionnaire | ≥2 | 75 | 96 | 20
 | | | | ≥1 | 85 | 89 |
King, 20 1986 | Ambulatory general patients | 407 | Ethyl alcohol intake interview plus >8 drinks/d | ≥2 | 82 | 95 | 4
 | | | | ≥1 | 0 | 84 |
Mayfield et al, 28 1974 | Veterans Affairs hospital inpatients | 366 | Multidisciplinary team diagnosis | ≥2 | 81 | 89 | 39
 | | | | ≥1 | 90 | 72 |
Sokol et al, 18 1989 | Prenatal clinic | 971 | Periconceptual ethyl alcohol intake interview plus >2 drinks/d | ≥2 | 38 | 92 | 4
 | | | | ≥1 | 59 | 82 |
Waterson and Murray-Lyon, 21 1989 | Prenatal clinic | 893 | Periconceptual ethyl alcohol intake interview plus >2 (top row) vs >1 (bottom row) drink/d | ≥2 | 33 | 95 | 2
 | | | | ≥2 | 20 | 96 | 20

Abbreviations: CAGE, cut down, annoyed, guilty, eye opener; DSM-III, Diagnostic and Statistical Manual of Mental Disorders, Third Edition; DSM-III-R, Diagnostic and Statistical
Manual of Mental Disorders, Revised Third Edition; MAST, Michigan Alcoholism Screening Test; NIAAA, National Institute on Alcohol Abuse and Alcoholism; SCID, Structured
Clinical Interview for DSM-III-R.
tested as a discrete group of questions against an accepted
diagnostic standard. As with the other questionnaires for alcohol
disorders, items in the AUDIT are represented in the commonly
used reference standards, DSM-III-R and ICD-10. This
likely inflates the estimates of reliability coefficients and test
accuracy. The AUDIT attempts to identify drinkers whose
consumption places them at risk of harmful or dependent alcohol
use before dependence has occurred. Three AUDIT questions
relate to amounts and frequency of consumption. There is no
reliable way to test the accuracy of patient responses concerning
consumption. If heavy drinkers are defensive about their
drinking and tend to underreport consumption, the AUDIT
estimate of hazardous drinking may be conservative.

PREDICTIVE ACCURACY OF THE CAGE QUESTIONNAIRE 

There are 2 ways for clinicians to calculate predictive value or
posterior probability of disease. 40,41 The first approach uses
test sensitivity, specificity, and estimates of disease prevalence
in Bayes theorem. The second approach multiplies the LR by
the pretest odds of disease to obtain the posttest odds of disease.
The 2 methods are equivalent when the diagnostic test

Table 4-5 Likelihood Ratios of CAGE Questions for the Diagnosis of
Alcohol Abuse and Alcohol Dependence

 | Buchsbaum et al, 6 1991 | Bush et al, 7 1987 | Mayfield et al, 28 1974
Prevalence of alcohol disease | 0.36 | 0.20 | 0.39
LR by CAGE score | | |
0 | 0.14 | 0.18 | 0.13
1 | 1.5 | 1.4 | 0.90
2 | 4.5 | 6.8 | 1.6
3 | 13 | 158 | 15
4 | 101 | ∞ | ∞

Abbreviations: CAGE, cut down, annoyed, guilty, eye opener; LR, likelihood ratio.


Table 4-6 Posterior Probability of Alcohol Abuse or Alcohol
Dependence Calculated With Likelihood Ratios (a)

CAGE Score | LR | Posterior Probability, Prevalence of Alcohol Disease of 10% | Posterior Probability, Prevalence of Alcohol Disease of 36%
0 | 0.14 | .02 | .07
1 | 1.5 | .14 | .46
2 | 4.5 | .33 | .72
3 | 13 | .59 | .88
4 | 101 | .92 | .98

Abbreviations: CAGE, cut down, annoyed, guilty, eye opener; LR, likelihood ratio.
(a) LRs are based on data from Buchsbaum et al. 6


used gives dichotomous results. However, if the test results
are not dichotomous, and most are not, these 2 methods may
give surprisingly different results. The insistence that a given
cut point be assigned to continuous data or multiple categorical
levels can result in a loss of diagnostic power and even
erroneous diagnostic conclusions.
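
The equivalence of the 2 approaches for a dichotomous result can be illustrated with a short sketch. It uses the Buchsbaum et al figures from Table 4-4 for a CAGE score of 2 or more (sensitivity 73%, specificity 91%, prevalence 36%); the function names are hypothetical.

```python
# Two routes to the positive predictive value of a dichotomous test.
# Example inputs: Table 4-4, Buchsbaum et al, CAGE score of 2 or more.

def ppv_bayes(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes theorem: true positives over all positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def ppv_likelihood_ratio(sensitivity: float, specificity: float,
                         prevalence: float) -> float:
    """Pretest odds times the positive LR, converted back to a probability."""
    lr_positive = sensitivity / (1.0 - specificity)
    pretest_odds = prevalence / (1.0 - prevalence)
    posttest_odds = pretest_odds * lr_positive
    return posttest_odds / (1.0 + posttest_odds)


if __name__ == "__main__":
    print(round(ppv_bayes(0.73, 0.91, 0.36), 2))
    print(round(ppv_likelihood_ratio(0.73, 0.91, 0.36), 2))
```

Both functions return the same value, about 0.82 for these inputs. The methods diverge only when a multilevel result, such as a CAGE score of 0 to 4, is collapsed to a single cut point before the calculation.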

In the introductory article to this series, Sackett 42 introduced
the concept of LRs for diagnostic tests with multiple
levels of response. If you are not familiar with LRs, I encourage
you to review that article. If one wishes to avoid some of
the pitfalls that may occur when interpreting the results of
questionnaires, it is important to be able to interpret the
results with LRs. Table 4-5 lists 3 studies of the CAGE
questionnaire in which LRs can be calculated. 6,7,28 These studies
have low LRs for CAGE scores of 0 (0.13-0.18), high LRs for
CAGE scores of 3 (13-158), and very high LRs for CAGE
scores of 4 (101 to infinity).

Table 4-6 shows the posterior probability of alcohol abuse
or dependence for each CAGE score according to the
Buchsbaum et al 6 data and prevalences of 10% and 36%. Alcohol
abuse or dependence is unlikely in persons with a score of 0.
With a score of 3, the diagnosis is likely, and a score of 4 is
virtually diagnostic of alcohol abuse or dependence in the
higher-prevalence group. However, more caution needs to be
exercised when interpreting CAGE scores of 1 or 2. The
likelihood of alcohol abuse or dependence is increased in persons
with scores of 2, but one might want to administer other
confirmatory tests before the patient is given a diagnosis. A
score of 1 has an LR of 1.5, and the posttest probability of
disease is only marginally higher than the pretest probability
of disease.
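
The posterior probabilities in Table 4-6 follow from multiplying the pretest odds by the LR for each CAGE score. The sketch below reproduces that calculation with the Buchsbaum et al LRs from Table 4-5; the function and variable names are hypothetical.

```python
# Posterior probability from a multilevel likelihood ratio.
# LRs per CAGE score are taken from Table 4-5 (Buchsbaum et al).
BUCHSBAUM_LR = {0: 0.14, 1: 1.5, 2: 4.5, 3: 13.0, 4: 101.0}


def posterior_probability(prevalence: float, likelihood_ratio: float) -> float:
    """Convert pretest probability to posttest probability via odds."""
    pretest_odds = prevalence / (1.0 - prevalence)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


def posterior_by_cage_score(prevalence: float) -> dict:
    """Posterior probability for each CAGE score, rounded to 2 decimals."""
    return {score: round(posterior_probability(prevalence, lr), 2)
            for score, lr in BUCHSBAUM_LR.items()}


if __name__ == "__main__":
    for prevalence in (0.10, 0.36):
        print(prevalence, posterior_by_cage_score(prevalence))
```

Substituting a local prevalence estimate for 0.10 or 0.36 gives posterior probabilities tailored to a particular practice, which is the main practical advantage of the LR approach for intermediate scores.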


PROBLEMS IN THE IDENTIFICATION OF 
AT-RISK DRINKING IN PREGNANT WOMEN 

Pregnant women who drink 2 or more drinks per day may
expose the fetus to an increased risk of developmental delay,
growth retardation, cardiac defects, and craniofacial
abnormalities. 18,24 Women drinking enough to expose the fetus to a
teratogenic risk may underreport their consumption. This is
most pronounced among those women with high MAST
scores who are drinking heavily. 43,44 It has also been shown
that the BMAST and CAGE questionnaires are insensitive
instruments for identifying pregnant drinkers who consume
2 or more drinks per day. 18,21 Sokol et al 18 modified the CAGE
questionnaire by replacing the question on “guilt” with
one on alcohol tolerance: “How many drinks does it take to
make you high?” The patient was considered tolerant if it
took more than 2 drinks to make her feel high. The authors
claim that this question is not likely to generate defensiveness
and denial. This modified questionnaire, T-ACE (tolerance,
annoyed, cut down, eye opener), was administered to 1065
women attending an inner-city obstetric clinic. The prevalence
of at-risk drinking in this study was judged to be 4.3%.
The T-ACE questionnaire was found to be more sensitive
than the CAGE questionnaire (76% vs 59%) and equivalent
to the MAST in identifying pregnant women drinking more
than 2 drinks per day when the cut point for a positive test
result was a score of 1 or higher. Unfortunately, 40% of the
women judged to be at-risk drinkers scored 0 on the CAGE
questionnaire. Although the T-ACE questionnaire was more
sensitive, 25% of at-risk drinkers had a score of 0. In this
setting, the specificities of the T-ACE, CAGE, and MAST
questionnaires were similar (76%-82%) and the positive
predictive values were 13% to 14%.

Given the low sensitivity of these tests, a significant portion
of pregnant drinkers will go undetected. The low prevalence
of at-risk drinking in this population and the moderate
specificity of these tests result in low positive predictive values.
Consequently, these questionnaires cannot be expected
to reliably identify problem pregnant drinkers.
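
The effect of low prevalence on predictive value is easiest to see as expected counts in a screened group. The sketch below assumes a hypothetical cohort of 1000 women and uses the figures reported for the CAGE questionnaire at a cut point of 1 or more in the prenatal setting (prevalence 4.3%, sensitivity 59%, specificity 82%). The cohort size and function name are illustrative assumptions.

```python
# Expected screening outcomes in a hypothetical cohort.
# Inputs for the example: prevalence 4.3%, CAGE >= 1 with
# sensitivity 59% and specificity 82% (prenatal clinic data).

def screening_counts(n: int, prevalence: float,
                     sensitivity: float, specificity: float) -> dict:
    """Expected true/false positives and negatives, plus PPV and NPV."""
    at_risk = n * prevalence
    not_at_risk = n - at_risk
    true_pos = at_risk * sensitivity
    false_neg = at_risk - true_pos
    false_pos = not_at_risk * (1.0 - specificity)
    true_neg = not_at_risk - false_pos
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


if __name__ == "__main__":
    result = screening_counts(1000, 0.043, 0.59, 0.82)
    for name, value in result.items():
        print(name, round(value, 2))
```

Under these inputs roughly 25 of 43 at-risk drinkers screen positive alongside about 172 false positives, which is why the positive predictive value is near 13% even though the specificity appears respectable.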


THE BOTTOM LINE 

In summary, the CAGE questionnaire can be a useful tool in
the diagnosis of DSM-III-R-defined alcohol abuse and
dependence and very heavy drinking (>8 drinks/d). A CAGE
score of 0 has a good negative predictive value at a lower
prevalence of disease. Scores of 3 or 4 strongly support the
diagnosis of alcohol abuse. However, scores of 1 or 2 must be
interpreted with caution, and one should use the LR
approach to accurately interpret these intermediate scores.
The CAGE questionnaire has not been tested as a tool for
identifying persons who may be engaged in hazardous drinking
of lesser amounts of alcohol; for example, 4 drinks per
day. It is likely that the test will be insensitive in detecting
these individuals. The AUDIT was recently developed to
identify these hazardous drinkers. It has not been thoroughly
tested, but the initial report suggests that it is reasonably
accurate. Because 7 of the AUDIT questions are almost identical
to questions in the MAST or CAGE, it should be good at
identifying alcohol abuse and alcohol dependence. The other
3 AUDIT questions relate to consumption and constitute an
attempt to identify hazardous drinkers. It may not be possible
to determine the accuracy of these questions in the
absence of a reliable, socially acceptable diagnostic standard
for consumption. However, if heavy drinkers are defensive
about their levels of consumption, the AUDIT may underestimate
levels of consumption. The CAGE questionnaire is
short and can be easily memorized. It has been field tested
and shown to be a useful tool. The busy clinician could use
the CAGE questionnaire to find unrecognized patients who
are abusing or dependent on alcohol. The first 3 questions of
the AUDIT are also easily memorized and can provide an
estimate of the patient’s typical alcohol consumption. The
busy clinician could use these questions as a form of targeted
preventive medicine. Men drinking more than 4 drinks per
day and women drinking more than 2 drinks per day should
be counseled about the risks of drinking.

Identifying pregnant women engaged in at-risk drinking is
problematic. The prevalence of at-risk drinking among pregnant
women is low, and the screening questionnaires to identify
problem drinkers have relatively low sensitivities. Because none
of these instruments is sufficiently reliable to use for case finding
in pregnant women, all pregnant women should be counseled
about the risks of drinking while pregnant. Abstinence from
alcohol would be the safest option, but women who choose to
drink while pregnant should be strongly advised to avoid binge
drinking and to drink fewer than 2 drinks per day.
UPDATE:


CLINICAL SCENARIO 


A 35-year-old woman requests an appointment for a 
gynecologic examination. Your nursing staff gives her the 
usual paperwork and self-administered questionnaires 
while she waits in the examination room. She fills them all 
out and gives a response of no to each question. What 
questions did your patient answer? Do you know how to 
evaluate her questionnaire? Could she be a problem 
drinker? 

UPDATED SUMMARY ON SCREENING 
FOR ALCOHOL PROBLEMS 

Original Review 

Kitchens JM. Does this patient have an alcohol problem?
JAMA. 1994;272(22):1782-1787.

UPDATED LITERATURE SEARCH 

The perceived shortcomings of questionnaires for alcohol use
disorders, coupled with the high prevalence of problems,
prompted a worldwide effort to improve detection of alcohol
use disorders. The US Preventive Services Task Force updated
their recommendations (2004) according to new evidence
concerning the effectiveness of screening and brief treatment
interventions. Our literature search, conducted between 1993
and July 25, 2004, combined the search terms “alcoholism/di”
and “alcohol drinking/cl, pc, ep” and the textwords “problem
drinking” with “screening.” The search was limited to
“systematic reviews,” and we used the Ovid MEDLINE database,
along with the evidence-based medicine databases, to yield
19 English-language articles. We retained articles that were
systematic (as opposed to nonsystematic reviews) and that
focused on primary care (eg, rather than population-based
samples, emergency or psychiatric care). This resulted in 4
articles that we obtained for review. We kept 1 article that had
emergency department data to better assess the issues of
screening women as opposed to men. We concentrated on
the shorter-form questionnaires that would be more applicable
for primary care (see Appendix , , ,
and 6 for the forms AUDIT, CAGE, T-ACE, and TWEAK,
respectively). We also retrieved a recent systematic review
that was published by the Agency for Health Care Policy and
Research as part of an update to the Guide to Clinical Preventive
Services, Third Edition, Periodic Updates (see
http://www.ahrq.gov/clinic/uspstf/uspsdrin.htm [accessed May 17,
2008] for the article that first appeared in Whitlock EP, Polen
MR, Green CA, Orleans CT, Klein J. Behavioral counseling
interventions in primary care to reduce risky/harmful alcohol
use by adults: a summary of the evidence for the US Preventive
Services Task Force. Ann Intern Med. 2004;140(7):558-569).
When necessary, we retrieved references from the systematic
reviews to verify likelihood ratios (LRs) for reported
instruments. After reviewing the retrieved studies and their
reference lists, we repeated a literature search using the
textwords CAGE, AUDIT, TWEAK, and T-ACE to make sure that
we missed no original primary care studies that would have
met inclusion criteria.


NEW FINDINGS 

It is now abundantly clear that choosing to screen for problem
drinking by using any standard approach is overwhelmingly
more important than deciding on the screening form!
However, once clinicians commit to screening for alcohol
problems, there are advantages and disadvantages to the
current questionnaires that require understanding (1) what
disorder you are screening for and (2) your patient population.
Problem drinking is drinking behavior that has not
reached the level of abuse or dependence. Studies use various
descriptors for problem drinking, including the terms
hazardous, at risk, or harmful drinking.

The past decade has seen the continued validation of the 
AUDIT questionnaire, the recognition that screening for 
alcohol abuse differs from screening for hazardous or prob¬ 
lem drinking, and the need for different approaches to 
screening according to the patient population. Screening 
women and, possibly, older patients requires different 
approaches than screening adult men. 
DETAILS OF THE UPDATE 

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

The results have not changed, but newer information allows 
revised estimates of the sensitivity, specificity, and LRs of 
screening tests for alcohol problems (Table 4-7). 

CHANGES IN THE REFERENCE STANDARD 

The reference standard for alcohol abuse and dependence 
remains the guidelines in the Diagnostic and Statistical Manual 
of Mental Disorders, Fourth Edition.1 It is now important for 
clinicians to understand what constitutes a “drink” and the 
newer categories of patients’ drinking problems that have not 
reached the level of abuse or dependence. The definition of a 
drink changes across cultures, restaurants, and homes. A standard 
drink in Great Britain contains about 8 g of alcohol, as 
opposed to the standard of 19.75 g in Japan.2 The US Department 
of Health and Human Services and the US Department 
of Agriculture define a standard drink in alcohol and volume 
content that approximates 12 fl oz of regular beer, 5 fl oz of 
wine, or 1.5 fl oz of 80-proof distilled spirits.2(p7) 

The National Institute on Alcohol Abuse and Alcoholism 
defines moderate drinking according to the frequency of 
drinking. Moderate male drinkers ingest 14 or fewer drinks/wk; 
moderate women drinkers, 7 or fewer drinks/wk; and 
adults older than 65 years, 7 or fewer drinks/wk.3 Men 
younger than 65 years would be considered “at risk” drinkers 
when they drink more than 14 drinks/wk or more than 4 
drinks per occasion. Women have drinking problems at lower 
thresholds: more than 7 drinks/wk or more than 3 drinks per 
occasion defines “at risk” drinking among women. The World 
Health Organization uses slightly different descriptors that 
rely on the consequences of drinking rather than the amount 
and frequency: “hazardous” drinkers are those who are at risk 



Table 4-7 Alcohol Problem Screening Results by Test and Population Profile 

Screening Test a                       Sensitivity (95% CI)  Specificity (95% CI)  LR+ (95% CI)     LR- (95% CI) 
(n = Number of Studies) 

At Risk, Harmful, or Hazardous Drinking 

Adults 
AUDIT-C ≥ 8 (n = 1)                    0.40                  0.97                  12 (5.0-30)      0.62 (0.52-0.74) 
AUDIT ≥ 8 (n = 2)                      0.57-0.59             0.91-0.96             6.8 (4.7-10)     0.46 (0.38-0.55) 
CAGE ≥ 2 (n = 1; all patients > 60 y)  0.14                  0.97                  4.7 (3.7-6.0)    0.89 (0.86-0.91) 
CAGE ≥ 2 (n = 2)                       0.49-0.69             0.75-0.95             3.4 (1.2-10)     0.66 (0.54-0.81) 

Pregnant Women b 
TWEAK ≥ 3 (n = 2)                      0.67 (0.61-0.73)      0.92 (0.91-0.93)      8.4              0.36 
TWEAK ≥ 2 (n = 2)                      0.91 (0.87-0.94)      0.77 (0.76-0.78)      4.0              0.12 
T-ACE ≥ 1 (n = 3)                      0.89 (0.81-0.94)      0.75 (0.70-0.79)      3.6              0.15 
CAGE ≥ 2 (n = 3)                       0.48 (0.44-0.53)      0.93 (0.92-0.93)      6.9              0.56 
CAGE ≥ 1 (n = 3)                       0.66 (0.62-0.70)      0.81 (0.81-0.82)      3.5              0.42 

Alcohol Abuse or Dependence 

Adults 
CAGE ≥ 2 (n = 10)                                                                  6.9 (4.2-11)     0.33 (0.25-0.43) 
CAGE ≥ 1 (n = 10)                                                                  3.4 (2.3-5.1)    0.33 (0.25-0.43) 
AUDIT ≥ 8 (n = 2)                      0.66-0.71             0.85-0.86             4.6 (3.5-6.1)    0.37 (0.28-0.49) 

Women 
CAGE ≥ 2 (n = 2)                       0.58 (0.32-0.80)      0.93 (0.90-0.95)      8.3              0.45 
CAGE ≥ 1 (n = 1)                       0.89 (0.82-0.93)      0.83 (0.79-0.86)      5.2              0.13 

> 60 y 
CAGE ≥ 2 (n = 3)                       0.13-0.82             0.82-0.99             5.2 (3.0-9.0)    0.37 (0.29-0.47) 
CAGE ≥ 1 (n = 2)                       0.79-0.98             0.56-0.88             2.6 (1.5-4.5)    0.24 (0.15-0.40) 
AUDIT ≥ 8 (n = 1)                      0.33                  0.91                  3.6 (1.6-8.0)    0.75 (0.58-0.90) 

Abbreviations: AUDIT, Alcohol Use Disorders Identification Test; AUDIT-C, AUDIT Consumption Questions; CAGE, cut down, annoyed, guilty, eye opener; CI, confidence interval; 
LR+, positive likelihood ratio; LR-, negative likelihood ratio; T-ACE, tolerance, annoyed, cut down, eye opener; TWEAK, tolerance, worry, eye opener, amnesia, cut (kut) down. 
a The screening questionnaire should be assessed based on the patient population, the threshold that describes positivity, and whether you are screening for “at risk” drinking or 
dependence. 
b Likelihood ratio estimated from summary sensitivity and specificity measures. 
of the adverse consequences of alcohol, whereas “harmful 
drinking” causes physical or psychological harm that does not 
yet meet the criteria for abuse.3(p1979) 
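
The “at risk” thresholds quoted above can be written as a short rule. The following Python sketch is only an illustration of those published limits, not a validated clinical tool. The function and field names are hypothetical. The text gives no per-occasion limit for men older than 65 years, so the sketch assumes that only the weekly limit of 7 drinks applies to them, and it treats a man aged exactly 65 years as belonging to the older group.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    weekly: int                  # drinks/wk above which drinking is "at risk"
    per_occasion: Optional[int]  # drinks per occasion above which drinking is "at risk"


def limits_for(sex: str, age: int) -> Limits:
    """Return the thresholds quoted in the text for one patient."""
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    if sex == "female":
        return Limits(weekly=7, per_occasion=3)
    if age < 65:
        return Limits(weekly=14, per_occasion=4)
    # Assumption: for older men the text states only a weekly limit (7 drinks/wk),
    # so no per-occasion limit is applied here.
    return Limits(weekly=7, per_occasion=None)


def is_at_risk(sex: str, age: int, drinks_per_week: float,
               max_per_occasion: float) -> bool:
    """True when either the weekly or the per-occasion limit is exceeded."""
    limits = limits_for(sex, age)
    if drinks_per_week > limits.weekly:
        return True
    if limits.per_occasion is None:
        return False
    return max_per_occasion > limits.per_occasion
```

For example, a 40-year-old man who reports 15 drinks/wk exceeds the weekly limit, whereas one who reports 14 drinks/wk and never more than 4 per occasion does not.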

About 4.6% of US adults abuse alcohol, with men (6.9%) 
having about 3 times the rate compared to women (2.6%). 4 
An additional 3.8% display alcohol dependence (5.4% of 
men vs 2.3% of women). 

RESULTS OF LITERATURE REVIEW 

EVIDENCE FROM GUIDELINES 

Canadian Task Force on Preventive Health Care 

The Canadian Task Force has not updated their recommendations 
since 1994,5 at a time when the CAGE and the MAST 
had the best available data. Screening was recommended, 
although the limitations of these instruments in detecting 
hazardous drinking were recognized. 

Web Resources for Alcohol Screening 

A patient-administered screen: http://www.alcoholscreening.org/ 
(accessed May 17, 2008). For clinicians: 
http://pubs.niaaa.nih.gov/publications/Practitioner/pocketguide/pocket_guide.htm 
(accessed May 17, 2008). 


CLINICAL SCENARIO—RESOLUTION 


Fortunately, your clinical practice is routinely screening 
for alcohol problems. However, it is important to 
know exactly how your patients are being screened. If 
your clinic is using the CAGE questionnaire, you may 
detect most patients with alcohol dependence, but you 
will likely fail to recognize patients who are problem 
drinkers. This is especially true for women because the 
sensitivity of every questionnaire is lower for women 
than for men. In addition to knowing which questionnaire 
your clinic nurses are using, you need to know 
how to score the results. Accepting a lower score as 
“positive” will improve the sensitivity so that you will 
not miss as many patients with alcohol problems. 
Because the prevalence of alcohol problems is so high, 
it is important not to miss these patients. 

Assuming your patient drinks some alcohol, the negative 
LR for alcohol abuse or dependence is 0.18 for adults 
who answer none of the CAGE questions positively (ie, when 
at least 1 positive response is the threshold for a positive 
screen). The sensitivity is better for the AUDIT, but primary 
care clinics might not use the AUDIT because it contains more 
questions. If you want to detect potentially harmful or hazardous 
drinking, it would be good to ask the “Tolerance” 
question from the TWEAK (eg, “How many drinks does it 
take before you begin to feel the first effects of the alcohol?”). 
If the patient answers “at least 3,” then you need to 
assess more fully for problem drinking. 

From a practice management standpoint, you and your 
clinic nurses should review your patient population 
(Table 4-8). If your clinic patients are mostly women, the 
best current screening forms are the TWEAK or the T-ACE. 
No data support the existence of 1 ideal questionnaire 
applicable to all patients, although making no 
choice of a screening instrument guarantees missed 
opportunities for intervention. If you are using the CAGE 
questions, you may choose to switch to the AUDIT (which 
will detect problem drinking, abuse, and dependence). If 
the AUDIT is too long for your patients, then you could 
select the CAGE, TWEAK, or T-ACE and use a low 
threshold for pursuing follow-up questions. Two alternate 
approaches combine the best features of the AUDIT 
(which detects hazardous drinking but is long) with the 
CAGE (which detects abuse and dependence and is short 
but does not detect problem drinking). The resulting 
AUDIT-C is a shorter questionnaire than the AUDIT (see 
Appendix Table 4-13) and, in one study, appears to have 
the same measurement characteristics as the full AUDIT. 


Table 4-8 US Preventive Health Services Task Force Recommendations 
for Tests in Different Populations 

Population                     AUDIT        CAGE         TWEAK or T-ACE 

Risky or Harmful Drinking 
Adults                         Yes          No           No 
> 65 y                         Uncertain    No           No 
Pregnant women                 No           No           Yes 

Alcohol Abuse or Dependence 
Adults                         Yes          Yes          Yes 
> 65 y                         Uncertain    Uncertain    Uncertain 
Pregnant women                 No           No           Yes 

Abbreviations: AUDIT, Alcohol Use Disorders Identification Test; CAGE, cut down, 
annoyed, guilty, eye opener; T-ACE, tolerance, annoyed, cut down, eye opener; 
TWEAK, tolerance, worry, eye opener, amnesia, cut (kut) down. 
SCREENING FOR ALCOHOL PROBLEMS—MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

Data from the National Institute on Alcohol Abuse and 
Alcoholism suggest that 3 of 10 adults engage in risky 
drinking behaviors. In primary care clinics, the prevalence 
will be around 11% to 18%. 

POPULATIONS FOR WHOM PROBLEM 
DRINKING SHOULD BE ASSESSED 

• All adults (see Tables 4-9 and 4-10) 


• Targeted populations/conditions requiring assessment 
include pregnant women (see Table 4-11), adolescents, 
and emergency patients 


Table 4-9 Detecting the Likelihood of At-risk, Harmful, or 
Hazardous Drinking in Adults 

                               LR Range 
AUDIT or AUDIT-C ≥8            6.8-12 
AUDIT or AUDIT-C <8            0.46-0.62 

Abbreviations: AUDIT, Alcohol Use Disorders Identification Test; AUDIT-C, AUDIT 
Consumption Questions; LR, likelihood ratio. 


Table 4-10 Detecting the Likelihood of Alcohol Abuse 
or Dependence in Adults a 

                               LR (95% CI) 
CAGE ≥ 1                       3.4 (2.3-5.1) 
CAGE = 0                       0.18 (0.11-0.29) 

Abbreviations: CAGE, cut down, annoyed, guilty, eye opener; CI, confidence interval; 
LR, likelihood ratio. 
a Women have a lower sensitivity than men do but have a higher specificity. A cut 
point of ≥ 1 optimizes the sensitivity and, therefore, the negative LR. 


Table 4-11 Detecting the Likelihood of 2 or More Drinks/Day 
During Pregnancy a 

                               LR Range 
TWEAK ≥2 or T-ACE ≥1           3.6-4.0 
TWEAK ≤1 or T-ACE = 0          0.12-0.15 

Abbreviations: LR, likelihood ratio; T-ACE, tolerance, annoyed, cut down, eye 
opener; TWEAK, tolerance, worry, eye opener, amnesia, cut (kut) down. 
a LRs are estimated from studies that have incorporation bias where the interviewer 
knew the results of the screening questionnaires. 
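
To apply the LRs in Tables 4-9 through 4-11, the prior probability is converted to odds, multiplied by the LR, and converted back to a probability. The Python sketch below shows that standard calculation. The function name is hypothetical, and the 15% prior probability in the usage note is an illustrative value chosen from within the 11% to 18% primary care range quoted above, not a figure reported for any particular clinic.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability with an LR."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio cannot be negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Illustrative prior of 15% with the AUDIT LRs from Table 4-9.
positive_screen = posttest_probability(0.15, 6.8)   # AUDIT of 8 or more
negative_screen = posttest_probability(0.15, 0.46)  # AUDIT less than 8
```

With these assumed inputs, a positive AUDIT raises the probability of at-risk drinking from 15% to roughly 55%, and a negative AUDIT lowers it to roughly 8%.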


REFERENCE STANDARD TESTS 

Diagnostic interview schedule for Diagnostic and Statistical 
Manual of Mental Disorders, Fourth Edition,2 interview performed 
by an experienced provider in an alcohol-related 
interview. 


REFERENCES FOR THE UPDATE 

1. American Psychiatric Association. Diagnostic and Statistical Manual of 
Mental Disorders (DSM-IV). 4th ed. Washington, DC: American Psychiatric 
Association; 2000. 

2. Dufour MC. What is moderate drinking? Alcohol Res Health. 1999;23(1):5-14. 

3. Fiellin DA, Reid MC, O'Connor PG. Screening for alcohol problems in 
primary care. Arch Intern Med. 2000;160(13):1977-1989. a 

4. Grant BF, Dawson DA, Stinson FS, Chou SP, Dufour MC, Pickering RP. 
The 12-month prevalence and trends in DSM-IV alcohol abuse and 
dependence: United States, 1991-1992 and 2001-2002. Drug Alcohol 
Depend. 2004;74(3):223-234. 

5. Haggerty JL. Early detection and counseling of problem drinking. In: 
Canadian Task Force on the Periodic Health Examination. Canadian Guide 
to Clinical Preventive Health Care. Ottawa, Ontario, Canada: Health Canada; 
1994:488-498. http://www.ctfphc.org/sections/section06ch042.htm. 
Accessed May 17, 2008. 

6. Bradley KA, Boyd-Wickizer J, Powell SH, Burman ML. Alcohol screening 
questionnaires in women. JAMA. 1998;280(2):166-171. a 

7. Whitlock EP, Polen MR, Green CA, Orleans CT, Lein JT. Behavioral 
Counseling Interventions in Primary Care to Reduce Risky/Harmful Alcohol 
Use. Rockville, MD: Agency for Healthcare Research and Quality; 
2004. Systematic Evidence Review No. 30. Electronic copies available at 
http://www.ahrq.gov/clinic/3rduspstf/alcohol/alcomissum.pdf (accessed 
May 17, 2008). 


a For the Evidence to Support the Update for this topic, 
see http://www.JAMAevidence.com. 




















CHAPTER 4 Problem Alcohol Drinking 


APPENDIX—ALCOHOL SCREENING INSTRUMENTS 6,7 

Adapted from Whitlock EP, Polen MR, Green CA, Orleans 
CT, Lein JT. Behavioral Counseling Interventions in Primary 
Care to Reduce Risky/Harmful Alcohol Use. Rockville, MD: 
Agency for Healthcare Research and Quality; 2004. Systematic 
Evidence Review No. 30. Electronic copies available at 
http://www.ahrq.gov/clinic/3rduspstf/alcohol/alcomissum.pdf 
(accessed May 17, 2008). 


Table 4-12 AUDIT 

Circle the number that comes closest to your alcohol use in the PAST YEAR. 


1. How often do you have a drink containing alcohol? 

(0) Never 

(1) Monthly or less 

(2) 2 to 4 times a month 

(3) 2 or 3 times a week 

(4) 4 or more times a week 

2. How many drinks containing alcohol do you have on a typical day when you are drinking? 

(0) 1 or 2 

(1) 3 or 4 

(2) 5 or 6 

(3) 7 to 9 

(4) 10 or more 

3. How often do you have 6 or more drinks on 1 occasion? 

(0) Never 

(1) Less than monthly 

(2) Monthly 

(3) Weekly 

(4) Daily or almost daily 

4. How often during the last year have you found that you were not able to stop drinking once you had started? 

(0) Never 

(1) Less than monthly 

(2) Monthly 

(3) Weekly 

(4) Daily or almost daily 

5. How often during the last year have you failed to do what was expected from you because of drinking? 

(0) Never 

(1) Less than monthly 

(2) Monthly 

(3) Weekly 

(4) Daily or almost daily 


6. How often during the last year have you needed a first drink in the morning to get yourself going after a heavy drinking session? 


(0) Never (1) Less than monthly (2) Monthly (3) Weekly (4) Daily or almost daily 

7. How often in the last year have you had a feeling of guilt or remorse after drinking? 

(0) Never (1) Less than monthly (2) Monthly (3) Weekly (4) Daily or almost daily 

8. How often during the last year have you been unable to remember what happened the night before because you had been drinking? 

(0) Never (1) Less than monthly (2) Monthly (3) Weekly (4) Daily or almost daily 

9. Have you or someone else been injured as a result of your drinking? 

(0) No (1) Yes, but not in the last year (2) Yes, during the last year 

10. Has a relative or friend or a doctor or other health worker been concerned about your drinking or suggested you cut down? 

(0) No (1) Yes, but not in the last year (2) Yes, during the last year 

Abbreviation: AUDIT, Alcohol Use Disorders Identification Test. 

Scoring: A score of 8 or more is considered a positive screen for hazardous or harmful drinking. 


Table 4-13 AUDIT-C 

Circle the number that comes closest to your alcohol use in the PAST YEAR. 

1. How often do you have a drink containing alcohol? Consider a “drink” to be 1 can or bottle of beer, 1 glass of wine, 1 wine cooler, 1 cocktail, or 1 shot of hard liquor (like scotch, gin, or vodka). 

(0) Never 

(1) Monthly or less 

(2) 2 to 4 times a month 

(3) 2 to 3 times a week 

(4) 4 or more times a week 

2. How many drinks containing alcohol do you have on a typical day when you are drinking? 

(0) 1 or 2 

(1) 3 or 4 

(2) 5 or 6 

(3) 7 to 9 

(4) 10 or more 

3. How often do you have 6 or more drinks on 1 occasion? 

(0) Never 

(1) Less than monthly 

(2) Monthly 

(3) Weekly 

(4) Daily or almost daily 


Abbreviation: AUDIT-C, Alcohol Use Disorders Identification Test Consumption Questions. 
Scoring: A score of 8 or more is considered a positive screen for hazardous or harmful drinking. 
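
Scoring the AUDIT and AUDIT-C as printed in Tables 4-12 and 4-13 amounts to adding the circled numbers and comparing the total with the stated threshold of 8. The Python sketch below does only that. The function names are hypothetical, and the range check of 0 to 4 per item is an assumption taken from the largest option number shown on the forms.

```python
from typing import Sequence


def total_score(circled: Sequence[int], n_items: int) -> int:
    """Add the circled option numbers after basic validation."""
    if len(circled) != n_items:
        raise ValueError(f"expected {n_items} answers, got {len(circled)}")
    for value in circled:
        # Assumption: no option on either form is numbered above 4.
        if not 0 <= value <= 4:
            raise ValueError("each circled number must be between 0 and 4")
    return sum(circled)


def is_positive(circled: Sequence[int], n_items: int, threshold: int = 8) -> bool:
    """A total at or above the threshold counts as a positive screen."""
    return total_score(circled, n_items) >= threshold


# Hypothetical answer sheets: 10 items for the AUDIT, 3 for the AUDIT-C.
audit_result = is_positive([2, 1, 1, 0, 0, 0, 1, 1, 0, 2], n_items=10)
audit_c_result = is_positive([3, 2, 2], n_items=3)
```

The threshold is a parameter so that a clinic choosing a different cut point for its population can change it without altering the scoring logic.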
Table 4-14 CAGE 

1. Have you ever felt you should cut down on your drinking? 

2. Have people annoyed you by criticizing your drinking? 

3. Have you ever felt bad or guilty about your drinking? 

4. Have you ever had a drink first thing in the morning to steady your 
nerves or to get rid of a hangover (eye opener)? 

Abbreviation: CAGE, cut down, annoyed, guilty, eye opener. 

Scoring: Two or more positive responses are considered a positive screen for problem 
drinking in most studies. Alternatively, you may select a cut point of just 1 positive 
response to improve the sensitivity. 
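
The CAGE scoring rule is a count of positive responses compared with a cut point. A minimal Python sketch follows. The function names are hypothetical, and the default cut point of 2 reflects the scoring note above, with 1 available as the more sensitive alternative.

```python
def cage_score(cut_down: bool, annoyed: bool, guilty: bool, eye_opener: bool) -> int:
    """Count the positive responses to the 4 CAGE questions."""
    return sum([cut_down, annoyed, guilty, eye_opener])


def cage_positive(score: int, cut_point: int = 2) -> bool:
    """A score at or above the cut point counts as a positive screen."""
    if not 0 <= score <= 4:
        raise ValueError("a CAGE score must be between 0 and 4")
    return score >= cut_point
```

A patient with 1 positive response screens negative at the usual cut point but positive when `cut_point=1` is chosen.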


Table 4-15 T-ACE 

1. How many drinks does it take to make you feel high (tolerance)? 

2. Have people annoyed you by criticizing your drinking? 

3. Have you ever felt you should cut down on your drinking? 

4. Have you ever had a drink first thing in the morning to steady your 
nerves or to get rid of a hangover (eye opener)? 

Abbreviation: T-ACE, tolerance, annoyed, cut down, eye opener. 

Scoring: Positive response to the tolerance item (positive is considered more than 2 
drinks) is scored 2 points; to other items, 1 point each. Total of 2 or more indicates 
risky drinking. 
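
The T-ACE differs from the CAGE in that the tolerance item is weighted. The Python sketch below follows the scoring note above: more than 2 drinks on the tolerance item scores 2 points, and each other positive item scores 1 point. The function names are hypothetical.

```python
def t_ace_score(drinks_to_feel_high: float, annoyed: bool,
                cut_down: bool, eye_opener: bool) -> int:
    """Weighted T-ACE total: tolerance counts 2 points, other items 1 each."""
    score = 2 if drinks_to_feel_high > 2 else 0
    score += int(annoyed) + int(cut_down) + int(eye_opener)
    return score


def t_ace_positive(score: int, threshold: int = 2) -> bool:
    """A total of 2 or more indicates risky drinking per the scoring note."""
    return score >= threshold
```

Because of the weighting, a positive tolerance response alone is enough to reach the threshold of 2.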


Table 4-16 TWEAK 

1. How many drinks can you hold? (“Hold” version; ≥ 6 drinks indicates tolerance) 
or How many drinks does it take before you begin to feel the first 
effects of the alcohol? (“High” version; ≥ 3 indicates tolerance) 

2. Does your spouse (or do your parents) ever worry or complain about 
your drinking? 

3. Have you ever had a drink first thing in the morning to steady your 
nerves or to get rid of a hangover (eye opener)? 

4. Have you ever awakened the morning after some drinking the night 
before and found that you could not remember a part of the evening 
before? (amnesia) 

5. Have you ever felt you ought to cut (kut) down on your drinking? 

Abbreviation: TWEAK, tolerance, worry, eye opener, amnesia, cut (kut) down. 

Scoring: Positive responses to the tolerance or worry items score 2 points each; to 
other items, score 1 point each. A total score of 3 or more is considered positive for 
heavy/problem drinking. During pregnancy, it may be more appropriate to consider a 
score of 2 or more as positive. 
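
The TWEAK scoring note can be sketched in the same way. In this Python illustration the tolerance item is passed in as an already-judged yes/no value, so the code does not depend on whether the “Hold” or the “High” version of the question was asked. The function names and the `pregnant` flag are hypothetical conveniences for switching between the two thresholds given above.

```python
def tweak_score(tolerance: bool, worry: bool, eye_opener: bool,
                amnesia: bool, cut_down: bool) -> int:
    """Weighted TWEAK total: tolerance and worry count 2 points each."""
    return (2 * int(tolerance) + 2 * int(worry)
            + int(eye_opener) + int(amnesia) + int(cut_down))


def tweak_positive(score: int, pregnant: bool = False) -> bool:
    """Threshold of 3 in general; 2 may be more appropriate in pregnancy."""
    threshold = 2 if pregnant else 3
    return score >= threshold
```

A score of 2 is therefore negative for a general adult but positive when the lower pregnancy threshold is applied.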





































EVIDENCE TO SUPPORT THE UPDATE: 

Problem Alcohol Drinking 



TITLE The Value of the CAGE in Screening for Alcohol 
Abuse and Alcohol Dependence in General Clinical Populations: 
A Diagnostic Meta-analysis. 

AUTHORS Aertgeerts B, Buntinx F, Kester A. 

CITATION J Clin Epidemiol. 2004;57(1):30-39. 

QUESTION How well does the CAGE questionnaire (cut 
down, annoyed, guilty, eye opener) perform? 

DESIGN A formal systematic review with meta-analytic 
techniques. 

DATA SOURCES MEDLINE database and MEDION 
database for diagnostic reviews. 

STUDY SELECTION AND ASSESSMENT A search 
for articles published from January 1974 to December 2001 
was conducted, along with a manual search of Dutch-language 
articles. All languages (except Japanese) were included 
in the search. Studies had to be in a general clinical population 
and to report the data required for sensitivity and specificity. 
Studies with verification bias were eliminated, although studies 
that adjusted for verification bias were retained. Studies 
outside of general medical practices (eg, psychiatric settings or 
the emergency department) were excluded. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

CAGE questionnaire as compared with the diagnosis established 
by the Diagnostic and Statistical Manual of Mental Disorders 
criteria. 

OUTCOME MEASURES 

Sensitivity, specificity, and likelihood ratios (LRs) of the 
CAGE for diagnosis of alcohol abuse or dependence. 

MAIN RESULTS 

Thirty-five articles were identified, but only 10 were in compliance 
with all the inclusion and exclusion criteria (Table 4-17). 


Table 4-17 Serial LR for the CAGE Questionnaire at Each Cut Point for 
Patients From Either Outpatient or Inpatient Settings 

CAGE threshold     LR+ (95% CI) 
CAGE = 4           25 (15-43) 
CAGE ≥ 3           15 (8.2-29) 
CAGE ≥ 2           6.9 (4.2-11) 
CAGE ≥ 1           3.4 (2.3-5.1) 
CAGE = 0           0.18 (0.11-0.29) 

Abbreviations: CAGE, cut down, annoyed, guilty, eye opener; CI, confidence interval; 
LR, likelihood ratio. 


When comparing primary care patients to ambulatory 
medical patients (excluding inpatients), the results for the 
LRs among these groups are clinically similar. While inpatients 
have positive LRs (confidence intervals [CIs]) that 
overlap at each threshold, the results for the negative LRs differ. 
The CAGE has much better sensitivity for inpatients, 
especially at lower thresholds: When patients have no more 
than 1 positive response on the CAGE, the LR is 0.17 (CI, 
0.11-0.28), and when they answer all the questions negatively, 
the LR is 0.02 (CI, 0-0.11). 

The authors conclude that the CAGE at a cut point of 2 or 
greater is of limited value. 

CONCLUSION 

LEVEL OF EVIDENCE Systematic review. 

STRENGTHS High-quality systematic review with appropriate 
meta-analytic techniques. The study formulates the 
research question, includes a comprehensive search and selection 
of studies, critically appraises the studies and provides the 
results, and incorporates the results into their interpretation. 

LIMITATIONS Users of the CAGE should be careful not to 
extrapolate these data to the diagnosis of hazardous or problem 
drinking because the studies evaluated alcohol abuse or 
dependence. 

We see these data as suggesting that the CAGE is more useful 
than the authors conclude. However, it is very important to recognize 
that the CAGE, with its recommended cut point of 2 
or greater, is intended to diagnose alcohol abuse or dependence 
and not lower levels of problem drinking. The CAGE is useful 
for this because getting an affirmative answer greatly increases 
the probability that the person has a problem. On the other 
hand, we agree with the authors that questionnaires with 0 to 1 
positive responses do not sufficiently rule out abuse or dependence, 
especially in populations with higher prevalence of 
abuse or dependence. 

What about accepting a threshold of only 1 positive response? 
Further studies are needed, but this would be a reasonable 
approach for screening. It should be noted that many patients 
who answer with only 1 positive question will not have an abuse 
or dependence problem, but it is likely that the sensitivity for 
such a question would be much higher for problem drinking 
and you would “miss” fewer patients. For many clinic populations, 
the LR of 0.18 when the patient answers in the negative for 
all CAGE questions may not be adequate. This has led many 
clinics to use a combination of the Alcohol Use Disorders Identification 
Test (AUDIT; for diagnosing problem drinking) and 
CAGE (for diagnosing abuse or dependence). 

Reviewed by David L. Simel, MD, MHS 


TITLE Screening for Alcohol Abuse and Dependence in 
Older People by Using DSM Criteria: A Review. 

AUTHORS Beullens J, Aertgeerts B. 

CITATION Aging Ment Health. 2004;8(1):76-82. 

QUESTION Which alcohol screening questionnaires 
perform best in older patients? 

DESIGN Formal systematic review without meta-analytic 
techniques. 

DATA SOURCES MEDLINE and PsycINFO databases. 

STUDY SELECTION AND ASSESSMENT Studies 
published from 1996 to 2002. Studies could be inpatient, 
outpatient, or nursing home settings for patients 60 years 
or older. One study of nursing home patients that included 
those as young as 50 years was included. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

CAGE (cut down, annoyed, guilty, eye opener), AUDIT 
(Alcohol Use Disorders Identification Test), MAST (Michigan 
Alcoholism Screening Test), and variations compared 
with the diagnosis established by the Diagnostic and Statistical 
Manual of Mental Disorders (DSM) criteria. We assessed 
the data for the CAGE and the AUDIT because these are 
shorter questionnaires than the longer MAST (see Appendix 
in the Update for the actual questionnaires). 


Table 4-18 Performance of the CAGE Questionnaire Among Older Patients 

Test (No. of Studies)   Sensitivity   Specificity   LR+ (95% CI)     LR- (95% CI) 
CAGE ≥ 2 (n = 2)        0.63-0.70     0.82-0.91     5.3 (3.0-9.0)    0.37 (0.29-0.47) 
CAGE ≥ 1 (n = 2)        0.79-0.86     0.56-0.78     2.6 (1.5-4.5)    0.24 (0.15-0.40) 
AUDIT ≥ 8 (n = 1)       0.33          0.91          3.6 (1.6-8.0)    0.75 (0.58-0.90) 

Abbreviations: AUDIT, Alcohol Use Disorders Identification Test; CAGE, cut down, 
annoyed, guilty, eye opener; CI, confidence interval; LR+, positive likelihood ratio; 
LR-, negative likelihood ratio. 


OUTCOME MEASURES 

Sensitivity and specificity. The criterion standard assessed for 
alcohol abuse or dependence. We retrieved articles to calculate 
the LRs from the original data. 

MAIN RESULTS 

Seven articles were identified for inclusion; only 2 were done 
in the outpatient setting, and the results are displayed in 
Table 4-18. 

CONCLUSION 

LEVEL OF EVIDENCE Systematic review. 

STRENGTHS The study formulates the research question, 
includes a comprehensive search and selection of studies, and 
provides the results. 

LIMITATIONS There is no meta-analytic assessment. A formal 
quality assessment is not presented. Confidence intervals 
and sample sizes for the number of patients with alcohol 
abuse or dependence are not given. 

The number of studies on drinking problems in older 
patients is disappointingly low. The authors provide a good 
rationale for why the existing questionnaires might not work 
as well in older patients. The authors’ impression is that the 
CAGE may be better for detecting alcohol abuse or dependence 
in older patients, which would be consistent with other 
studies about the use of the CAGE, but it is hard to be conclusive 
given the paucity of studies in ambulatory older patients. 
As in other studies, picking a threshold of just 1 or more positive 
answer to CAGE questions improves the sensitivity. The 
authors hypothesize that the T-ACE (tolerance, annoyed, cut 
down, eye opener) might be even more efficient than the 
CAGE because the “feeling guilty” question is replaced by a 
“tolerance” question that may be more appropriate for older 
patients. That hypothesis, along with assessing the proper 
threshold, needs assessment. The authors do not address the 
detection of harmful or hazardous drinking in older patients. 

Reviewed by David L. Simel, MD, MHS 
TITLE Alcohol Screening Questionnaires in Women. 

AUTHORS Bradley KA, Boyd-Wickizer J, Powell SH, 
Burman ML. 

CITATION JAMA. 1998;280(2):166-171. 

QUESTION Which alcohol screening questionnaires 
perform best in women? 

DESIGN Formal systematic review without meta-analytic 
techniques. 

DATA SOURCES MEDLINE database and Social Science 
and Science Citations Index. 

STUDY SELECTION AND ASSESSMENT Studies 
published from 1996 to July 1997 in English. Studies did 
not have to be performed in a general clinical population 
but did need to include a clinic population of women with 
the data reported separately for women. United States studies 
were the only studies included. All studies had to compare 
a brief screening questionnaire to a criterion standard. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

CAGE (cut down, annoyed, guilty, eye opener), AUDIT 
(Alcohol Use Disorders Identification Test), TWEAK (tolerance, 
worry, eye opener, amnesia, cut [kut] down), Brief 
Michigan Alcohol Screening Test (BMAST), T-ACE (tolerance, 
annoyed, cut down, eye opener), Trauma score, and 
NET* questionnaires1 as compared with the diagnosis established 
by the Diagnostic and Statistical Manual of Mental Disorders 
criteria. In the obstetrics clinic studies, the criterion 
standard was the number of drinks per day, which is appropriate, 
given that any drinking may be harmful. In the primary care 
clinics studies, the criterion was alcohol abuse or dependence. 
We assessed the data only for the CAGE, AUDIT, TWEAK, and 
T-ACE for this review as these were the surveys studied in more 
than 1 location. In all studies, a person who was aware of the 
questionnaire results applied the criterion standard. 

OUTCOME MEASURES 

Sensitivity, specificity, and area under the receiver operating 
characteristic curve. Data were presented for women compared 
with men when the results were available. 

MAIN RESULTS 

Thirty-six articles were identified, but only 13 met all the 
inclusion criteria. 


*NET stands for: N, Normal drinker: Do you feel you are a normal 
drinker?; E, “Eye opener” question from CAGE.; T, Tolerance: How 
many drinks does it take to make you feel high. These questions are 
found in the other questionnaires. 


Table 4-19 Performance Characteristics of Screening Questionnaires 
in Women 

Setting (No. of Studies, No. of Patients) 
Cut point         Test                                         Sensitivity (95% CI)  Specificity (95% CI) 

Emergency Care (3 studies, 892 patients) 
Low cut point     TWEAK ≥2 (Hold version; only in 1 study)     0.87 (0.74-0.93)      0.87 (0.83-0.90) 
Higher cut point  CAGE ≥2, AUDIT ≥8, TWEAK ≥3 (Hold version)   0.72 (0.66-0.77)      0.94 (0.92-0.96) a 

Obstetrics Clinic (3 studies, 8431 patients) 
Low cut point     T-ACE ≥1                                     0.89 (0.81-0.94) a    0.75 (0.70-0.79) a 
                  CAGE ≥1                                      0.66 (0.62-0.70)      0.81 (0.81-0.82) 
                  TWEAK ≥2 (Hold version; only 1 study)        0.91 (0.87-0.94)      0.77 (0.76-0.78) 
Higher cut point  T-ACE ≥2                                     0.79 (0.64-0.90) a    0.82 (0.71-0.90) a 
                  CAGE ≥2                                      0.48 (0.44-0.53)      0.93 (0.92-0.93) 
                  TWEAK ≥3 (Hold version; only 1 study)        0.67 (0.61-0.73)      0.92 (0.91-0.93) 

Primary Care (2 studies, 758 patients) 
Low cut point     CAGE ≥1 (only 1 study)                       0.89 (0.82-0.93)      0.83 (0.79-0.86) 
Higher cut point  CAGE ≥2                                      0.58 (0.32-0.80) a    0.93 (0.90-0.95) 

Abbreviations: AUDIT, Alcohol Use Disorders Identification Test; CAGE, cut down, 
annoyed, guilty, eye opener; CI, confidence interval; T-ACE, tolerance, annoyed, cut 
down, eye opener; TWEAK, tolerance, worry, eye opener, amnesia, cut (kut) down. 
a Heterogeneous, P < .05. 

We extracted the data for sensitivity and specificity to assess 
for summary values (Table 4-19). The results are the random 
effects summary measures when there is more than 1 study. 

We combined data for the sensitivity and specificity estimates 
by extracting the raw results. Because of concerns about incorporation 
bias, we assessed for heterogeneity. We chose not to 
report summary likelihood measures for women because of 
our uncertainty about the effect of incorporation bias. 

The summary specificity for the CAGE of 2 or greater, 
AUDIT, TWEAK of 3 or greater, and T-ACE of 2 or greater is 
0.92 (95% CI, 0.90-0.94), has narrow CIs, and suggests that a 
positive questionnaire at these thresholds is clinically similar 
no matter what population of women is included. 

There is greater variability for the sensitivity. The CAGE 
questionnaire performs poorly in an obstetrics clinic. The 
AUDIT and the TWEAK of 3 or greater (hold version) have 
similar sensitivities across all settings (0.69 [95% CI, 0.64-0.74]). 
For every questionnaire studied (CAGE, AUDIT, and 
TWEAK), the sensitivity is always worse in women compared 
with men, whereas the specificity is always higher for women. 

CONCLUSION 

LEVEL OF EVIDENCE Systematic review. 
STRENGTHS High-quality systematic review. The study 
formulates the research question, includes a comprehensive 
search and selection of studies, critically appraises the studies 
and provides the results, and incorporates the results into 
their interpretation. 

LIMITATIONS There is no meta-analytic assessment. The 
studies in the emergency department and in primary care 
assessed only for abuse or dependence rather than for harmful 
or hazardous drinking. Each study was potentially affected by 
incorporation bias in which the interviewer knew the results of 
the screening questionnaires. We are uncertain how this 
affected the interpretation of the criterion standard. However, 
because all studies were affected by this bias, we still may make 
inferences on the relative value of the sensitivity and specificity. 

No matter what the setting, the specificity of these tests is
similarly high for women. Although it is possible that this
uniformly good measurement property is a function of
incorporation bias, it is also plausible that women with any
positive screen result for alcohol are highly likely to be
problem drinkers.

The results from the individual studies cited by these
authors suggest poorer overall performance for the CAGE
among women. Compared with the overall data in the
meta-analysis by Aertgeerts et al, 2 the estimated positive likelihood
ratio (LR) for women with a CAGE of 2 or greater appears to
be about the same (an estimated positive LR of 8.2 in women vs the
meta-analytic summary estimate of 6.9 by Aertgeerts et al 2 ),
but the estimated negative LR of 0.45 does appear worse (summary
negative LR, 0.33 [95% CI, 0.25-0.43]). A study published just
after this systematic review also suggested CAGE differences
between men and women, along with differences based on
race or country of origin. 3 In that study, the sensitivity of the
CAGE for white women and black women fell within the CI
of that in the systematic review by Aertgeerts et al 2 but was
less for Hispanic women. The AUDIT had a better sensitivity
among all 3 groups of women studied.

The TWEAK and T-ACE were developed to detect alcohol 
problems during pregnancy, so they ought to work better than 
the CAGE for pregnant women. However, the TWEAK and T- 
ACE have not been as widely studied in primary care clinics. 

The authors conclude that the TWEAK and AUDIT may 
be the best screening tests for women in any setting. They 
recommend a cut point of 2 or greater for the TWEAK, 
which does improve the sensitivity but was reported in only 1 
study. Although the specificity is worse for the TWEAK of 2
or greater, this is not as important an issue as failing to
diagnose alcohol misuse during pregnancy. Dropping the cut
point for the CAGE to 1 or greater improves the sensitivity, 
but it still does not perform as well as the TWEAK. 

Our assessment is that the TWEAK does have statistically
similar sensitivity to the AUDIT, with a narrow CI, and these
appear to perform better than the CAGE. The TWEAK has the
obvious advantage over the AUDIT in that it requires fewer
questions. The “hold” version of the TWEAK has been studied
more extensively than the “high” version (see Appendix in the
Update for the actual questionnaires), but in the single study
that compared them, the results were similar. Because many
women may never have passed out from alcohol, the authors
recommend using the high version of the TWEAK with the
question, “How many drinks does it take before you begin to
feel the first effects of the alcohol?” (> 3 drinks indicates
tolerance). They also recommend a cut point of 2 or greater as
indicating positivity. They suggest this lower threshold because the
improved sensitivity, especially for pregnant women, would be
more important than a higher specificity.

The T-ACE should be studied further because it has fewer
questions. It may be easier for primary care clinics to
implement it because it is similar to the CAGE except that the
“Feeling guilty” question is replaced by the “Tolerance” question.

Reviewed by David L. Simel, MD, MHS 

REFERENCES FOR THE EVIDENCE 

1. Russell M, Martier SS, Sokol RJ, et al. Screening for pregnancy
risk-drinking. Alcohol Clin Exp Res. 1994;18(5):1156-1161.

2. Aertgeerts B, Buntinx F, Kester A. The value of the CAGE in screening
for alcohol abuse and alcohol dependence in general clinical
populations: a diagnostic meta-analysis. J Clin Epidemiol. 2004;57(1):30-39.

3. Steinbauer JR, Cantor SB, Holzer CE, Volk RJ. Ethnic and sex bias in
primary care screening tests for alcohol use disorders. Ann Intern Med.
1998;129(5):353-362.


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

AUDIT (Alcohol Use Disorders Identification Test), CAGE (cut
down, annoyed, guilty, eye opener), and SMAST (Short Michigan
Alcoholism Screening Test) instruments for screening for
alcohol problems compared with the Diagnostic and Statistical
Manual of Mental Disorders as the criterion standard.


TITLE Screening for Alcohol Problems in Primary Care. 

AUTHORS Fiellin DA, Reid MC, O’Connor PG. 

CITATION Arch Intern Med. 2000;160(13):1977-1989. 

QUESTION Which alcohol screening questionnaires 
perform best in primary care patients? 

DESIGN Formal systematic review. 

DATA SOURCES MEDLINE database. 

STUDY SELECTION AND ASSESSMENT Studies
published in 1996-1998, English language, primary care
setting, comparing a screening questionnaire to a criterion
standard and including the sensitivity, specificity, or
likelihood ratios (LRs). An assessment for evaluation bias
or incorporation bias (whereby the results of the screening
test were used in the criterion standard) and an analysis of
clinical subgroups were done for each article.

OUTCOME MEASURES

Adherence to quality standards of reporting the demographics,
comorbidities, eligibility criteria and participation rate,
criterion standard, blinding, and analysis of subgroups was
presented for 38 studies. Sensitivity and specificity were
presented without their confidence intervals (CIs). Meta-analytic
techniques were not used.

MAIN RESULTS 

Eleven articles assessed at-risk, hazardous, or harmful drinking, 
whereas 27 articles studied alcohol dependence or abuse. The 
result for the SMAST was found in only 1 retrieved study. 

Table 4-20 includes the data only from studies that met standards for
avoiding evaluation and incorporation bias. The sensitivity and
specificity are the point estimates (single study) or ranges
reported in the review. We retrieved the original articles to obtain
the data for combining the results to get a summary LR for the
AUDIT. We calculated the summary LR CIs for the AUDIT and
AUDIT-C (AUDIT Consumption Questions) from the original
data. (For alcohol abuse or dependence, a separate systematic
review with a meta-analysis was used to combine the results. 1 The
sensitivity and specificity values of the studies without
verification bias cited in the publication are shown for comparison
purposes to the AUDIT.)

CONCLUSION 

LEVEL OF EVIDENCE Systematic review. 

STRENGTHS This is an excellent systematic review that
formulates the research question, includes a comprehensive
search and selection of studies, critically appraises the studies
and provides the results, and incorporates the results into
their interpretation.


Table 4-20 Performance Characteristics of Screening Questionnaires in Primary Care

Questionnaire (references)            Sensitivity   Specificity   LR+ (95% CI)     LR- (95% CI)

At-Risk, Harmful, or Hazardous Drinking
AUDIT ≥8 (2, 3)                       0.57-0.59     0.91-0.96     6.8 (4.7-10)     0.46 (0.38-0.55)
AUDIT-C ≥8 (2[p1974])                 0.40          0.97          12 (5.0-30)      0.62 (0.52-0.74)
CAGE ≥2 (a)                           0.49-0.69     0.75-0.95     3.4 (1.2-10)     0.66 (0.54-0.81)
CAGE ≥2, patients all >60 y (4)       0.14          0.97          4.7 (3.7-6.0)    0.89 (0.86-0.91)

Current Abuse/Dependence
AUDIT ≥8 (2[p1974], 3[p385])          0.66-0.71     0.85-0.86     4.6 (3.5-6.1)    0.37 (0.28-0.49)
AUDIT-C ≥8 (2[p1974])                 0.46          0.92          5.9 (3.3-10)     0.58 (0.44-0.73)
CAGE ≥2 (3[p385])                     0.77          0.79

Lifetime Abuse/Dependence
AUDIT ≥8 (3[p385])                    0.39          0.89          7.0              0.46 (0.36-0.58)
CAGE ≥2 (3[p385], 6)                  0.43-0.53     0.86

Abbreviations: AUDIT, Alcohol Use Disorders Identification Test; AUDIT-C, AUDIT
Consumption Questions; CAGE, cut down, annoyed, guilty, eye opener; CI, confidence
interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

a Source for sensitivity: Aithal et al. 5

LIMITATIONS There is no meta-analytic assessment of the
AUDIT and CAGE. This makes it a bit harder for the
clinician to detect differences in the performance
characteristics of these questionnaires.

The authors evaluated the sensitivity and specificity ranges
to conclude that the AUDIT is best at identifying at-risk,
hazardous, or harmful drinking. We retrieved the original
reports to calculate the LRs. The CAGE appears inferior to
the AUDIT for detecting at-risk, harmful, or hazardous
drinking. However, a pragmatic problem occurs with the
AUDIT in that it is much longer than the CAGE (10
questions vs 4). We retrieved the data from the AUDIT-C, which
is a shorter version of the AUDIT, and it compares favorably
to the AUDIT for diagnosing hazardous drinking, although it
may not be as good for ruling out the problem. Because a
subsequent systematic review performed a meta-analysis of
the CAGE, we did not use this study to combine those data.
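
The LRs discussed above follow directly from sensitivity and
specificity: LR+ = sensitivity/(1 - specificity) and LR- =
(1 - sensitivity)/specificity. The following is a minimal sketch in
Python, not part of the original review. The function name
likelihood_ratios is hypothetical, and the inputs are the rounded
single-study point estimates from Table 4-20. Because the tabulated
LRs were calculated from the original data, values recomputed from
rounded sensitivity and specificity may differ slightly.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from a test's sensitivity and specificity.

    Illustrative helper only; the name and interface are assumptions.
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("need 0 <= sensitivity <= 1 and 0 < specificity < 1")
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive result raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative result lowers the odds
    return lr_pos, lr_neg


# Rounded single-study point estimates taken from Table 4-20.
rows = {
    "AUDIT-C, at-risk drinking": (0.40, 0.97),
    "CAGE, patients all >60 y": (0.14, 0.97),
}

for name, (sens, spec) in rows.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
```

A high LR+ paired with an LR- close to 1, as for the CAGE in older
patients, is the pattern of a test that helps when positive but does
little to rule out the problem when negative.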

Reviewed by David L. Simel, MD, MHS 


REFERENCES FOR THE EVIDENCE 

1. Aertgeerts B, Buntinx F, Kester A. The value of the CAGE in screening
for alcohol abuse and alcohol dependence in general clinical
populations: a diagnostic meta-analysis. J Clin Epidemiol. 2004;57(1):30-39.

2. Bush K, Kivlahan DR, McDonell MB, Fihn SD, Bradley KA. The AUDIT
alcohol consumption questions (AUDIT-C): an effective brief
screening test for problem drinking. Arch Intern Med. 1998;158(16):1789-1795.

3. Bradley KA, Bush KR, McDonell MB, Malone T, Fihn SD. Screening for
problem drinking: comparison of CAGE and AUDIT. J Gen Intern Med.
1998;13(6):379-388.

4. Adams WL, Barry KL, Fleming MF. Screening for problem drinking in
older primary care patients. JAMA. 1996;276(24):1964-1967.

5. Aithal GP, Thornes H, Dwarakanath AD, Tanner AR. Measurement of
carbohydrate deficient transferrin (CDT) in a general medical clinic: is
this test useful in assessing alcohol consumption? Alcohol Alcohol.
1998;33(3):304-309.

6. Fleming MF, Barry KL. A three-sample test of a masked alcohol screen¬ 

TITLE Behavioral Counseling Interventions in Primary
Care to Reduce Risky/Harmful Alcohol Use. 

AUTHORS Whitlock EP, Green CA, Polen MR. 

CITATION Contract No. 290-92-0018, Task No. 2,
Technical support of the US Preventive Services Task Force,
March 2004.
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=hstat3.chapter.45217.
Accessed May 17, 2008.

QUESTION Which screening questionnaires for risky 
alcohol use among primary care patients identify those 
who might benefit from brief interventions? 

DESIGN Formal systematic review without meta-analytic
techniques.

DATA SOURCES MEDLINE, Cochrane, PsycINFO,
HealthSTAR, and CINAHL databases.

STUDY SELECTION AND ASSESSMENT The goal
was to identify new literature since the last US Preventive
Services Task Force recommendations 1 ; thus, articles were
sought from 1994 through April 2002. An extensive search
was conducted to identify all relevant articles. Studies had
to have been conducted in primary care settings
(emergency care and inpatient studies were excluded). The
study quality for all included and excluded articles is
reported. In addition to reviewing primary data, the
authors reviewed other systematic reviews.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The focus of this review was on brief treatment interventions
for problem drinkers. The shorter questionnaires were used
in the studies that were included: CAGE (cut down, annoyed,
guilty, eye opener), AUDIT (Alcohol Use Disorders
Identification Test), TWEAK (tolerance, worry, eye opener, amnesia,
cut [kut] down), and T-ACE (tolerance, annoyed, cut down,
eye opener).

OUTCOME MEASURES 

Screening yield, sensitivity, and specificity. 

MAIN RESULTS 

Twelve studies were included in the review for assessing
screening of primary care patients who might be enrolled in
brief treatment intervention (Table 4-21).

The initial yield of screening primary care patients for all
levels of drinking who are waiting for appointments is 11%
to 18%. After further questioning, about 7% of primary care
patients are candidates for brief treatment interventions. In
trying to identify all patients with drinking disorders, the
higher value of 11% to 18% would be the appropriate
prevalence for adult US patients.


Table 4-21 Screening Questionnaires for Risky Alcohol Use Should
Be Selected According to the Patient Population

Population          AUDIT        CAGE         TWEAK or T-ACE

Risky or Harmful Drinking
Adults              Yes          No           No
>65 y               Uncertain    No           No
Pregnant women      No           No           Yes

Alcohol Abuse or Dependence
Adults              Yes          Yes          Yes
>65 y               Uncertain    Uncertain    Uncertain
Pregnant women      No           No           Yes

Abbreviations: AUDIT, Alcohol Use Disorders Identification Test; CAGE, cut down,
annoyed, guilty, eye opener; T-ACE, tolerance, annoyed, cut down, eye opener;
TWEAK, tolerance, worry, eye opener, amnesia, cut (kut) down.


CONCLUSION 

LEVEL OF EVIDENCE Systematic review. 

STRENGTHS This is an excellent systematic review that
formulates the research question, includes a comprehensive search and
selection of studies, critically appraises the studies and provides
the results, and incorporates the results into their interpretation.

LIMITATIONS There is no meta-analytic assessment.
Confidence intervals and LRs are not presented. The studies
included in this review were selected because they included
randomized trials of patients suitable for brief interventions
for problem drinking. Thus, these were not specifically
studies of the diagnostic tests themselves. To determine the
performance characteristics of screening tests, the authors also
used published systematic reviews of the questionnaires.

According to data from systematic reviews of diagnostic tests,
these authors conclude that the AUDIT is the best test for
detecting risky or harmful drinking in adults, although the TWEAK or
T-ACE ought to be used for pregnant patients. For detecting
alcohol abuse or dependence, they conclude that any of the 4
questionnaires is suitable other than during pregnancy.

The CAGE questionnaire is in widespread use, so the authors
suggest that it might be improved by adding quantity/frequency
questions. This has shown greater sensitivity and specificity in
the emergency department but has not been studied in primary
care. 2 It is available online as part of the National Institute on
Alcohol Abuse and Alcoholism guide to physicians
(http://pubs.niaaa.nih.gov/publications/Practitioner/pocketguide/pocket_guide.htm,
accessed May 17, 2008) and also as a self-graded patient form
(http://www.alcoholscreening.org/, accessed May 17, 2008).

Reviewed by David L. Simel, MD, MHS 

CHAPTER 5

Does This Adult Patient
Have Appendicitis?

James M. Wagner, MD
W. Paul McKinney, MD
John L. Carpenter, MD

CLINICAL SCENARIO

A 29-year-old patient presents to your office with abdominal
pain and a fever. The patient was well until 1 day ago and
had never experienced abdominal pain. A vague periumbilical
pain awoke him from sleep 12 hours previously, and he
soon developed anorexia, nausea, and vomiting. His wife
consulted their family medical reference guide and then
brought him to the office, concerned that his symptoms
matched a description of appendicitis. The pain then
migrated to the right lower quadrant (RLQ) and was much
worse while he was riding in the car to the physician’s office.

The patient’s oral temperature is 37.8°C; the pulse rate 
and blood pressure are normal. He has RLQ tenderness, 
guarding but not rigidity, and rebound tenderness in the 
RLQ. A rectal examination reveals no tenderness, and he 
does not exhibit the psoas or obturator signs. Rovsing sign 
is positive. 


WHY IS THIS AN IMPORTANT QUESTION TO 
ANSWER WITH A CLINICAL EXAMINATION? 


In western countries, appendicitis represents a common cause
of acute abdominal pain. According to National Center for
Health Statistics data, approximately 500000 patients
underwent appendectomies from 1979 to 1984. Individuals carry a
7% lifetime risk of developing appendicitis. 1 The incidence of
appendicitis causing abdominal pain depends on the clinical
setting. In series from emergency departments or surgical
services, 25% of patients younger than 60 years and evaluated for
acute abdominal pain have acute appendicitis, whereas the
incidence in those older than 60 years is approximately 4%. 1-5
Only 0.7% to 1.6% of all ambulatory patients with abdominal
pain have appendicitis. 6,7 Among children treated in the
ambulatory care setting, appendicitis causes 2.3% of all abdominal
pain episodes. 8 In children admitted for acute abdominal pain,
appendicitis is the etiology for approximately 32%. 911

The morbidity and mortality of appendicitis remain
significant, even with the advent of antibiotics and effective
surgical management. Although the overall mortality rate with
appropriate treatment is less than 1%, in the elderly it
remains approximately 5% to 15%. 2,4 There is a significant
amount of morbidity caused by appendiceal rupture. 1215 The
incidence of perforation in patients with appendicitis ranges
from 17% to 40%, with a median of 20%. 16,17 The perforation
rate is significantly higher in the elderly, with rates as high as
60% to 70%. Several factors contribute to the increased
incidence of perforation in the elderly, including significant
delay in seeking care, nonspecificity of the presenting
symptoms and signs, diminished febrile response, and fewer
abnormalities in important laboratory characteristics such as
the white blood cell count (WBC). 2,3,5,14,18,19 Children also have
an increased incidence of perforation because of delays in
consulting a physician for abdominal pain. 8 The negative
laparotomy result rate in most series ranges from 15% to 35%
and creates morbidity. 16,17,20-22 In younger women, the
negative laparotomy result rate is significantly higher (up to 45%)
because of the prevalence of pelvic inflammatory disease and
other common obstetric and gynecologic disorders. 16,17,23,24

Copyright © 2009 by the American Medical Association.

THE ACCURACY OF OTHER DIAGNOSTIC MODALITIES 

Routine medical history and physical examination remain the
most effective and practical diagnostic modalities. 25,26 Several
other clinical methods for diagnosing appendicitis have been
studied. Computer or algorithm-driven analyses of patients
with abdominal pain have been evaluated, 27-35 although most
studies have incomplete controls and yield inconsistent results.
Thus, the utility of computer-guided diagnosis compared with
unassisted clinical diagnosis needs further evaluation. The
authors of most of these studies believe that the improved
utility they demonstrated was primarily because clinicians were
forced to focus on specific clinical data that were readily
available to be entered into the analysis tree. Finally, these authors
observed that all of these modalities completely depend on the
accuracy of the data gathered and interpreted by clinicians
before the data are entered into the computer or algorithm
analysis. The concept of an extended period of observation of
patients with questionable appendicitis has been shown by
some authors to be helpful. 8,27,28 Its utility, like that of computer
and algorithm analyses, depends on routine medical history
and physical examination skills of clinicians.

The utility of radiographic techniques has also been
evaluated. Plain abdominal radiographs and barium enemas are
neither specific nor sensitive for appendicitis. 36 Ultrasonography is
more effective in detecting a distended appendix than
appendiceal perforation. 10,15,36-44 No study has demonstrated
ultrasonography to be clearly superior to the clinical examination,
and many authors believe that its primary utility is to
supplement the medical history and physical examination in patients
with equivocal findings. The accuracy of computed tomography
in diagnosing appendicitis has also been inconsistent. 36,42,43

Laparoscopy has been shown by some authors to be useful,
particularly in young women in whom it can be difficult to
differentiate between pelvic inflammatory disease, ectopic
pregnancy, and appendicitis. 27 However, other series have not been
as supportive, with negative appendectomy result rates from
20% to 30%. 44,45 Studies of outcomes comparing laparoscopy
with laparotomy have yielded conflicting results. 46,47 Even
though ultrasonography, computed tomography, and
laparoscopy can be helpful, none are ideal techniques, and the
clinician must depend on patient medical history and physical
examination results.

APPENDICEAL ANATOMY AND 
PATHOPHYSIOLOGY OF APPENDICITIS 

The adult’s appendix averages 10 cm in length, arising from
the posteromedial wall of the cecum, about 3 cm below the
ileocecal valve. 48 Its position in the abdominal cavity is
variable, being described as retrocecal, retroileal, preileal,
subcecal, or pelvic, and this variability in location may influence
the clinical signs and symptoms associated with appendicitis.
Although the physiologic role of the appendix is unproved,
an immunologic function is suggested by its content of
lymphoid tissue. 49

Appendiceal obstruction, followed by secondary bacterial
invasion, causes the majority of appendicitis. Continued
fluid secretion by the mucosa of the obstructed appendix
distends the lumen, eventually exceeding venous pressure and
leading to tissue ischemia and, ultimately, necrosis. Causes of
obstruction include fecaliths, calculi, tumors, parasites,
foreign bodies, or, rarely, barium. In the one-third of patients
without apparent obstruction, infection by viruses, parasites,
or bacteria, or either trauma or postoperative fecal stasis may
be involved. 50-55

Normally, appendicitis presents with a highly characteristic
sequence of symptoms and signs. 56 Initially, appendicitis
causes visceral pain poorly localized to the epigastrium or
periumbilical region, presumably because of distention of the
appendix. Anorexia, nausea, and vomiting soon follow as this
pathophysiology worsens. More advanced inflammation causes
irritation of adjacent structures or the peritoneum, low-grade
fever, and peritoneal pain localized to the RLQ. The
pathophysiology explains the classic migration of pain caused by
appendicitis. The point of maximal tenderness may be distinct
from McBurney point, 5 cm from the anterior superior iliac
spine on a line running from the umbilicus.

Atypical locations of the appendix may lead to unusual
clinical findings. In the case of retrocecal or retroileal
appendices, 57,58 the pain may be poorly localized and may not
undergo the transition from epigastric to RLQ locations.
Pelvic appendicitis frequently causes pain in the left lower
quadrant, with an absence of tenderness, and is reflected by
increased pain during a rectal examination. Unusual
symptoms of urinary and defecation urgency, caused by irritation
of the ureter and rectum, respectively, plus dysuria and
diarrhea may also occur.

Although often a diagnostic dilemma in the first trimester
of pregnancy because of confusion with other diagnoses,
appendicitis in later stages of gestation may present a
challenge for the clinician because of displacement of the
appendix by the enlarging uterus. In such cases, periumbilical or
right subcostal tenderness may be found.


HOW TO ELICIT THE RELEVANT 
SYMPTOMS AND SIGNS 

Pain is commonly the first symptom of appendicitis. 9,59
Classically, the vague, midepigastric or periumbilical pain
awakens the patient from sleep but is not initially severe. After
reaching its peak in around 4 hours, it diminishes and then
migrates to the RLQ. Most patients will seek medical
attention within 12 to 48 hours. Pain usually occurs before
vomiting, and the patient has usually not experienced similar
symptoms before the present episode.

According to Cope’s Early Diagnosis of the Acute Abdomen, 60
many patients feel constipated and anticipate that defecation 
will relieve discomfort, leading them to use cathartic agents. 
However, pain persists after a bowel movement. 

Many signs have been associated with appendicitis or
peritonitis. Some of obvious value, such as the pelvic
examination, have not been adequately evaluated to merit mention in
this systematic review or they lack an adequate description or
standardization of the elicitation of the sign to ensure
accurate reproduction. A common reference for definitions in the
best studies is a text by De Dombal. 61 What follows is the
most consistent and useful description of the signs:

• Guarding: Guarding is a state of voluntary contraction of 
the abdominal muscles. The muscles are held tense by the 
patient because he or she knows (or fears) that further 
examination is likely to be painful. Fear can be partially, or 
fully, overcome by tact and persuasion. 61 

• Rigidity: Rigidity is also known as involuntary guarding. The 
best studies of abdominal pain have described rigidity as an 
involuntary reflex spasm of the muscles of the abdominal 
wall. It can never be overcome by tact and reassurance. 61 

• Rebound tenderness: (1) Press on the area of question with 
the flat of your hand, sufficient to depress the peritoneum. 
The patient should be experiencing pain. (2) Keep pressing 
with a constant intensity. As the patient adjusts to this 
pressure during 30 to 60 seconds, the pain diminishes. It 
may go away completely, although usually it does not. 
(3) Without warning, and preferably while the patient’s 
attention is distracted, remove the hand suddenly to just 
above skin level. Watching the patient grimace is more 
indicative than a complaint of pain. 61 

• Rovsing sign: A sign related to the rebound tenderness test. 
Press deeply and evenly in the left lower quadrant and then 
release pressure suddenly. The presence of tenderness in the 
RLQ during palpation or referred rebound tenderness in the 
RLQ during release is considered a positive Rovsing sign. 

• Psoas sign: With the patient in the supine position, ask the 
patient to lift the thigh against your hand, placed just 
above the knee. Alternatively, with the patient in the left 
lateral decubitus position (Figure 5-1), extend the patient’s 
right leg at the hip. Increased pain with either maneuver is 
a positive sign and indicates irritation of the psoas muscle 
by an inflamed appendix. 

• Obturator sign: This sign is similar mechanically to the
psoas sign. It is elicited by passively flexing the right hip
and knee and internally rotating the leg at the hip,
stretching the obturator muscle (Figure 5-2). Resultant
right-sided abdominal pain is a positive sign, indicating
irritation of the obturator muscle. The obturator sign has not
been studied independent of the psoas sign, but most
clinicians would attribute the same significance.

• Rectal examination: Classically, tenderness and fullness
perceived on the right but not the left side on rectal
examination are indicative of a pelvic appendicitis. 60 This sign is
subjective and poorly described in most major physical
examination texts. No studies that assess rectal tenderness
describe the examination technique.


PRECISION OF THESE SYMPTOMS AND SIGNS 

There have been no studies published evaluating the
precision of the clinical examination for appendicitis. A
standardized clinical examination might produce strong interrater
reliability.



Figure 5-1 The Psoas Sign in Examination for Appendicitis 

The sign can be elicited with 2 different patient positions. First, with the 
patient in the supine position, ask the patient to lift the right thigh against 
your hand placed just above the knee. With the patient in the left lateral 
decubitus position (as shown), extend the right leg at the hip. Increased pain 
with either maneuver is a positive sign and indicates irritation of the psoas 
muscle by an inflamed appendix. 



Figure 5-2 The Obturator Sign in Examination for Appendicitis 

Elicit this sign by passively flexing the patient's right hip and knee and
internally rotating the leg at the hip, stretching the obturator muscle. Resultant
right-sided abdominal pain is a positive sign, indicating irritation of the
obturator muscle.

ACCURACY OF THESE SYMPTOMS AND SIGNS

A handful of studies published during the past few decades
have evaluated the accuracy of the clinical presentation of
appendicitis. The studies are of various quality and design.
Most are best described as cross-sectional in design because
a clinical judgment is made, with outcomes measured in
terms of pathologic confirmation of appendicitis vs a
negative laparotomy result or no requirement for surgery.
Eleven of the highest-quality studies, based on number of
patients studied, the study design, and completeness of
reported data, are summarized in Table 5-1. 9,24,33,35,62-67 The
search strategy for identifying these articles is available
from the authors on request. This strategy yielded about
300 articles since 1966. Further limiting sets to adult age
groups yielded 200 studies. The titles and abstracts were
reviewed, and articles were chosen if adequate detail of the
outcomes and aspects of the clinical examination allowed
construction of 2 × 2 tables and subsequent calculation of
likelihood ratios (LRs).
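
The step from a 2 × 2 table to LRs can be sketched in a few lines of
Python. This is an illustrative sketch only, not the authors'
analysis: the function name lr_with_ci and the cell counts are
hypothetical, and the 95% CI uses the common log method for a ratio
of two proportions, which is an assumption about method rather than
something stated in the text.

```python
import math


def lr_with_ci(tp, fp, fn, tn, z=1.96):
    """Return LR+ and LR- with approximate CIs from 2 x 2 table counts.

    tp/fn: finding present/absent among patients with appendicitis.
    fp/tn: finding present/absent among patients without appendicitis.
    Each result is a (estimate, lower, upper) tuple.
    The log method fails if any cell is zero (division by zero).
    """
    diseased = tp + fn
    healthy = fp + tn
    sens = tp / diseased
    spec = tn / healthy

    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec

    # Standard errors of ln(LR) for a ratio of two independent proportions.
    se_pos = math.sqrt(1 / tp - 1 / diseased + 1 / fp - 1 / healthy)
    se_neg = math.sqrt(1 / fn - 1 / diseased + 1 / tn - 1 / healthy)

    def interval(lr, se):
        return (lr, lr * math.exp(-z * se), lr * math.exp(z * se))

    return {"lr_pos": interval(lr_pos, se_pos),
            "lr_neg": interval(lr_neg, se_neg)}


# Hypothetical counts, chosen only to illustrate the arithmetic.
result = lr_with_ci(tp=80, fp=30, fn=20, tn=170)
for key, (est, lo, hi) in result.items():
    print(f"{key}: {est:.2f} (95% CI, {lo:.2f}-{hi:.2f})")
```

Working from the raw cell counts, rather than from published
sensitivity and specificity, is what makes a CI possible, which is
why articles lacking enough detail to rebuild the table could not be
used.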

The 11 studies were divided into 2 groups by the patients
on whom they focused. Approximately half of the studies
focused on patients in whom appendicitis was suspected, and
half, on those who were examined for acute abdomen. In the
studies of suspected appendicitis, the inclusion criteria were
not further defined. In the studies of acute abdomen,
inclusion criteria usually involved pain for less than 1 week. Taken
together, the studies report on the findings of more than
4000 patients and provide the best available evidence
supporting the most valuable aspects of the clinical examination
for appendicitis (Table 5-2).

Each study reports on a varying constellation of clinical 
findings. Many aspects of the clinical examination are not 
evaluated in all of these studies. Unfortunately, some of the 
aspects evaluated are poorly defined in the text of the studies, 


so specific recommendations for these aspects are difficult to 
derive for medical education or the everyday practice of 
medicine. 

Nonetheless, several points can be drawn from a
systematic literature review. In evaluation of patients presenting
with emergency and acute abdominal pain, usually defined
as less than 1 week in duration before presenting to an
emergency department or surgical ward, the prevalence
(pretest probability) of acute appendicitis ranges from 12%
to 26%. 12-30,32-69 The clinical examination will influence this
probability further. If various aspects of the clinical
examination are viewed as diagnostic tests, LRs 70,71 and posttest
probability can be calculated.

From the medical history, 6 aspects have been evaluated.
Seven physical examination items have also been studied
well. These aspects are examined further in Table 5-3. 72 The
large number of patients studied and the similarities across
studies make the data suitable for being combined into
summary measures.

Three findings show a high positive LR (LR+) across all 
studies and, when present, are most useful for identifying 
patients at increased likelihood for appendicitis: RLQ pain 
(LR+, 8.0), rigidity (LR+, 4.0), and migration of initial 
periumbilical pain to the RLQ (LR+, 3.2). Rebound tenderness 
was studied in most patients, but its LR+ varied too much 
across studies to allow a statistical point estimate of its 
effect (LR+, 1.1-6.3). Although the obturator sign has not 
been studied independently, the authors suspect that this 
sign has operating characteristics similar to those of the 
psoas sign. 

Clinicians also collect evidence to help prove normality. 
Unfortunately, no single component consistently provided 
a low negative LR (LR-) that would rule out appendicitis. 
There were, however, many signs that proved to be helpful 
in ruling out appendicitis. The absence of RLQ pain and 
the presence of similar previous pain demonstrated powerful 
LR- (0.28 and 0.25, respectively). The absence of the 
classic migration of pain also diminished the likelihood of 
appendicitis significantly (LR-, 0.5). The absence of RLQ 
guarding or rebound pain has excellent properties for ruling 
out appendicitis in some studies, but not others. The 
presence of pain before vomiting needs further study to 
identify its diagnostic efficiency because, in its only evaluation, 
it was highly efficient in ruling out appendicitis. 
Astute clinicians will recognize that the absence of 
anorexia, nausea, or vomiting has little effect on the likelihood 
of appendicitis. 


Table 5-1 Studies of the Operating Characteristic of the Clinical Examination for Appendicitis 

Authors | Year | Inclusion Criteria | Design | No. of Patients Studied (% Women) | Country | Age Range, y 
Staniland et al 62 | 1972 | Admitted for acute abdomen | Retrospective | 600 (49) | United Kingdom | <9 to >70 
Brewer et al 63 | 1976 | ED evaluation for acute abdomen | Retrospective | 1000 (0) | United States | 15 to >65 
Berry and Malt 24 | 1984 | Operation for suspected appendicitis | Retrospective | 300 (40) | United States | 10 to >50 
Nauta and Magnant 64 | 1986 | Operation for suspected appendicitis | Prospective | 97 (40) | United States | 2 to 91 
Alvarado 33 | 1986 | Admitted for suspected appendicitis | Retrospective | 305 (42) | United States | 4 to 80 
Fenyo 35 | 1987 | Admitted for suspected appendicitis | Prospective | 830 (57) | Sweden | 15 to 86 
Liddington and Thomson 65 | 1991 | Admitted for abdominal pain | Prospective | 150 (58) | United Kingdom | 7 to 84 
Dixon et al 9 | 1991 | Admitted for suspected appendicitis | Prospective | 1204 (39) | Scotland | 7 to 87 
Izbicki et al 66 | 1992 | ED evaluation for suspected appendicitis | Prospective | 150 (56) | Germany | 11 to 88 
Eskelinen et al 67 | 1994 | Admitted for abdominal pain | Prospective | 222 (58) | Finland | 65 to 90 
Eskelinen et al 68 | 1995 | Admitted for abdominal pain | Prospective | 417 (54) | Finland | >50 
Total |  |  |  | 5275 (41) |  |  

Abbreviation: ED, emergency department. 


Table 5-2 Aspects of the Clinical Examination Studied a 

Author | Migr | Anorexia | Nausea | Vomiting | Pain | Similar | Rectal | Psoas | RLQ Pain | Rebound | Rigid | Guard | Fever 
Staniland et al 62 |  | X | X | X |  | X |  |  | X | X | X | X |  
Brewer et al 63 | X | X |  | X | X | X |  |  |  | X |  |  | X 
Berry and Malt 24 |  | X | X |  |  |  | X | X | X | X | X |  |  
Nauta and Magnant 64 | X | X | X | X |  |  | X | X | X | X |  |  |  
Alvarado 33 | X | X | X |  |  |  | X |  | X | X |  |  | X 
Fenyo 35 | X |  |  |  |  |  |  |  |  | X | X |  |  
Liddington and Thomson 65 |  |  |  |  |  |  |  |  |  | X |  |  |  
Dixon et al 9 |  |  |  |  |  |  | X |  | X | X | X |  |  
Izbicki et al 66 | X |  |  |  |  | X |  | X | X | X | X |  |  
Eskelinen et al 67 |  |  |  |  |  | X | X |  | X | X | X | X |  
Eskelinen et al 68 |  | X | X | X | X |  | X |  | X | X | X | X | X 
No. of cases studied | 1354 | 2161 | 1691 | 1684 | 651 | 1542 | 2349 | 450 | 3979 | 4688 | 3555 | 2267 | 1264 

Abbreviations: Migr, migration of the initial periumbilical pain to the right lower quadrant; pain, pain before vomiting; psoas, positive psoas sign; rectal, pain on rectal examination; 
RLQ, right lower quadrant; similar, symptoms similar to those the patient previously experienced. 

a For an explanation of rebound, rigid, and guard, see the “How to Elicit the Relevant Symptoms and Signs” section of the text. 


Table 5-3 Summary of Clinical Examination Operating Characteristics for Appendicitis a 

Procedure | Sensitivity | Specificity | LR+ (95% CI) | LR- (95% CI) 
Right lower quadrant pain | 0.84 | 0.90 | 7.3-8.5 b | 0-0.28 b 
Rigidity | 0.20 | 0.89 | 3.8 (3.0-4.8) | 0.82 (0.79-0.85) 
Migration of pain | 0.64 | 0.82 | 3.2 (2.4-4.2) | 0.50 (0.42-0.59) 
Pain before vomiting c | 1.0 | 0.64 | 2.8 (1.9-3.9) | NA 
Psoas sign | 0.16 | 0.95 | 2.4 (1.2-4.7) | 0.90 (0.83-0.98) 
Fever | 0.67 | 0.79 | 1.9 (1.6-2.3) | 0.58 (0.51-0.67) 
Rebound tenderness test | 0.63 | 0.69 | 1.1-6.3 b | 0-0.86 b 
Guarding | 0.73 | 0.52 | 1.7-1.8 b | 0-0.54 b 
No similar pain previously | 0.86 | 0.40 | 1.50 (1.46-1.7) | 0.32 (0.25-0.42) 
Rectal tenderness | 0.41 | 0.77 | 0.83-5.3 b | 0.36-1.1 b 
Anorexia | 0.68 | 0.36 | 1.3 (1.2-1.4) | 0.64 (0.54-0.75) 
Nausea | 0.58 | 0.37 | 0.69-1.2 b | 0.70-0.84 b 
Vomiting | 0.51 | 0.45 | 0.92 (0.82-1.0) | 1.1 (0.95-1.3) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; NA, not available. 

a All studies were used to create 2 x 2 tables and then tested for homogeneity of the odds ratio with the Breslow-Day statistic. If studies were not rejected as heterogeneous by this 
statistic, P=.05, CIs were manually reviewed to exclude type II errors. Studies satisfying both criteria were combined, and LRs were calculated with the Mantel-Haenszel method. The 
95% CIs were calculated according to the method of Simel et al. 72 Only 1 study evaluated pain before vomiting. For an explanation of procedure terms, see Table 5-2 or the “How to 
Elicit the Relevant Symptoms and Signs” section of the text. 
b In heterogeneous studies, the LRs are reported as ranges. 
c Only 1 study on this in the meta-analysis. 

THE ROLE OF COMBINED FINDINGS 

Clinicians rarely rely on a single sign or symptom for diagnosis 
but instead rely on a combination of findings. Unfortunately, 
the precision and accuracy of combinations of 
findings have not been reported in these studies. Several 
studies do, however, assess various decision rules that 
combine these findings. 6,33-35,66,73-77 Four of the most powerful 
rules were validated on an independent set of 1254 patients 
older than 50 years and presenting with abdominal pain. No 
single score was found to be superior; however, it was 
observed that the decision rules reported in the original work 
to be most powerful incorporated at least 2 of 5 common 
variables: site and duration of pain, site of tenderness, 
rebound tenderness, and leukocytosis. 78 

THE BOTTOM LINE 

Returning to the beginning clinical scenario, the historical 
components of the presentation are highly suggestive of 
appendicitis. Our patient demonstrates the classic sequence 
of abdominal pain before vomiting, culminating with the 
migration of the initial midepigastric pain to the RLQ. The 
combination of these LR+s alone makes appendicitis more 
likely. 

The findings of guarding but not rigidity tend to neutralize 
each other’s effect. The rectal examination results and the 
psoas and related signs are helpful if present but are not helpful 
when absent, as in this case. In sum, we suspect appendicitis 
in this man, so further evaluation is warranted. 

A surgical doctrine suggests that a decrease in the perforation 
rate will be achieved only by an increase in the negative 
laparotomy result rate in suspected acute appendicitis. The 
truth of this doctrine has been called into question, given the 
results of large- and small-area variation studies. 29 Improved 
clinical evaluation is suggested as a remedy for a high rate of 
negative laparotomy results without increasing the perforation 
rate. Evidence suggests that clinical details are 
essential. 79,80 Clinicians often do not collect enough clinical 
details for accurate and precise diagnosis. 81-83 Correction of 
this deficit, therefore, may well increase diagnostic accuracy 
without increasing the perforation rate. 

In summary, there are several conclusions that can be 
made concerning the clinical presentation, pathophysiology, 
and diagnosis of appendicitis: 

1. Appendicitis is a common clinical entity, with significant 
morbidity and mortality, particularly at the extremes of age. 

2. The pathophysiology of appendicitis consists of initial 
dilatation of the appendix, followed by appendiceal 
ischemia, necrosis, and parietal peritoneal irritation. Clinical 
findings are predictable, predicated on knowledge of 
this pathophysiology. 

3. The characteristic sequence of symptoms and signs 
includes the following: (1) vague pain initially located in 
the epigastric or periumbilical region; (2) anorexia, nausea, 
or unsustained vomiting; (3) migration of the initial 
pain to the RLQ; and (4) low-grade fever. 

4. Migration of pain in the characteristic manner, RLQ pain, 
and the presence of pain before vomiting are historical 
findings that suggest appendicitis. The presence of rigidity, 
a positive psoas sign, fever, or rebound tenderness is a 
sign on physical examination indicating an increased 
likelihood of appendicitis. 

5. Conversely, the absence of RLQ pain, the absence of the 
classic migration of pain, and the presence of similar pain 
previously are powerful symptoms in the medical history 
that make appendicitis less likely. In the physical examination, 
the lack of RLQ pain, rigidity, or guarding makes 
appendicitis less likely. 

6. Because no finding on the clinical examination can effectively 
rule out appendicitis, prudence dictates close follow-up 
of patients with abdominal pain who do not 
receive further diagnostic testing. 

UPDATE: Appendicitis, Adult 



Prepared by Jim Wagner, MD 
Reviewed by Kaveh Shojania, MD 


CLINICAL SCENARIO 


A 24-year-old woman presents with abdominal pain, nausea, 
and vomiting. She describes the pain as beginning in 
her midabdomen 3 days ago, and it has gotten progressively 
worse. Her last menstrual period was 3 weeks ago and was 
normal; she is not sexually active. The pain has stayed in the 
midabdomen and not moved to other locations. On examination, 
she has a fever and right lower quadrant (RLQ) and 
rebound tenderness; her pelvic and rectal examination 
results are unremarkable. Laboratory evaluation reveals 
ketonuria and a left shift without leukocytosis. 

UPDATED SUMMARY ON ADULT APPENDICITIS 

Original Review 

Wagner JM, McKinney WP, Carpenter JL. Does this patient 
have appendicitis? JAMA. 1996;276(19):1589-1594. 

UPDATED LITERATURE SEARCH 

Our literature search used the parent search for the Rational 
Clinical Examination series, combined with the subject heading 
“exp appendicitis” and limited to articles published between 
1994 and September 2004. This search yielded more than 400 
titles, which were narrowed down to approximately 50 by 
excluding studies of laboratory and radiologic tests and case 
studies. 

There have been few new studies that focused on the 
operating characteristics of individual components of the 
clinical examination for appendicitis. However, there have 
been several studies that have looked at combinations of 
findings. That is, instead of examining the likelihood ratio 
(LR) of rebound tenderness alone, studies have explored 
the combination of fever, migration of pain, and rebound 
tenderness. 

The studies of clinical decision rules were selected if the 
components, derivation, and validation of the prediction rule 
were clearly defined in the article and the patients included 
were those from a general population with abdominal pain or 
were suspected of having appendicitis. Our previous literature 
search was reviewed, and studies conducted before 1994 
were included if they fit these criteria. 


NEW FINDINGS 

• Combinations of findings from the clinical examination 
are more powerful than any single finding. 

• Most of the decision rules formed by these combinations of 
findings include migration of pain from periumbilical to 
RLQ, rebound tenderness, RLQ tenderness, nausea-vomiting, 
male sex, fever, rigidity, and white blood cell (WBC) 
count. 

Details of the Update 

Eighteen studies that derived or validated clinical decision 
rules for appendicitis were identified. The most important 
studies were those by Alvarado, 1 Eskelinen et al, 2 and Fenyo et 
al. 3 These studies were chosen because of their methodology, 
large sample sizes, simplicity of the decision rule, or familiarity 
to physicians. In addition, a study that compared several 
clinical decision rules on the same population provided a 
good perspective of the relative value of these rules. 4 

The Alvarado 1 study was one of the first of the clinical decision 
rules published, demonstrating the power of the rule 
beyond individual findings. Although the methods are rudimentary 
and the rule is not validated in the study, it represents 
the most widely accepted and the simplest of the clinical 
decision rules. By combining the results for 8 findings from 
the medical history or the examination (which conveniently 
spell out the mnemonic MANTRELS), the resulting score 
provides guidance on whether to operate in the setting of suspected 
appendicitis. Of 10 potential points, patients with a 
score of 7 or higher are recommended for surgical intervention. 
The various components are Migration, Anorexia-acetone, 
Nausea-vomiting, Tenderness in RLQ, Rebound pain, 
Elevation of temperature, Leukocytosis, and Shift to the left 
of normal WBC count. 

The Eskelinen et al 2 study evaluated more than a thousand 
patients with a rule that includes 7 variables in men and 5 in 
women. The disadvantages of this study are that the rule is 
complex, computer based, and was validated with a small 
number of patients. 

The Fenyo et al 3 study assessed 10 variables used in a complex 
equation. The results for the individual findings showed 
that a WBC count of less than 8.9 x 10^9/L (LR, 0.16) was the 
one finding that had reasonable measurement properties, 
leading to a lower likelihood of appendicitis. 

The Ohmann et al 4 study displayed a parallel analysis of 10 
available studies, including the 3 mentioned above. A database 
of 45 variables prospectively collected from 1254 consecutive 
patients on a standardized form was used to evaluate these 
studies. A surprising outcome of the study was that none of 
the rules produced sufficiently low rates (<15%) for either 
unneeded appendectomy result (rule advised surgery but normal 
appendix found) or delayed appendectomy (rule advised 
delay but the patient proved to have appendicitis). However, 
the clinicians in these studies did not perform much better 
than the rules. Although the clinicians who chose not to use 
the rules performed similarly to the decision rule results, 
implementing decision models in actual clinical practice may 
identify a subset of clinicians who improve with the rules. The 
authors recommend the Alvarado 1 and Eskelinen et al 2 studies 
as those warranting further evaluation. 


Table 5-4 Accuracy of Selected Decision Rules 

Author | Year | n | LR+ (95% CI) | LR- (95% CI) | DOR (95% CI) 
Alvarado 1 | 1986 | 277 | 3.1 (1.9-5.0) | 0.26 (0.19-0.35) | 12 (6.0-25) 
Christian and Christian 9 | 1992 | 58 | 4.5 (1.8-11) | 0 (0-0.23) | 311 (15-6426) 
Fenyo et al 3 | 1997 | 1167 | 5.6 (4.6-6.8) | 0.31 (0.26-0.37) | 18 (13-24) 
Eskelinen et al 2 | 1994 | 1333 | 10 (8.3-12) | 0.06 (0.04-0.10) | 164 (93-287) 
Izbicki et al 10 | 1992 | 150 | 1.9 (1.5-2.5) | 0.21 (0.09-0.45) | 9.5 (3.7-24) 
Kalan et al 11 | 1994 | 49 | 1.3 (0.81-2.1) | 0.38 (0.11-1.3) | 3.5 (0.66-19) 
Ramirez and Deus 12 | 1994 | 166 | 4.3 (2.1-8.8) | 0.25 (0.17-0.36) | 17 (6.4-46) 
Saidi and Ghasemi 13 | 2000 | 128 | 6.0 (3.6-9.9) | 0.08 (0.03-0.24) | 75 (20-280) 

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive likelihood 
ratio; LR-, negative likelihood ratio. 
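
The diagnostic odds ratio in Table 5-4 is the ratio of the positive to the negative likelihood ratio. The short Python sketch below is an illustration only, with a hypothetical function name; it recomputes the DOR from the rounded LRs in the table, so small differences from the published values (which were presumably computed from unrounded data) are expected.

```python
def diagnostic_odds_ratio(lr_positive: float, lr_negative: float) -> float:
    """DOR = LR+ / LR-; undefined when LR- is zero."""
    if lr_negative <= 0.0:
        raise ValueError("LR- must be positive to form the ratio")
    return lr_positive / lr_negative


# Rounded LR+ and LR- values copied from Table 5-4.
rules = {
    "Alvarado": (3.1, 0.26),
    "Fenyo et al": (5.6, 0.31),
    "Saidi and Ghasemi": (6.0, 0.08),
}
for name, (lr_pos, lr_neg) in rules.items():
    print(name, round(diagnostic_odds_ratio(lr_pos, lr_neg)))
```

A rule with an LR- of 0, such as the Christian and Christian entry, cannot be handled by this simple ratio, which is why the sketch rejects it rather than guessing at a correction.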


Table 5-5 The Alvarado Clinical Decision Rule (MANTRELS Mnemonic) 

Variable | Score 
Migration | 1 
Anorexia-acetone | 1 
Nausea-vomiting | 1 
Tenderness in RLQ | 2 
Rebound pain | 1 
Elevation of temperature | 1 
Leukocytosis | 2 
Shift to the left | 1 
Maximum total score | 10 
Positive | ≥7 

Abbreviation: RLQ, right lower quadrant. 
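
The scoring in Table 5-5 can be expressed as a small lookup, sketched here in Python. The names are hypothetical, and the example encodes the clinical scenario from this update under one assumption that the text does not state explicitly: that ketonuria counts toward the anorexia-acetone item.

```python
from typing import Mapping

# Points per finding, copied from Table 5-5.
ALVARADO_WEIGHTS = {
    "migration": 1,
    "anorexia_acetone": 1,
    "nausea_vomiting": 1,
    "tenderness_rlq": 2,
    "rebound_pain": 1,
    "elevation_of_temperature": 1,
    "leukocytosis": 2,
    "shift_to_left": 1,
}
POSITIVE_THRESHOLD = 7  # a score of 7 or higher is a positive result


def alvarado_score(findings: Mapping[str, bool]) -> int:
    """Sum the points for every finding that is present."""
    unknown = set(findings) - set(ALVARADO_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    return sum(w for name, w in ALVARADO_WEIGHTS.items() if findings.get(name, False))


# Scenario patient: no migration, no leukocytosis; ketonuria scored as
# anorexia-acetone (an assumption made for this illustration).
scenario = {
    "migration": False,
    "anorexia_acetone": True,
    "nausea_vomiting": True,
    "tenderness_rlq": True,
    "rebound_pain": True,
    "elevation_of_temperature": True,
    "leukocytosis": False,
    "shift_to_left": True,
}
score = alvarado_score(scenario)
print(score, score >= POSITIVE_THRESHOLD)  # prints 7 True
```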


What lessons can be learned from these studies? The rules 
recommended by experts incorporate a description of the 
pain location (and change of location) from the medical history, 
rebound and RLQ tenderness on the physical examination, 
and leukocytosis. Other commonly included variables 
are nausea, vomiting, male sex, fever, and rigidity. Decision 
rules do not vary dramatically between women and men 
with abdominal pain. 

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

A series of letters to the editor prompted by the original publication 
pointed out some ways the presentation of the data 
could be improved. 5,6 The numbers reported for the sensitivities 
and specificities did not match the reported LRs; this 
error was explained 7 and corrected in Table 5-3 of the original 
publication. 

CHANGES IN THE REFERENCE STANDARD 

There were no changes in the reference standard; appendicitis 
is still a histologically proven diagnosis, and the absence of 
appendicitis is still a clinical diagnosis (ie, no surgery after 
adequate follow-up). A recent systematic review suggests that 
computed tomography may be more accurate than ultrasonography 
for identifying patients with appendicitis, but 
neither test is sufficient to serve as the reference standard. 8 


RESULTS OF LITERATURE REVIEW 

The LR and diagnostic odds ratio of the 8 best clinical decision 
rules are displayed in Table 5-4. The studies with the 
highest numbers of participants that evaluated rules with 
the highest diagnostic odds ratios (a measure of overall 
accuracy) were Alvarado, 1 Fenyo et al, 3 and Eskelinen et al. 2 

Although the approaches of Fenyo et al 3 and Eskelinen et 
al 2 have a higher diagnostic odds ratio than the Alvarado 1 
rule, both have a large number of variables and require multivariate 
modeling that makes them hard to use without a 
handheld calculator or coding sheet. According to these 
results, as well as expert opinion expressed by the parallel 
evaluation of Ohmann et al 4 of 10 of the studies, the 
Alvarado 1 clinical prediction rule (Table 5-5) is used by most 
clinicians who prefer decision rules because it balances accuracy 
with simplicity of use and familiarity to clinicians. 


CLINICAL SCENARIO—RESOLUTION 


The patient’s presentation is suggestive but not clearly 
diagnostic of appendicitis. The clinician resorts to Alvarado’s 1 
clinical decision rule and calculates that the patient 
has 7 of 10 possible points, a positive test result with an 
LR of 3.1. The patient was referred for surgery, which 
revealed an inflamed appendix. 


CHAPTER 5 Appendicitis, Adult 


APPENDICITIS— MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

The incidence of appendicitis among emergency patients 
with abdominal pain is up to 25% for patients younger 
than 60 years. For those older than 60 years, the incidence 
is up to 5%. 

POPULATIONS FOR WHOM APPENDICITIS 
SHOULD BE CONSIDERED 

• All patients with abdominal pain. 

DETECTING THE LIKELIHOOD OF APPENDICITIS 
AMONG EMERGENCY PATIENTS WITH 
ABDOMINAL PAIN IN THE RLQ 

The Alvarado 1 model is recommended as the most user- 
friendly while being among the most powerful. The details 
of the clinical decision rule are displayed in Table 5-6. Note 
that the MANTRELS mnemonic is helpful in that it is easy 
to remember and is organized according to medical history, 
physical examination, and laboratory data. 


Table 5-6 Operating Characteristics of the Alvarado Model 

Test | Sensitivity | Specificity | LR+ (95% CI) | LR- (95% CI) 
Alvarado score (≥7 is positive) | 0.81 | 0.74 | 3.1 (1.9-5.0) | 0.26 (0.19-0.35) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 
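
For a dichotomous test such as the Alvarado score, the LRs in Table 5-6 follow directly from sensitivity and specificity. The Python sketch below is illustrative only (the function name is hypothetical) and uses the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for a dichotomous test."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("need 0 <= sensitivity <= 1 and 0 < specificity < 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity and specificity of the Alvarado score from Table 5-6.
lr_pos, lr_neg = likelihood_ratios(0.81, 0.74)
print(round(lr_pos, 1), round(lr_neg, 2))  # prints 3.1 0.26
```

This identity holds for a single 2 x 2 table; pooled estimates, such as those combined across studies in Table 5-3, need not satisfy it exactly.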


REFERENCE STANDARD 

Histologically proven diagnosis or no surgery after adequate 
follow-up (which allows the inference that appendicitis was 
not present). 

EVIDENCE TO SUPPORT THE UPDATE 


Appendicitis, Adult 



TITLE A Practical Score for the Early Diagnosis of Acute 
Appendicitis. 

AUTHOR Alvarado A. 

CITATION Ann Emerg Med. 1986;15(5):557-564. 

QUESTION Can the negative appendectomy rate be 
reduced without increasing the risk of perforation by 
using a practical score? 

DESIGN Retrospective chart review to derive a decision 
rule based on bayesian statistics. 

SETTING One Philadelphia hospital. 

PATIENTS Three hundred five patients hospitalized 
from January 1975 to December 1976 with abdominal 
pain suggestive of appendicitis. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Two-by-two tables were constructed for each clinical characteristic 
found with the chart review. The 8 most accurate 
characteristics were used in the clinical decision rule 
(Table 5-7), making it one of the simplest rules available. 

MAIN OUTCOME MEASURES 

Appendicitis was diagnosed when pathologically proven. No 
appendicitis was defined as a normal appendix discovered at 
operation or resolution of pain without surgery. The length of 
follow-up of the nonsurgical patients was not defined. 

MAIN RESULTS 

See Table 5-8. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 4. 

STRENGTHS Although the level of evidence of this study is 
low, this study is noteworthy because of its early appearance 
in the literature and its wide acceptance. 


Table 5-7 The Alvarado Scoring System (MANTRELS Mnemonic) 

Variable | Score a 
Migration | 1 
Anorexia-acetone | 1 
Nausea-vomiting | 1 
Tenderness in RLQ | 2 
Rebound pain | 1 
Elevation of temperature | 1 
Leukocytosis | 2 
Shift to the left | 1 
Total | 10 

Abbreviation: RLQ, right lower quadrant. 
a A score of 7 or higher requires operation. 


Table 5-8 Likelihood Ratios for the Alvarado Score 

Test | Sensitivity | Specificity | LR+ (95% CI) | LR- (95% CI) | DOR (95% CI) 
Alvarado score (≥7) | 0.81 | 0.74 | 3.1 (2.2-5.0) | 0.25 (0.21-0.35) | 12 (6.0-25) 

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive likelihood 
ratio; LR-, negative likelihood ratio. 


WEAKNESSES This study’s retrospective design is a weakness, 
but it has much face validity. This study has been validated 
in several later studies. 1-3 

REFERENCES FOR THE EVIDENCE 

1. Saidi HS, Chavda SK. Use of a modified Alvarado score in the diagnosis 
of acute appendicitis. East Afr Med J. 2003;80(8):411-414. 

2. Saidi RF, Ghasemi M. Role of Alvarado score in diagnosis and treatment 
of suspected acute appendicitis. Am J Emerg Med. 2000;18(2):230-231. 

3. Kalan M, Talbot D, Cunliffe WJ, Rich AJ. Evaluation of the modified 
Alvarado score in the diagnosis of acute appendicitis: a prospective study. 
Ann R Coll Surg Engl. 1994;76(6):418-419. 

Reviewed by James Wagner, MD 

TITLE Sex-Specific Diagnostic Scores for Acute Appendicitis. 

AUTHORS Eskelinen M, Ikonen J, Lipponen P. 

CITATION Scand J Gastroenterol. 1994;29(1):59-66. 

QUESTION Can the diagnosis of acute appendicitis in 
women and men with acute abdominal pain be improved 
by using computer-based diagnostic scores? 

DESIGN This was prospective derivation of a clinical 
decision rule from a convenience sample of patients with 
a standardized data collection sheet. The rule was derived 
using logistic stepwise multivariate regression analysis. 

SETTING Two Finnish hospitals. 

PATIENTS A total of 1333 patients with acute abdominal 
pain of less than 7 days’ duration who were admitted 
to one of the 2 study hospitals during a 6-year period. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Separate clinical decision rules were derived for men and 
women. The scoring system for the clinical decision rule for 
the men involved 7 indicators (Table 5-9); the scoring system 
for women involved 5 (Table 5-10). Computers were used in 
this study to take the data entered from a standardized form 
and calculate the discriminate score (DS) and compare it 
with the diagnostic standard. 

MAIN OUTCOME MEASURES 

The DS was compared with a diagnostic standard: appendicitis 
was defined as that pathologically proven. No 
appendicitis was defined as a normal appendix discovered 
at operation or resolution of pain without surgery. 
The length of follow-up of the nonsurgical patients was 
not defined. 


MAIN RESULTS 

Several computer models and cutoffs were analyzed; the cutoff 
for the results reported in Table 5-11 was as follows. 
Patients with DS values below -2.00 should not have surgery, 
patients with a DS above -0.48 should have surgery, and 
patients with DS values between -2.00 and -0.48 were considered 
nondefined. That is, they required follow-up before 
the decision to operate or not was made. 


Table 5-9 The Scoring System for Men 

Variable | Indicator | Score 
Constant |  | -7.69 
Previous abdominal surgery | Yes | 0 
 | No | 1.88 
Pain at diagnosis | RLQ | 1.3 
 | Other | 0 
Fever | >37.1 °C | 1.05 
 | <37.1 °C | 0 
Tenderness | RLQ | 1.97 
 | Other | 0 
Rebound tenderness | Yes | 1.61 
 | No | 0 
Guarding | Yes | 1.14 
 | No | 0 
Rigidity | Yes | 1.43 
 | No | 0 

Abbreviation: RLQ, right lower quadrant. 


Table 5-10 The Scoring System for Women 

Variable | Indicator | Score 
Constant |  | -7.22 
Pain at diagnosis | RLQ | 1.33 
 | Other | 0 
Tenderness | RLQ | 2.98 
 | Other | 0 
Renal tenderness | Yes | 0 
 | No | 0.88 
Guarding | Yes | 2.08 
 | No | 0 
Rigidity | Yes | 2.45 
 | No | 0 

Abbreviation: RLQ, right lower quadrant. 
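
The sex-specific scoring in Tables 5-9 and 5-10, together with the cutoffs described under Main Results, can be sketched in Python as follows. This is an illustration rather than the authors' software: all names are hypothetical, each key names the indicator that earns points (for example, the absence of previous abdominal surgery), and treating a DS of exactly -2.00 or -0.48 as nondefined is an assumption, because the text does not say how boundary values are handled.

```python
from typing import Mapping

# Constant and points per scoring indicator, copied from Tables 5-9 and 5-10.
ESKELINEN_RULES = {
    "men": {
        "constant": -7.69,
        "weights": {
            "no_previous_abdominal_surgery": 1.88,
            "pain_rlq": 1.3,
            "fever": 1.05,  # temperature >37.1 degrees C
            "tenderness_rlq": 1.97,
            "rebound_tenderness": 1.61,
            "guarding": 1.14,
            "rigidity": 1.43,
        },
    },
    "women": {
        "constant": -7.22,
        "weights": {
            "pain_rlq": 1.33,
            "tenderness_rlq": 2.98,
            "no_renal_tenderness": 0.88,
            "guarding": 2.08,
            "rigidity": 2.45,
        },
    },
}


def discriminate_score(sex: str, findings: Mapping[str, bool]) -> float:
    """Constant plus the points for every scoring indicator that is present."""
    rule = ESKELINEN_RULES[sex]
    unknown = set(findings) - set(rule["weights"])
    if unknown:
        raise ValueError(f"unknown findings for {sex}: {sorted(unknown)}")
    points = sum(w for name, w in rule["weights"].items() if findings.get(name, False))
    return rule["constant"] + points


def recommendation(ds: float) -> str:
    """Apply the cutoffs quoted in the text (boundary handling is assumed)."""
    if ds < -2.00:
        return "no surgery"
    if ds > -0.48:
        return "surgery"
    return "nondefined: follow-up before deciding"


# Hypothetical woman: RLQ pain and tenderness, guarding, no renal tenderness.
example = {
    "pain_rlq": True,
    "tenderness_rlq": True,
    "no_renal_tenderness": True,
    "guarding": True,
    "rigidity": False,
}
ds = discriminate_score("women", example)
print(round(ds, 2), recommendation(ds))  # prints 0.05 surgery
```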


Table 5-11 Results of the Study 

Test | Sensitivity | Specificity | LR+ (95% CI) | LR- (95% CI) | DOR (95% CI) 
Men | 0.95 | 0.89 | 8.6 | 0.05 | 163 
Women | 0.93 | 0.92 | 12 | 0.07 | 163 
Total | 0.94 | 0.91 | 10 (9.3-12) | 0.06 (0.05-0.10) | 164 (93-287) 

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive likelihood ratio; LR-, negative likelihood ratio. 

CONCLUSIONS 

LEVEL OF EVIDENCE Levels 2 and 3. 

STRENGTHS This study had a large sample size. A standardized 
form was used to record all clinical data. 

LIMITATIONS This study used convenience sampling of 
patients that was described by the authors as “although not 
entirely consecutive, the series was collected by the same surgeon 
with regard to data collection and comprised a representative 
and unselected sample.” 

The clinical decision rule was not validated in the original 
report. It has since been validated on an even larger sample of 
patients 1 ; the results were less impressive but still indicated 
significant power of this scoring system. 

REFERENCE FOR THE EVIDENCE 

1. Zielke A, Sitter H, Rampp T, Bohrer T, Rothmund M. Clinical decision-making, 
ultrasonography, and scores for evaluation of suspected acute 
appendicitis. World J Surg. 2001;25(5):578-584. 

Reviewed by James M. Wagner, MD 


TITLE Diagnostic Decision Support in Suspected 
Acute Appendicitis: Validation of a Simplified Scoring 
System. 

AUTHORS Fenyo G, Lindberg G, Blind P, Enochsson L, 
Oberg A. 

CITATION Eur J Surg. 1997;163(11):831-838. 

QUESTION Can a scoring system for the diagnosis of 
appendicitis be validated? 

DESIGN Prospective validation of previously derived 
decision rule. 

SETTING One Swedish county district hospital and 1 
university hospital. One center accounted for 86% of the 
patients. The authors state that “virtually all” patients in 
that center were enrolled. At the second center that 
enrolled a minority of patients, only 60% of the potentially 
eligible patients were enrolled. 

PATIENTS A total of 1167 patients with suspected 
appendicitis, that is, patients who had not previously had 
an appendectomy and who presented with pain, tenderness, 
or both in the right lower quadrant (RLQ). 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The scoring system validated in this study has been used routinely 
in the 2 hospitals involved in this study. A pocket chart 
with 10 variables and their associated scores was carried by 
clinicians (Table 5-12); scores suggest “consider operation,” 
“observe with repeated examinations,” or “observation or 
discharge of the patient.” 

MAIN OUTCOME MEASURES 

The diagnostic standard was positive for histologically 
proven appendicitis and negative for histologically disproven 
appendicitis or the resolution of symptoms without operation. 
A positive result was defined as a score of -2 or more. 

MAIN RESULTS 

See Tables 5-13, 5-14, 5-15, and 5-16. 


Table 5-12 Scoring System 

Variable | Indicator | Score 
Constant (apply to all patients as the starting point) |  | -10 
Sex | Male | +8 
 | Female | -8 
White blood cell count (per µL) | >14000 | +15 
 | 9000-13900 | +2 
 | <8900 | -15 
Duration of pain, h | <24 | +3 
 | 24-48 | 0 
 | >48 | -12 
Progression of pain | Yes | +3 
 | No | -4 
Relocation of pain | Yes | +7 
 | No | -9 
Vomiting | Yes | +7 
 | No | -5 
Aggravation by coughing | Yes | +4 
 | No | -11 
Rebound tenderness | Yes | +5 
 | No | -10 
Rigidity | Yes | +15 
 | No | -4 
Tenderness outside RLQ | Yes | -6 
 | No | +4 

Abbreviation: RLQ, right lower quadrant. 


Table 5-13 Probability of Appendicitis According to Score 

Score | Probability of Appendicitis | Recommended Strategy 
-2 or greater | >0.45 | Consider operation 
-3 to -16 | 0.44-0.17 | Observe with repeated examination 
-17 or less | <0.17 | Observe or discharge to home 


Table 5-14 Univariate Results 

Variable | LR+ (95% CI) a | LR- (95% CI) 
Male (n = 531) | 1.6 (1.4-1.7) |  
Female (n = 636) | 0.66 (0.58-0.75) |  
White blood cell count (per µL) |  |  
  >14000 | 3.1 (2.5-3.8) |  
  9000-13900 | 1.3 (1.1-1.5) |  
  <8900 | 0.16 (0.11-0.22) |  
Duration of pain, h |  |  
  <24 | 1.4 (1.2-1.5) |  
  24-48 | 1.1 (0.78-1.5) |  
  >48 | 0.39 (0.30-0.51) |  
Progression of pain | 1.3 (1.2-1.5) | 0.57 (0.47-0.68) 
Relocation of pain | 2.2 (2.0-2.6) | 0.46 (0.40-0.54) 
Vomiting | 1.7 (1.4-1.9) | 0.74 (0.66-0.82) 
Aggravation by coughing | 1.5 (1.4-1.6) | 0.35 (0.28-0.45) 
Rebound tenderness | 1.8 (1.6-1.9) | 0.38 (0.31-0.46) 
Rigidity | 2.8 (2.3-3.5) | 0.70 (0.64-0.76) 
Tenderness outside RLQ | 0.67 (0.58-0.77) | 1.4 (1.3-1.6) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio; RLQ, right lower quadrant. 

a Serial likelihood ratios are reported for results on an ordinal scale. 


The overall accuracy from the receiver operating characteristic 
curve for the multivariate model was 0.89 for the center 
with consecutive enrollment vs 0.83 for the center that did 
not capture all eligible patients. 

Using the model with a cut point of -2 or greater, as presented 
by the authors, produces a likelihood ratio (LR) of 5.6 
(95% confidence interval [CI], 4.6-6.8) for a score of -2 or 
greater; when the score is less than -2, the LR is 0.31 (95% 
CI, 0.26-0.37). 
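
The pocket-chart scoring in Table 5-12 and the strategy bands in Table 5-13 can be sketched in Python as shown below. This is a hedged illustration with hypothetical names, not the authors' implementation. The white blood cell count and pain duration are passed as the table's own category labels rather than raw numbers, because the printed categories leave small gaps (for example, between 8900 and 9000 per µL) that this sketch does not attempt to resolve.

```python
from typing import Any, Mapping

CONSTANT = -10  # starting point applied to all patients (Table 5-12)

# Score for each indicator value, copied from Table 5-12.
SCORES = {
    "sex": {"male": 8, "female": -8},
    "wbc_per_ul": {">14000": 15, "9000-13900": 2, "<8900": -15},
    "duration_h": {"<24": 3, "24-48": 0, ">48": -12},
    "progression_of_pain": {True: 3, False: -4},
    "relocation_of_pain": {True: 7, False: -9},
    "vomiting": {True: 7, False: -5},
    "aggravation_by_coughing": {True: 4, False: -11},
    "rebound_tenderness": {True: 5, False: -10},
    "rigidity": {True: 15, False: -4},
    "tenderness_outside_rlq": {True: -6, False: 4},
}


def appendicitis_score(patient: Mapping[str, Any]) -> int:
    """Add the constant and the score of every one of the 10 variables."""
    return CONSTANT + sum(SCORES[var][patient[var]] for var in SCORES)


def recommended_strategy(score: int) -> str:
    """Map a score to the strategy bands of Table 5-13."""
    if score >= -2:
        return "consider operation"
    if score <= -17:
        return "observe or discharge to home"
    return "observe with repeated examination"


# Hypothetical patient used only to exercise the arithmetic.
patient = {
    "sex": "male",
    "wbc_per_ul": ">14000",
    "duration_h": "<24",
    "progression_of_pain": True,
    "relocation_of_pain": True,
    "vomiting": False,
    "aggravation_by_coughing": True,
    "rebound_tenderness": True,
    "rigidity": False,
    "tenderness_outside_rlq": False,
}
score = appendicitis_score(patient)
print(score, recommended_strategy(score))  # prints 30 consider operation
```

Because every variable contributes a positive or negative score, a missing variable raises a `KeyError` in this sketch rather than being silently treated as absent.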


Table 5-16 Serial LRs for the Recommended Cut Points 

Score | LR (95% CI) for Appendicitis 
-2 or greater | 6.0 (4.9-7.2) 
-3 to -16 | 0.59 (0.42-0.84) 
-17 or less | 0.26 (0.21-0.32) 

Abbreviations: CI, confidence interval; LR, likelihood ratio. 


CONCLUSIONS

LEVEL OF EVIDENCE Level 3.

STRENGTHS This study had a large sample size, and it used a
standardized form on which all clinical data were recorded. The
study reports a clinical decision rule that was derived and
validated with good technique.

LIMITATIONS This study reported the results from 2 centers. Most
patients enrolled in the study were reported as being “consecutive.”
It is difficult to assess the effect of nonconsecutive enrollment at
the second hospital, which accounted for approximately 15% of the
patients. However, because the overall accuracy of the score at the
hospital with nonconsecutive patients was slightly worse (83% vs
89%), it is likely that the findings underestimate the true accuracy.

None of the individual clinical findings had an LR far enough from 1
to allow the clinician to reliably rule in appendicitis. A variable
with good measurement properties that decreased the likelihood of
appendicitis was a normal white blood cell count (<8900/µL), with an
LR of 0.16.

The investigators’ goal was to compare the predicted probability of
a score by using the clinical variables with the actual outcomes.
The authors recommend a cut point of -2 or greater as suggesting the
need for surgery and a value of -17 or less as appropriate for
discharging a patient home without observation and a repeated
examination. The data are presented in a fashion that allows
clinicians to calculate the LR for the 3 levels of appendicitis
scores. The serial LRs perform better than the dichotomous LR and
match the clinical recommendations for the different levels of LRs.

Reviewed by James M. Wagner, MD

Table 5-15 Multivariate Results for a Score of -2 or Greater as a Function of Sex

Test    Sensitivity   Specificity   LR+ (95% CI)    LR- (95% CI)       DOR (95% CI)
Men     0.80          0.79          3.8 (3.1-4.8)   0.24 (0.18-0.31)   16 (10-24)
Women   0.61          0.93          8.8 (6.2-12)    0.42 (0.34-0.51)   21 (13-34)

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive
likelihood ratio; LR-, negative likelihood ratio.
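
The diagnostic odds ratio in Table 5-15 can be read as the ratio of the positive to the negative likelihood ratio. As a minimal sketch (the function name is ours, and the inputs are the rounded LRs printed in the table, so the results agree with the tabulated DORs only after rounding):

```python
def diagnostic_odds_ratio(lr_positive, lr_negative):
    """DOR = LR+ / LR-, the odds of a positive result in diseased
    patients relative to the odds in nondiseased patients."""
    return lr_positive / lr_negative


# Rounded point estimates from Table 5-15.
dor_men = diagnostic_odds_ratio(3.8, 0.24)
dor_women = diagnostic_odds_ratio(8.8, 0.42)

print(f"Men: DOR about {dor_men:.1f}")
print(f"Women: DOR about {dor_women:.1f}")
```

A single DOR hides how the score behaves in each sex: in these data it rules in appendicitis more strongly in women (higher LR+) and rules it out more strongly in men (lower LR-).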
TITLE Diagnostic Scores for Acute Appendicitis.

AUTHORS Ohmann C, Yang O, Franke C, and the
Abdominal Pain Study Group.

CITATION Eur J Surg. 1995;161(4):273-281.

QUESTION Which of 10 predictive scores used in the
diagnosis of acute appendicitis are most valuable?

DESIGN Multicenter prospective collection of 25 variables
from the medical history and 20 from the physical
examination.

SETTING Six German hospitals. 

PATIENTS A total of 1254 consecutive patients with 
acute abdominal pain of less than 1 week’s duration, 
excluding patients with trauma or postsurgical pain. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The 10 tests evaluated were the Lindberg et al,¹ Eskelinen et al,²
Alvarado,³ Fenyo,⁴ Izbicki et al,⁵ Christian and Christian,⁶ van Way
et al,⁷ Teicher et al,⁸ Arnbjornsson,⁹ and De Dombal¹⁰ scores. These
scores were grouped according to the population in which the score
was intended for use (Table 5-17). Group A (Lindberg and Eskelinen
scores) contained scores intended for use with a population with
acute abdominal pain. Group B (Alvarado, Fenyo, Izbicki, and
Christian scores) contained scores intended for use on patients
suspected of having appendicitis. Group C (van Way, Teicher, and
Arnbjornsson scores) contained scores derived from patients who had
appendicitis. Group D (De Dombal score) contained a score intended
for use with any patient with abdominal pain, but the diagnosis of
interest was “nonspecific abdominal pain”; that is, instead of
diagnosing appendicitis, it “diagnoses” pain in which surgical
intervention is unnecessary.

Appendicitis was diagnosed when confirmed by pathology specimens.
“No appendicitis” was defined as a normal appendix discovered at
operation or resolution of pain without surgery. Patients not
receiving operation were followed by telephone interview for a
length of time that was undefined.

MAIN OUTCOME MEASURES 

The collected data were used to calculate the 10 predictive 
scores. Patients were retrospectively assigned to outcomes 
that would have resulted from management that followed the 
score’s suggestion. A 15% negative appendectomy rate is 
accepted as standard of care, so a score that resulted in 
assignments of patients leading to more than 15% was 
deemed unacceptable. This was done to define a minimally 
acceptable performance of a score. 

There were 4 such criteria used for comparing outcomes from the 10
scores: (1) “[i]nitial negative appendicectomy [sic] rate” (defined
as proportion of patients who did not have acute appendicitis who
were assigned to the operation group), (2) “[p]otential perforation
rate” (defined as proportion of patients with acute appendicitis not
assigned to the operation group), (3) “[i]nitial missed perforation
rate” (defined as proportion of patients with perforated
appendicitis not assigned to the operation group), and
(4) “[m]issed appendicitis rate” (defined as the proportion of
patients with acute appendicitis who were assigned to the exclusion
group).

MAIN RESULTS 

The prevalence of appendicitis in this study was 17%. 

None of the tested scores fulfilled the criterion for an acceptable
score since all had high missed appendicitis rates (Table 5-17). By
calculation of sensitivity and specificity from the data provided in
the study, it appears that there was a deflation of the sensitivity
and inflation of the specificity in this study compared with the
other attempts at validating these data, which suggests a referral
bias in this or the analyzed studies. The initial diagnostic
accuracy of the clinicians also did not perform at a minimally
acceptable level.

Table 5-17 Clinical Outcomes That Would Have Accrued From Management Guided by the Scores

                              Initial Negative   Potential      Initial Missed   Missed
                              Appendectomy       Perforation    Perforation      Appendicitis
Group        Score            Rate, %            Rate, %        Rate, %          Rate, %
A            Lindberg¹        53                 53             56               30
             Eskelinen²       30                 30             63               61
B            Alvarado³        29                 29             76               65
             Fenyo⁴           25                 25             76               57
             Izbicki⁵         47                 49             38               17
             Christian⁶       42                 42             69               57
C            van Way⁷         12                 12             27               15
             Teicher⁸         40                 11             21               ...ᵃ
             Arnbjornsson⁹    13                 13             10               15
D            De Dombal¹⁰      39                 39             82               76
Standard                      15                 35             15               5
Clinicians   Initial          52                 11             16               11
             Actual           8                  33             44               0

ᵃ Ellipsis indicates data not available.
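
The comparison against the minimally acceptable standard can be sketched as a simple screen of each row of Table 5-17 against the "Standard" row. This is an illustration, not the investigators' procedure: the names below are ours, the rates are transcribed from the table as printed, and we assume that every standard value is a maximum acceptable rate (the text states this explicitly only for the 15% negative appendectomy rate). A missing value is skipped rather than counted as a failure.

```python
# Maximum acceptable rates (%), taken from the "Standard" row of Table 5-17.
STANDARD = {
    "negative_appendectomy": 15,
    "potential_perforation": 35,
    "missed_perforation": 15,
    "missed_appendicitis": 5,
}

# Rates (%) per score, in the same column order; None marks unavailable data.
SCORES = {
    "Lindberg": (53, 53, 56, 30),
    "Eskelinen": (30, 30, 63, 61),
    "Alvarado": (29, 29, 76, 65),
    "Fenyo": (25, 25, 76, 57),
    "Izbicki": (47, 49, 38, 17),
    "Christian": (42, 42, 69, 57),
    "van Way": (12, 12, 27, 15),
    "Teicher": (40, 11, 21, None),
    "Arnbjornsson": (13, 13, 10, 15),
    "De Dombal": (39, 39, 82, 76),
}


def failed_criteria(rates):
    """List the criteria on which a score exceeds the standard."""
    return [
        name
        for name, rate in zip(STANDARD, rates)
        if rate is not None and rate > STANDARD[name]
    ]


acceptable = [score for score, rates in SCORES.items() if not failed_criteria(rates)]

for score, rates in SCORES.items():
    print(f"{score}: fails {', '.join(failed_criteria(rates))}")
print("Scores meeting every criterion:", acceptable)
```

Under these assumptions no score passes all 4 criteria, which matches the conclusion reported in the text.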

Despite the disappointing performance of the scores, the
investigators reported the performance of each score compared with
one another. After applying each score to the entire database of
patients presenting with abdominal pain (not just the populations
for which the scores were intended or derived), the investigators
recommended further testing of 2 scores in patients with abdominal
pain or suspected of having appendicitis: the Alvarado and Eskelinen
scores. The investigators also report the variables used most
frequently: site and duration of pain, site of tenderness, rebound
tenderness, and white blood cell count.

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS This was an unprecedented, prospective, multicenter study
that compared 10 clinical prediction rules for appendicitis on a
single, large population at several German hospitals. The methods
were fairly well described, and the criteria against which all rules
were compared seemed thoughtful.

WEAKNESSES The clinical database used in this study contained most,
but not all, of the clinical criteria used in each clinical
prediction rule. This was a complex study, and its description and
tables were somewhat confusing. The division of studies into groups
A and B was based on subjective data and seemed arbitrary. The
included studies typically did not explain the difference between
“acute abdominal pain” and “suspected appendicitis.” The data
reported most thoroughly were those analyzed by groups; perhaps the
most useful data presented were in the text, where groups A and B
were compared on the same population.

The performance of each of the rules was surprising. The 
investigators provide several suggestions to explain the poor 
performance, mainly positive bias of the original studies and 
geographic differences in patient characteristics. Beyond 
what was explored in the discussion, the difference between 
the initial and actual treatment plan may explain the poor 
performance of the scores. Given time, the patient may lose 
symptoms or signs and therefore exhibit a lower score than 
initially recorded. 

Nonetheless, it appears that the Alvarado and Eskelinen scores are
the best clinical decision rules for appendicitis in patients with
abdominal pain. This judgment is based on the practicality of the
score and the use of the most powerful individual findings. In
addition, the Alvarado rule is the oldest rule most familiar to
clinicians and is the simplest to implement.


REFERENCES FOR THE EVIDENCE 

1. Fenyo G, Lindberg G, Blind P, Enochsson L, Oberg A. Diagnostic
decision support in suspected acute appendicitis: validation of a
simplified scoring system. Eur J Surg. 1997;163(11):831-838.

2. Eskelinen M, Ikonen J, Lipponen P. A computer-based diagnostic score
to aid in diagnosis of acute appendicitis: a prospective study of 1333
patients with acute abdominal pain. Theor Surg. 1992;7:86-90.

3. Alvarado A. A practical score for the early diagnosis of acute
appendicitis. Ann Emerg Med. 1986;15(5):557-564.

4. Fenyo G. Routine use of a scoring system for decision-making in
suspected acute appendicitis in adults. Acta Chir Scand.
1987;153(9):545-551.

5. Izbicki JR, Wolfram TK, Dietmar KW, et al. Accurate diagnosis of acute
appendicitis: a retrospective and prospective analysis of 686 patients.
Eur J Surg. 1992;158(4):227-231.

6. Christian F, Christian GP. A simple scoring system to reduce the negative
appendicectomy rate. Ann R Coll Surg Engl. 1992;74(4):281-285.

7. van Way C, Murphy J, Dunn E, Elerding S. A feasibility study of
computer aided diagnosis in appendicitis. Surg Gynecol Obstet.
1982;155(5):685-688.

8. Teicher I, Landa B, Cohen M, Kabnick LS, Wise L. Scoring system to aid
in the diagnosis of appendicitis. Ann Surg. 1983;198(6):753-759.

9. Arnbjornsson E. Scoring system for computer-aided diagnosis of acute
appendicitis. Ann Chir Gynaecol. 1985;74(4):159-166.

10. De Dombal FT. Diagnosis of Acute Abdominal Pain. 2nd ed. New York,
NY: Churchill Livingstone; 1991:1-259.
CHAPTER 6


Does This Patient Have Ascites?

How to Divine Fluid in the Abdomen

John W. Williams, Jr, MD
David L. Simel, MD, MHS


CLINICAL SCENARIOS


In each of the following cases, the clinician will need to
determine whether the patient has ascites.

CASE 1 A 44-year-old man with cirrhosis is admitted 
with fever but has no obvious source of infection. 

CASE 2 A 57-year-old woman presents with an adnexal 
mass and recent weight gain but otherwise feels well. 

CASE 3 A 65-year-old man with a history of myocardial 
infarction is admitted for decreased exercise tolerance, 
increased abdominal girth, and ankle edema. 


WHY IS THIS AN IMPORTANT QUESTION TO 
ANSWER WITH A CLINICAL EXAMINATION? 


Free fluid in the abdominal cavity is ascites. Ascites may have
important diagnostic, prognostic, and therapeutic implications. When
clinically detectable, ascites may indicate underlying heart
failure, liver disease, nephrotic syndrome, or malignancy. In
patients with liver disease, ascites has prognostic significance
because operative mortality is increased and overall survival is
decreased; ascites may also signal metastases in patients with
malignancy.¹ Although patients with small amounts of ascites do not
generally require specific therapy, patients with larger amounts of
ascites may require intervention to relieve symptoms caused by their
distended abdomen. Furthermore, the degree of ascites is useful in
monitoring the efficacy of treatment for the underlying condition
that caused it (eg, monitoring response to chemotherapy for
malignancy).

The 3 clinical scenarios are specific examples of why ascites
detection is clinically important. For example, ascites detection in
the first patient may lead to the diagnosis of spontaneous bacterial
peritonitis as the source of the patient’s fever. If ascites is
found by clinical examination, the physician may be able to proceed
directly to abdominal paracentesis without pausing for imaging
procedures. In the second patient, the presence of ascites would
heighten the clinician’s suspicion of ovarian carcinoma with
peritoneal metastases, implying a more advanced stage and poorer
prognosis. In the third patient, the finding of ascites may trigger
the physician’s consideration of diagnostic possibilities other than
severe left-sided congestive heart failure, such as a pericardial
effusion causing marked signs of right-sided heart failure. Clearly,
clinical determination of the presence or absence of ascites has the
advantages of speed, convenience, and cost savings on diagnostic
imaging.

It is easy to identify large volumes of ascites clinically, but
smaller amounts of ascites are not as obvious. When diagnostic
confirmation is necessary, paracentesis is the definitive test,
although less invasive radiographic procedures are ordinarily used
to corroborate the clinician’s suspicion. Ultrasonography can detect
as little as 100 mL of abdominal fluid and is considered the gold
standard for diagnosing ascites.²,³ Abdominal computed tomography
can also detect small amounts of fluid but is more expensive.
Unfortunately, there are no general guidelines for correlating small
amounts of ascites observed on ultrasonographic examination or
computed tomography with pathophysiologic conditions.

The reference standard for ascites is fluid aspiration by
paracentesis and fluid visualization by ultrasonography or
computed tomography.

Pathophysiology of Ascites 

An understanding of the pathophysiologic basis for ascites
facilitates assessment of each patient’s risk by alerting the
examiner to conditions disrupting normal physiology (Table 6-1).
Under physiologic conditions, intravascular and extravascular
hydrostatic and colloid osmotic pressures are balanced, preventing
accumulation of extravascular fluid.⁵ Any process disrupting this
balance may precipitate ascites. For example, fibrotic constriction
of the hepatic sinusoids secondary to alcoholic cirrhosis leads to
increased venous hydrostatic pressure and, ultimately, to ascites by
forcing lymphatic drainage into the abdomen through the hepatic
capsule.¹ Cirrhotic patients with ascites show avid renal retention
of sodium and water, which is an important mechanism for continued
ascites formation.⁶ A second, less important mechanism for ascites
formation is a loss of osmotic pressure because of inadequate
protein synthesis (eg, malnutrition, liver disease) or protein
wasting (eg, the nephrotic syndrome). Because of protein loss,
transudative fluid moves from the intravascular space into the
abdominal extravascular space to balance hydrostatic and osmotic
forces. Finally, infection or malignancy in the peritoneum may
produce inflammatory exudates or malignant effusions in the
abdominal extravascular space faster than they can be absorbed
intravascularly.


Table 6-1 Pathophysiologic Classification of Ascitesᵃ

I. Elevated hydrostatic pressure
   A. Cirrhosis
   B. Congestive heart failure
   C. Constrictive pericarditis
   D. Inferior vena cava obstruction
   E. Hepatic vein obstruction (Budd-Chiari syndrome)

II. Decreased osmotic pressure
   A. Nephrotic syndrome
   B. Protein-losing enteropathy
   C. Malnutrition
   D. Cirrhosis or hepatic insufficiency

III. Fluid production exceeding resorptive capacity
   A. Infections
      1. Bacterial
      2. Tuberculosis
      3. Parasitic
   B. Neoplasms

ᵃ Adapted from Bender.⁴

How to Elicit the Symptoms and Signs of Ascites 

A complete evaluation for ascites includes a focused medical history
and physical examination. The examiner should ask about recent ankle
edema, weight gain, or change in abdominal girth. Other potentially
important items are a history of liver disease or congestive heart
failure. A focused physical examination for ascites includes
(1) inspection for bulging flanks, (2) percussion for flank
dullness, (3) a test for shifting dullness, and (4) a test for a
fluid wave.

Bulging flanks occur when the weight of abdominal free fluid is
sufficient to push the flanks outward. However, it is sometimes
difficult to distinguish bulging flanks caused by ascites from
bulging flanks caused by obesity. One method for discriminating
between the 2 is to test for flank dullness. With the patient
recumbent, gas-filled loops of bowel will characteristically float
on top of ascites, making the percussion note tympanitic at the
umbilicus and dull beyond the fluid meniscus into the flanks
(Figure 6-1A). The examiner can confirm this pattern by
progressively percussing the abdomen, beginning at the umbilicus and
moving toward the flanks, listening for the transition from tympany
to dullness when the meniscus is reached.⁷ Having identified and
marked the transition between tympany and dullness, further evidence
for ascites can be obtained by testing for shifting dullness. This
is done by rolling the patient away from the examiner and repeating
the percussion. With ascites, the area of dullness shifts to the
dependent side, and the area of tympany shifts toward the top
(Figure 6-1B).

Another potentially useful method for detecting ascites is testing
for a fluid wave. The test is performed by having the patient, or an
assistant, place the medial edges of both hands firmly down the
midline of the abdomen to block transmission of a wave through
subcutaneous fat (Figure 6-2). The examiner taps one flank sharply
while using the fingertips to feel for an impulse on the opposite
flank. When ascites is present, an impulse may be felt in the
receiving hand after a barely perceptible lag.

Two additional maneuvers, the puddle sign and auscultatory
percussion, cannot currently be recommended. The puddle sign was
initially advocated because of its purported high sensitivity.⁸
However, it is infrequently used now because it is difficult to
perform properly and has low sensitivity (43%-55%).⁹,¹⁰ A method of
auscultatory percussion was described by Guarino,¹¹ but its
precision and accuracy have not yet been reported. After voiding,
the patient sits or stands so that free fluid gravitates to the
pelvis, and the examiner places a stethoscope in the midline,
immediately above the pubic crest. Finger-flicking percussion is
performed along radial spokes from the subcostal margin downward
toward the pelvis. The percussion note is initially dull but changes
sharply to a loud note at the border of increased pelvic density. In
the absence of ascites, the border is approximately 4.5 cm above the
pelvic crest (the pelvic baseline). In patients with ascites, free
fluid raises the demarcating border clearly above the pelvic
baseline. When the patient is supine, this clear line of demarcation
is obliterated because the free fluid gravitates to the flanks.

Although most of the physical examination for ascites should focus
on the abdomen, extra-abdominal signs may provide evidence for
conditions associated with ascites. Physical findings that may be
useful by their presence or absence include evidence of liver
disease (eg, jaundice, spider angiomas) or heart disease (eg,
cardiac gallop).

Figure 6-2 Testing for a Fluid Wave

ACCURACY OF HISTORY AND 
SYMPTOMS FOR ASCITES 

We examined the effect of medical history items on the probability
of ascites in male veteran inpatients (Table 6-2).⁹ Medical
histories, obtained by internal medicine house staff, were compared
with reference standard abdominal ultrasonographic findings.
Positive histories of hepatitis or heart failure generated
likelihood ratios (LRs) of 3.2 and 2.0, respectively. However,
alcoholism (positive LR [LR+], 1.4) or a history of carcinoma
(LR+, 0.91) had little effect on the odds of ascites.

Other questions about the patient’s present illness may be even more
useful. In this same study, the patient’s symptoms of increased
abdominal girth, weight gain, or ankle edema gave LR+ values of 4.2,
3.2, and 2.8, respectively. The absence of increased abdominal girth
(negative LR [LR-], 0.17) or ankle swelling (LR-, 0.10) decreased
appreciably the diagnostic likelihood of ascites. For example, in a
patient with a low pretest probability of ascites (<20%), the
absence of recent ankle edema decreases the probability of ascites
to less than 2.5%. Clearly, the patient’s medical history and
current symptoms are valuable for at least 2 reasons. First, certain
items may suggest the presence or absence of ascites. Second, in
patients suspected of having ascites, a focused physical examination
for ascites is needed. The clinical history distinguishes patients
with high and low probabilities for ascites. Ascites is unlikely
when patients report no increase in abdominal girth, and ascites is
very unlikely in male patients who report no history of recent ankle
swelling.
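
The ankle edema example above follows from converting the pretest probability to odds, multiplying by the LR, and converting back to a probability. A minimal sketch of that calculation (the function name is ours; the inputs are the 20% pretest probability and the LR- of 0.10 quoted in the text):

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Pretest probability 20%, no recent ankle swelling (LR-, 0.10).
probability = posttest_probability(0.20, 0.10)
print(f"Posttest probability of ascites: {probability:.1%}")
```

At a pretest probability of exactly 20% the result is about 2.4%, so any lower pretest probability yields a posttest probability below 2.5%, as stated.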

PRECISION OF THE SIGNS FOR ASCITES 

Six gastroenterologists examined 50 hospitalized alcoholic patients
for the presence or absence of ascites. Their overall agreement was
good (intraclass correlation, 0.75), and it was excellent among
senior physicians (0.95).¹² In another study, 90 veteran inpatients
with evidence of liver disease were examined by 3 internists for 4
signs of ascites. For each sign, there was good agreement: presence
or absence of abdominal distention (86%), bulging flanks (79%),
shifting dullness (78%), and detection of prominent fluid waves
(76%).¹³ There is good agreement among physicians on the presence or
absence of traditional signs for ascites.

ACCURACY OF SIGNS FOR ASCITES 

Three investigations have compared physical examination findings for
ascites with findings from reference standard abdominal
ultrasonographic examinations.⁹,¹⁰,¹³ Despite the various levels of
training (internal medicine interns to staff gastroenterologists),
the results were similar in each study (Table 6-3). There was no
single sign for ascites that was both sensitive and specific.
However, flank dullness (≥80%) and bulging flanks (≥72%) were
sensitive in all studies. Shifting dullness had a high sensitivity
(≥83%) in 2 investigations. The puddle sign, purported to be the
most sensitive test for ascites, performed poorly, yielding at best
a sensitivity of 55%. The fluid wave was the only sign with a high
specificity (82%-92%) across all studies. Shifting dullness was
highly specific in only 1 study⁹; results may be inconsistent
because of differences in the study populations (general medical vs
patients with liver disease). To date, no investigator has studied
how to best use these signs in combination.


Table 6-2 Accuracy of the Clinical Historyᵃ

Historical Item or Symptom   Sensitivity   Specificity   LR+    LR-
Increased girth              0.87          0.77          4.2    0.17
Recent weight gain           0.67          0.79          3.2    0.42
Hepatitis                    0.27          0.92          3.2    0.80
Ankle swelling               0.93          0.66          2.8    0.10
Heart failure                0.47          0.73          2.0    0.73
Alcoholism                   0.60          0.58          1.4    0.69
History of carcinoma         0.13          0.85          0.91   1.0

Abbreviations: LR+, positive likelihood ratio; LR-, negative likelihood ratio.
ᵃ Adapted from Simel et al.⁹

The clinician must know the pretest probability or prevalence of a
disease to apply sensitivity and specificity data to an individual
patient. The LRs for the physical examination signs from the 3
studies are displayed in Table 6-4. We combined the study results
according to the number of unique patients in each study to yield
pooled sensitivity, specificity, and LRs (Table 6-5). The finding of
a fluid wave, shifting dullness, or peripheral edema increased the
likelihood of ascites the most. The absence of bulging flanks, flank
dullness, shifting dullness, or peripheral edema decreased the
likelihood of ascites the most.
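
The study-level LRs in Table 6-4 follow directly from the sensitivities and specificities in Table 6-3. As a brief illustration (the function name is ours), the fluid wave values reported for Cummings et al (sensitivity 0.53, specificity 0.90) reproduce the tabulated LR+ of 5.3 and LR- of 0.5 after rounding:

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous finding.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Fluid wave, Cummings et al (values from Table 6-3).
lr_positive, lr_negative = likelihood_ratios(0.53, 0.90)
print(f"LR+ = {lr_positive:.1f}, LR- = {lr_negative:.1f}")
```

Because the published sensitivities and specificities are themselves rounded, small differences from the tabulated LRs are expected for some rows.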

Finally, is the whole greater than the sum of the parts? Is an
examiner’s overall clinical impression more accurate than individual
signs or symptoms of ascites? Two studies evaluated the accuracy of
the overall clinical assessment for ascites. In the study by Cattau
et al¹⁰ of patients who were referred because their physicians were
unsure about the presence of ascites, the examiners correctly
determined the presence or absence of ascites in only 56% of
patients in this most difficult clinical scenario. In the study by
Simel et al,⁹ examiners categorized the probability of ascites as
high, intermediate, or low. Examiners at all levels of training
(intern through chief resident) were accurate when assigning a high
probability of ascites (LR+, 38-83) but were less accurate at low
probability of ascites (LR-, 0.77-0.87). Apparently, a high
probability of ascites in hospitalized patients was sufficient to
make the diagnosis, but a low probability was not enough to rule out
ascites. This rule may not apply for outpatients.

The following suggestions should guide clinical teaching and
performance of the clinical examination for detecting ascites:

1. The most useful findings for ruling out ascites are no history of
ankle swelling or increased abdominal girth and the inability to
demonstrate bulging flanks, flank dullness, or shifting dullness.

2. The most powerful findings for making the diagnosis of ascites
are a positive fluid wave result, shifting dullness, or peripheral
edema.

3. The puddle sign is difficult to perform, uncomfortable for
patients, and not sensitive to small amounts of ascites. It
should not be performed.


Table 6-3 Sensitivity and Specificity of the Physical Examination for Ascites

                    Sensitivity                           Specificity
                    Cummings    Simel      Cattau         Cummings    Simel      Cattau
Sign                et al¹³     et al⁹     et al¹⁰        et al¹³     et al⁹     et al¹⁰
Flank dullness      NA          0.80       0.94           NA          0.69ᵃ      0.29
Bulging flanks      0.72        0.93       0.78           0.70        0.54       0.44
Shifting dullness   0.88        0.60       0.83           0.56        0.90ᵃ      0.56
Fluid wave          0.53        0.80       0.50           0.90        0.92       0.82
Puddle sign         NA          0.43       0.55           NA          0.83       0.51

Abbreviation: NA, not available.

ᵃ Test for heterogeneity suggests these values are significantly better across studies (P < .01).


Table 6-4 Likelihood Ratios for the Physical Examination for Ascitesᵃ

                    LR+                                   LR-
                    Cummings    Simel      Cattau         Cummings    Simel      Cattau
Sign                et al¹³     et al⁹     et al¹⁰        et al¹³     et al⁹     et al¹⁰
Bulging flanks      2.4         2.0        1.4            0.4         0.1        0.5
Flank dullness      NA          2.6        1.3            NA          0.3        0.2
Shifting dullness   2.0         5.8        1.9            0.2         0.5        0.4
Fluid wave          5.3         9.6        2.8            0.5         0.2        0.6
Puddle sign         NA          2.6        1.1            NA          0.7        0.9
Peripheral edema    NA          3.8        NA             NA          0.2        NA

Abbreviations: LR+, positive likelihood ratio; LR-, negative likelihood ratio; NA, not available.

ᵃ Examiners were board-certified general internists in the study by Cummings et al,¹³ internal medicine house staff in that by Simel et al,⁹ and staff gastroenterologists in that by
Cattau et al.¹⁰


Table 6-5 Pooled Results of Physical Examination Studies

Physical Sign       LR+ (95% CI)    LR- (95% CI)    Sensitivity (95% CI)   Specificity (95% CI)
Bulging flanks      2.0 (1.5-2.6)   0.3 (0.2-0.6)   0.81 (0.69-0.93)       0.59 (0.50-0.68)
Flank dullness      2.0 (1.5-2.9)   0.3 (0.1-0.7)   0.84 (0.68-1.00)       0.59 (0.47-0.71)
Shifting dullness   2.7 (1.9-3.9)   0.3 (0.2-0.6)   0.77 (0.64-0.90)       0.72 (0.63-0.81)
Fluid wave          6.0 (3.3-11)    0.4 (0.3-0.6)   0.62 (0.47-0.77)       0.90 (0.84-0.96)
Puddle sign         1.6 (0.8-3.4)   0.8 (0.5-1.2)   0.45 (0.20-0.70)       0.73 (0.61-0.85)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

Reviewed by Rose Hatala, MD, and David Edelman, MD


CLINICAL SCENARIO


A 48-year-old man became intoxicated and fell down several steps. He
presents to the emergency department with a normal blood pressure
despite some abdominal pain. He has been a moderate to heavy drinker
since his teenage years. Your examination reveals mild, diffuse
abdominal discomfort and a bruise on the flank where he fell. There
is bilateral ankle edema. You cannot appreciate a fluid wave,
although the flanks seem to bulge.

UPDATED SUMMARY ON ASCITES 

Original Review 

Williams JW Jr, Simel DL. Does this patient have ascites? how 
to divine fluid in the abdomen. JAMA. 1992;267(19):2645- 
2648. 

UPDATED LITERATURE SEARCH 

Our literature search used the parent search strategy for the
Rational Clinical Examination series, combined with the subject
“exp ascites” published in English from 1991 to 2004. The results
yielded 118 titles, for which we reviewed the titles and abstracts.
Only 1 article evaluated the clinical signs for ascites in a general
clinical population.

NEW FINDINGS 

• The accepted reference standard (ultrasonography) detects
peritoneal fluid in smaller amounts than could ever be
detected by clinical examination.

• The presence of a fluid wave or shifting dullness is confirmed
as the most useful finding. Because the reference standard detects
such small amounts of ascites, the absence of any physical
examination finding does not reliably exclude the presence of
peritoneal fluid.

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

A reappraisal of the original publication showed that confidence
intervals (CIs) around the symptoms we reported would help display
their potential importance. We added CIs to this update.
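
The update does not state how its CIs were computed. One commonly used approach for likelihood ratios is a normal approximation on the log scale, sketched below under that assumption. The function name and the 2 × 2 counts are hypothetical and are not data from any study cited here.

```python
import math


def likelihood_ratios_with_ci(tp, fp, fn, tn, z=1.96):
    """Return ((LR+, (low, high)), (LR-, (low, high))) from a 2 x 2 table.

    tp, fn: diseased patients with and without the finding.
    fp, tn: nondiseased patients with and without the finding.
    Uses the log-scale normal approximation; all 4 cells must be nonzero.
    """
    diseased = tp + fn
    non_diseased = fp + tn
    lr_positive = (tp / diseased) / (fp / non_diseased)
    lr_negative = (fn / diseased) / (tn / non_diseased)
    se_positive = math.sqrt(1 / tp - 1 / diseased + 1 / fp - 1 / non_diseased)
    se_negative = math.sqrt(1 / fn - 1 / diseased + 1 / tn - 1 / non_diseased)

    def interval(lr, se):
        return lr * math.exp(-z * se), lr * math.exp(z * se)

    return (
        (lr_positive, interval(lr_positive, se_positive)),
        (lr_negative, interval(lr_negative, se_negative)),
    )


# Hypothetical counts: 50 patients with ascites, 100 without.
(lr_pos, ci_pos), (lr_neg, ci_neg) = likelihood_ratios_with_ci(tp=40, fp=20, fn=10, tn=80)
print(f"LR+ {lr_pos:.1f} ({ci_pos[0]:.1f}-{ci_pos[1]:.1f})")
print(f"LR- {lr_neg:.2f} ({ci_neg[0]:.2f}-{ci_neg[1]:.2f})")
```

An interval that includes 1 means the data are compatible with the finding having no effect on the probability of disease, which is the reasoning applied to the less useful maneuvers in this update.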


CHANGES IN THE REFERENCE STANDARD 

Ascites refers to abnormally large collections of peritoneal fluid.
Studies now confirm that very small amounts of fluid can be detected
by transabdominal ultrasonography¹ or endoscopic ultrasonography.²
However, there are no defined cut points at which the presence of
small amounts of peritoneal fluid detected by imaging procedures
meets a standard of ascites. All the studies in the original review
and subsequent studies consider any amount of peritoneal fluid as
“ascites.”


RESULTS OF LITERATURE REVIEW 

Since the original review was published, no additional stud¬ 
ies have evaluated a patient’s symptoms for ascites or combi¬ 
nations of symptoms and signs. The information about 
symptoms suggesting ascites comes from 1 study ( le 6-6). 
The finding of auscultatory percussion was evaluated, but 
the CIs around both the positive likelihood ratio and nega¬ 
tive likelihood ratio include 1, suggesting that it is not a use¬ 
ful maneuver. 

An additional study 3 in a selected population of thin 
patients validated the presence of the fluid wave as the most 
useful finding from the clinical examination (Table 6-7). 
All published studies counted the presence of any fluid on 
ultrasonography as “positive”; this rigorous reference standard 
would, not surprisingly, demonstrate that the physical 
findings fail frequently in proving the absence of small 


Table 6-6 Results for Symptoms of Ascites 

Symptoms 
(1 Study, 64 Patients)    LR+ (95% CI)     LR- (95% CI) 

Increased girth           4.1 (2.3-7.4)    0.17 (0.05-0.62) 
Recent weight gain        3.2 (1.7-6.2)    0.42 (0.20-0.87) 
Ankle swelling            2.8 (1.8-4.3)    0.10 (0.01-0.67) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 
Table 6-7 Pooled Results for the Physical Signs for Ascites 

Physical Sign                                     LR+ (95% CI)       LR- (95% CI) 

Fluid wave (4 studies, 372 patients)              5.3 (2.9-9.5)      0.57 (0.38-0.85) 
Peripheral edema (1 study, 63 patients)           3.8 (2.2-6.8)      0.17 (0.05-0.50) 
Shifting dullness (4 studies, 372 patients)       2.1 (1.6-2.9)      0.40 (0.21-0.78) 
Bulging flanks (4 studies, 372 patients)          1.8 (1.4-2.5)      0.48 (0.28-0.83) 
Flank dullness (3 studies, 192 patients)          1.7 (1.0-2.7)      0.44 (0.20-1.0) 
Puddle sign (3 studies, 172 patients)             1.3 (0.93-2.00)    0.79 (0.59-1.1) 
Auscultatory percussion (1 study, 66 patients)    1.3 (0.85-2.00)    0.71 (0.39-1.3) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 


amounts of fluid. This standard may also be why the presence 
of peripheral edema (evaluated in only 1 study), which 
is easier to detect than ascites and is a marker for extracellular 
fluid, may be both sensitive and specific for the presence 
of peritoneal fluid detected by ultrasonography. On the other 
hand, when the signs for ascites are absent, the lower bounds 
of the CIs suggest that physicians may be able to rule out 
large amounts of ascites. 

We feel confident that the puddle sign and auscultatory 
percussion are not useful. 
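
The reasoning behind this judgment can be made explicit. A likelihood 
ratio whose 95% CI includes 1 is compatible with a finding that does not 
change the probability of ascites at all. The following Python sketch 
applies that check to four of the pooled signs in Table 6-7. The function 
name, the choice of signs, and the inclusive boundary rule are illustrative 
assumptions and are not part of the original analysis. 

```python
def ci_includes_one(lower, upper):
    """True when a likelihood ratio CI is compatible with LR = 1 (no information)."""
    return lower <= 1.0 <= upper


# (LR+ 95% CI, LR- 95% CI) for a subset of the pooled signs in Table 6-7.
signs = {
    "Fluid wave": ((2.9, 9.5), (0.38, 0.85)),
    "Shifting dullness": ((1.6, 2.9), (0.21, 0.78)),
    "Puddle sign": ((0.93, 2.00), (0.59, 1.1)),
    "Auscultatory percussion": ((0.85, 2.00), (0.39, 1.3)),
}

# A sign is flagged only when BOTH intervals include 1.
uninformative = [
    name
    for name, (pos_ci, neg_ci) in signs.items()
    if ci_includes_one(*pos_ci) and ci_includes_one(*neg_ci)
]

print(uninformative)
```

Under these assumptions only the puddle sign and auscultatory percussion 
are flagged, which matches the conclusion stated above. 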

Given the low pretest probability of ascites in the general 
population, patients should not be evaluated for ascites 
during a routine physical examination. When it is important 
to detect smaller amounts of peritoneal fluid, radiologic 
images will be necessary because the clinical 
examination will not be useful. This is especially important 
when evaluating for ovarian carcinoma (or other 
abdominal malignancies) and for patients with blunt 
abdominal trauma, when the clinical significance of missing 
a small amount of peritoneal fluid is high. 

EVIDENCE FROM GUIDELINES 

No guidelines advocate for the routine assessment of ascites. 


CLINICAL SCENARIO—RESOLUTION 


Alcoholism alone does not appreciably change the likelihood 
of ascites (likelihood ratio, 1.4). If the baseline prevalence 
of ascites in general medical patients is 5%, a 
diagnosis of alcoholism increases the probability to only 
7%. The patient in the scenario could have preexisting 
ascites from cirrhosis, but he could also have hemoperitoneum 
from the fall. Unfortunately, none of the symptoms 
or signs of ascites have been evaluated well for their utility 
during blunt trauma. The presence of peripheral edema is 
a useful finding when present (suggesting ascites) or when 
absent (suggesting no ascites). You decide you need to 
know for certain whether the patient has a hemoperitoneum, 
so you must proceed to additional testing such as 
ultrasonography, diagnostic peritoneal lavage, or computed 
tomography. 4,5 
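
The arithmetic behind the change from 5% to 7% can be sketched by 
converting the probability to odds, multiplying by the likelihood ratio, 
and converting back. The helper below is illustrative only: its name is an 
assumption, and the peripheral edema ratios are the pooled estimates from 
Table 6-7 applied to the same assumed 5% baseline. 

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Update a probability with a likelihood ratio (odds form of Bayes' theorem)."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


baseline = 0.05  # assumed prevalence in general medical patients (from the text)

after_alcoholism = post_test_probability(baseline, 1.4)
edema_present = post_test_probability(baseline, 3.8)   # pooled LR+ for peripheral edema
edema_absent = post_test_probability(baseline, 0.17)   # pooled LR- for peripheral edema

print(f"alcoholism: {after_alcoholism:.3f}")
print(f"edema present: {edema_present:.3f}, edema absent: {edema_absent:.3f}")
```

With these inputs the probability after a diagnosis of alcoholism is about 
0.069, in line with the 7% quoted above. Peripheral edema moves the 
estimate much further in either direction. 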
ASCITES—MAKE THE DIAGNOSIS 

During the general physical examination, patients should 
not be evaluated for ascites. When it is important to detect 
smaller amounts of peritoneal fluid, radiologic images 
will be necessary because the clinical examination will not 
be useful, which is especially important when evaluating 
for abdominal malignancies or for patients with blunt 
abdominal trauma. 

PRIOR PROBABILITY 

The prevalence of ascites in an unselected population is 
low, likely on the order of less than 1% (expert opinion). 

The prevalence of ascites among general medical 
patients will be slightly higher, but still less than 5% 
(expert opinion). 

POPULATION FOR WHOM THE SYMPTOMS 
AND SIGNS SHOULD BE EVALUATED 

• Cirrhosis 

• Congestive heart failure 

• Constrictive pericarditis 

• Nephrotic syndrome 

• Malnutrition, chronic diarrhea 

• Neoplastic disorders (any peritoneal fluid might be 
important) 

• Systemic infectious diseases 

• Blunt abdominal trauma (any peritoneal fluid might be 
important) 

The absence of findings does not rule out the presence of 
smaller amounts of peritoneal fluid. See Tables 6-8 and 6-9. 

Table 6-8 Symptoms of Ascites 

                                    LR (95% CI) 

Make Ascites More Likely 
  Increased abdominal girth         4.1 (2.3-7.4) 
  Recent weight gain                3.2 (1.7-6.2) 
  Ankle swelling                    2.8 (1.8-4.3) 

Make Ascites Less Likely 
  No ankle swelling                 0.10 (0.01-0.67) 
  No increase in abdominal girth    0.17 (0.05-0.62) 
  No recent weight gain             0.42 (0.20-0.87) 

Abbreviations: CI, confidence interval; LR, likelihood ratio. 

Table 6-9 Signs for Ascites 

                                    LR (95% CI) 

Fluid wave                          5.3 (2.9-9.5) 
Shifting dullness                   2.1 (1.6-2.9) 

Abbreviations: CI, confidence interval; LR, likelihood ratio. 

REFERENCE STANDARD TESTS 

• Ultrasonography 

• Computed tomography 

• Diagnostic paracentesis 
EVIDENCE TO SUPPORT THE UPDATE 


Ascites 



TITLE Accuracy of Clinical Maneuvers in Detection of 
Minimal Ascites. 

AUTHORS Chongtham DS, Singh MM, Kalantri SP, 
Pathak S, Jain AP. 

CITATION Indian J Med Sci. 1998;52(11):514-520. 

QUESTION How well do commonly used maneuvers for 
detecting ascites work on a general medical ward? 

DESIGN One examiner identified patients for study, 
whereas a second examiner performed the maneuvers 
on all enrolled patients. An ultrasonographer, blinded 
to the findings, identified all patients with any degree of 
ascites. 

SETTING Medical ward in India. 

PATIENTS A total of 66 patients admitted to a ward for 
cardiac, hepatic, renal, nutritional, infectious, or neoplastic 
disorders. Those with a history of ascites, paracentesis, 
or “evidence of ascites from history” were excluded. These 
were thin patients by western standards, with a mean 
weight of about 49 kg (108 lb) for men and 46 kg (101 lb) 
for women (there was no difference in the weight of those 
with vs those without ascites). 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

An examiner, blinded to the entrance criteria, evaluated each 
patient. The ultrasonographer was blinded to the entrance 
criteria and clinical findings. 

MAIN OUTCOME MEASURES 

Sensitivity, specificity, and likelihood ratios. 


Table 6-10 Likelihood Ratios for Signs of Ascites 3 

Test                       Sensitivity   Specificity   LR+ (95% CI)       LR- (95% CI) 

Bulging flanks             0.51          0.64          1.4 (0.79-2.5)     0.75 (0.49-1.1) 
Flank dullness             0.57          0.61          1.5 (0.88-2.5)     0.70 (0.44-1.1) 
Shifting dullness          0.46          0.74          1.8 (0.9-3.6)      0.73 (0.51-1.0) 
Fluid wave                 0.20          1.00          13 (0.79-224)      0.80 (0.67-0.95) 
Puddle sign                0.46          0.68          1.4 (0.75-2.6)     0.80 (0.54-1.2) 
Auscultatory percussion    0.66          0.48          1.3 (0.86-2.0)     0.71 (0.40-1.3) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 

The authors observed that most of the patients had “minimal” ascites. 
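
The likelihood ratios in Table 6-10 follow from the reported sensitivity 
and specificity: LR+ is sensitivity divided by (1 - specificity), and LR- 
is (1 - sensitivity) divided by specificity. The short Python sketch below 
shows the calculation. The function name is an assumption, and because the 
inputs are rounded, the results can differ from the published values in 
the second decimal place. 

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) computed from sensitivity and specificity."""
    false_positive_rate = 1.0 - specificity
    if false_positive_rate == 0:
        # No false positives observed: the simple ratio is undefined.
        lr_positive = math.inf
    else:
        lr_positive = sensitivity / false_positive_rate
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity and specificity as reported in Table 6-10.
bulging_flanks = likelihood_ratios(0.51, 0.64)
fluid_wave = likelihood_ratios(0.20, 1.00)

print(bulging_flanks)
print(fluid_wave)
```

For the fluid wave the specificity of 1.00 leaves the simple ratio 
undefined, so the finite LR+ of 13 with its very wide CI presumably 
reflects an adjustment that this summary does not describe. 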


CONCLUSION 

LEVEL OF EVIDENCE Level 2. 

This study was performed with high quality, although it 
used only 1 examiner. The results confirm that the presence 
of a fluid wave is the best finding in favor of ascites. In addition, 
the puddle sign (as in previous studies) and auscultatory 
percussion have poor discriminative ability. 

This study population is unique in that it consisted of patients 
different from those in previous studies: these patients have a 
small body habitus, creating an expectation that the physical 
examination might have yielded better results. On the other 
hand, the patients in this study were selected because it was not 
obvious whether they had ascites. Furthermore, the definition of 
ascites was any peritoneal fluid detected by ultrasonography (as 
in previous work), and the authors observed that most of the 
patients had minimal ascites. This study confirms that the physical 
examination cannot detect small amounts of peritoneal fluid. 

Reviewed by David L. Simel, MD, MHS 


MAIN RESULTS 

See Table 6-10. 
CHAPTER 


What Can the Medical 
History and Physical 
Examination Tell Us About 

Low Back Pain? 

Richard A. Deyo, MD, MPH 
James Rainville, MD 
Daniel L. Kent, MD 


Back pain ranks second only to upper respiratory illness as a 
symptomatic reason for office visits to physicians. 1 Approximately 
70% of adults have low back pain at some time, but 
only 14% have an episode that lasts more than 2 weeks. About 
1.5% have such episodes with features of sciatica. 2,3 Most 
causes of back pain respond to symptomatic and physical measures, 
but some are surgically remediable and some are systemic 
diseases (cancer or disseminated infection) requiring 
specific therapy, so careful diagnostic evaluation is important. 
Features of the clinical history and physical examination influence 
not only therapeutic choices but also decisions about 
diagnostic imaging, laboratory testing, and specialist referral. 

ANATOMIC/PHYSIOLOGIC ORIGINS OF 
FINDINGS IN THE LOW BACK 

Low back pain may arise from several structures in the lumbar 
spine, including the ligaments that interconnect vertebrae, outer 
fibers of the annulus fibrosus, facet joints, vertebral periosteum, 
paravertebral musculature and fascia, blood vessels, and spinal 
nerve roots. The causes of low back pain generated through 
these structures include (1) musculoligamentous injuries; 
(2) degenerative changes in the intervertebral disks and facet 
joints; (3) herniation of the nucleus pulposus of an intervertebral 
disk, with irritation of adjacent nerve roots; (4) spinal stenosis 
(narrowing of the central spinal canal or the lateral recesses 
of the canal in which the nerve roots travel caudally; this usually 
results from hypertrophic degenerative changes in the disks, 
ligamentum flavum, and facet joints); (5) anatomic anomalies of 
the spine, such as scoliosis and spondylolisthesis, which are 
often asymptomatic but may cause pain when they are severe; 
(6) underlying systemic diseases, such as primary or metastatic 
cancer, spinal infections, and ankylosing spondylitis; and (7) visceral 
diseases unrelated to the spine, including diseases of the 
pelvic organs, kidneys, gastrointestinal tract, and aorta (diagnosis 
of which will not be discussed in the present report). 

PREVALENCE OF DISEASES THAT 
PRODUCE LOW BACK PAIN 

Up to 85% of patients cannot be given a definitive diagnosis 
because of weak associations among symptoms, pathologic 
changes, and imaging results. 4,5 We assume that many of these 
cases are related to musculoligamentous injury or degenerative 
changes. 

Anatomic evidence of a herniated disk is found in 20% to 30% 
of imaging tests (myelography, computed tomography [CT], and 
magnetic resonance imaging [MRI]) among normal persons. 6,7 
These herniations are asymptomatic and result in no clinical disease. 
The proportion of all persons with low back pain who 
undergo surgery for a disk herniation is only about 2%. 2 

In primary care, about 4% of patients with back pain will 
prove to have compression fractures, 3% have spondylolisthesis, 
and only 0.7% have spinal malignant neoplasms (primary or 
metastatic). 8-13 Even fewer have ankylosing spondylitis (about 
0.3%) or spinal infections (0.01%). 8,14,15 Widespread recognition 
of spinal stenosis has occurred only in the last 15 years. It is most 
common in older adults, but its prevalence is unknown. 

Because a specific cause frequently cannot be identified, 
diagnostic efforts are often disappointing. Instead of seeking 
a precise cause in every case of back pain, it may be most useful 
to answer 3 basic questions 9 : (1) Is there a serious systemic 
disease causing the pain? (2) Is there neurologic 
compromise that might require surgical evaluation? (3) Is 
there social or psychological distress that may amplify or 
prolong pain? These questions can generally be answered 
according to medical history and physical examination alone, 
and a minority of patients requires further diagnostic testing. 

IS THERE EVIDENCE OF SYSTEMIC DISEASE? 

Cancer 

Malignant neoplasm (primary or metastatic) is the most 
common systemic disease affecting the spine, although it 
accounts for less than 1% of episodes of low back pain. 
Approximately 80% of patients with this diagnosis are 
older than 50 years (Table 7-1). A history of cancer has 
such high specificity (0.98) that such patients should be 
considered to have cancer until proven otherwise. However, 
only one-third of patients with an underlying malignant 
neoplasm causing their back pain have a prior cancer 
diagnosis (sensitivity, 0.31). Unexplained weight loss, pain 
duration greater than 1 month, and failure to improve with 
conservative therapy are moderately specific findings. Most 
patients with back pain caused by cancer report that pain is 
unrelieved by bed rest (sensitivity > 0.90), but the finding 
is nonspecific. 10 In a study of nearly 2000 patients with 
back pain, no cancer was identified in any patient younger 
than 50 years and without a history of cancer, unexplained 
weight loss, or a failure of conservative therapy (combined 
sensitivity, 100%). 10 

The physical examination is less useful than the medical 
history for detecting underlying cancer, 10 except in late stages. 
Because the breast, lung, and prostate are the most common 
sources of spinal metastases, these organs should be examined 
when cancer is suspected. 
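
One way to compare the history items for cancer is to convert each 
sensitivity and specificity pair from Table 7-1 into likelihood ratios and 
rank them. The Python sketch below does this for the single findings. It 
is an illustration rather than an analysis from the source studies: the 
function and variable names are assumptions, and the item reported only as 
sensitivity > 0.90 is left out because it has no point estimate. 

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) computed from sensitivity and specificity."""
    return sensitivity / (1.0 - specificity), (1.0 - sensitivity) / specificity


# (sensitivity, specificity) for cancer, as listed in Table 7-1.
cancer_history = {
    "Age > 50 y": (0.77, 0.71),
    "History of cancer": (0.31, 0.98),
    "Unexplained weight loss": (0.15, 0.94),
    "Failure to improve with a month of therapy": (0.31, 0.90),
    "Duration of pain > 1 mo": (0.50, 0.81),
}

# Sort by LR+ so the findings that most increase suspicion come first.
ranked = sorted(
    ((name, *likelihood_ratios(sens, spec)) for name, (sens, spec) in cancer_history.items()),
    key=lambda row: row[1],
    reverse=True,
)

for name, lr_pos, lr_neg in ranked:
    print(f"{name}: LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
```

A history of cancer stands well apart from the other items (LR+ of about 
15), while no single negative answer lowers the probability of cancer by 
much. This is consistent with relying on the combination of findings to 
exclude cancer. 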


Table 7-1 Estimated Accuracy of the Medical History in the Diagnosis of Spine Diseases Causing Low Back Pain 

Disease to Be Detected (Source, Year) 
  Medical History                                             Sensitivity   Specificity 

Cancer (Deyo and Diehl, 10 1988) 
  Age > 50 y                                                  0.77          0.71 
  History of cancer                                           0.31          0.98 
  Unexplained weight loss                                     0.15          0.94 
  Failure to improve with a month of therapy                  0.31          0.90 
  No relief with bed rest                                     >0.90         0.46 
  Duration of pain > 1 mo                                     0.50          0.81 
  Age > 50 y or history of cancer or unexplained weight 
    loss or failure of conservative therapy                   1.0           0.60 

Spinal osteomyelitis (Waldvogel and Vasey, 16 1980) 
  Intravenous drug abuse, urinary tract infection, or 
    skin infection                                            0.40          NA 

Compression fracture (Unpublished dataᵃ) 
  Age > 50 y                                                  0.84          0.61 
  Age > 70 y                                                  0.22          0.96 
  Trauma                                                      0.30          0.85 
  Corticosteroid use                                          0.06          0.995 

Herniated disk (Deyo and Tsui-Wu, 2 1987; Spangfort, 17 1972) 
  Sciatica                                                    0.95          0.88 

Spinal stenosis (Turner et al, 18 1992) 
  Pseudoclaudication                                          0.60          NA 
  Age > 50 y                                                  0.90ᵇ         0.70 

Ankylosing spondylitis (Gran, 19 1985) 
  4 of 5 positive responsesᶜ                                  0.23          0.82 
  Age at onset < 40 y                                         1.0           0.07 
  Pain not relieved supine                                    0.80          0.49 
  Morning back stiffness                                      0.64          0.59 
  Pain duration > 3 mo                                        0.71          0.54 

Abbreviation: NA, not available. 

ᵃ From 833 patients with back pain at a walk-in clinic, all of whom received plain lumbar radiographs. 
ᵇ Authors’ estimate. 
ᶜ The 5 screening questions were (1) Onset of back discomfort before age 40 years? (2) Did the problem begin slowly? 
(3) Persistence for at least 3 months? (4) Morning stiffness? and (5) Improved by exercise? 
Spinal Infections 

Spinal infections usually are blood-borne from other sites, 
including urinary tract infections, indwelling urinary catheters, 
skin infections, and injection sites for illicit intravenous 
drugs. One of these sites is identified in approximately 40% 
of patients with spinal infections (sensitivity, 0.40). 16 

In patients with spinal infections, the sensitivity of fever is 
disappointing, varying from 0.27 for tuberculous osteomyelitis 
to 0.50 for pyogenic osteomyelitis 20 and 0.83 for spinal 
epidural abscess. 21 Because 2% of patients in primary care 
with mechanical low back pain have fever (perhaps because 
of viral syndromes), specificity for bacterial infection is 
approximately 0.98. 10 Spine tenderness in response to percussion 
has a sensitivity of 0.86 for bacterial infection, but specificity 
is poor (0.60). 10,22,23 

Compression Fractures 

Although spinal compression fractures are not systemic diseases, 
they often occur in persons with generalized osteoporosis. 
Most patients with this problem do not have a history 
of identifiable trauma (sensitivity, 0.30). A person with back 
pain who is receiving long-term corticosteroid therapy is 
considered to have a compression fracture until proven otherwise 
(specificity, 0.99). Black and Hispanic women have 
only one-fourth as many compression fractures as white 
women. 24 As shown in Table 7-1, age greater than 70 years is a 
relatively specific finding (specificity, 0.96). 

Ankylosing Spondylitis and Spine 
Range-of-Motion Measures 

Ankylosing spondylitis shares several historical features 
with other inflammatory arthropathies, such as rheumatoid 
arthritis. Calin et al 25 described 5 screening questions for 
ankylosing spondylitis: (1) Is there morning stiffness? (2) Is 
there improvement in discomfort with exercise? (3) Was the 
onset of back pain before age 40 years? (4) Did the problem 
begin slowly? (5) Has the pain persisted for at least 3 
months? 

With at least 4 positive answers to define a positive “test” 
result, the sensitivity of these questions was 0.95 and specificity 
was 0.85, 25 although other authors report lower sensitivity. 19,26 
When screening for a rare disease such as ankylosing 
spondylitis, typically, the predictive value of a positive test is 
low. In an industrial screening program, only 16 of 367 persons 
with positive criteria proved to have ankylosing spondylitis 
(a predictive value of 0.04). 27 “Inflammatory” symptoms 
(morning stiffness, night pain, and relief with exercise) are 
moderately sensitive but nonspecific. All patients with ankylosing 
spondylitis in 1 population survey reported symptom 
onset before age 40 years, making this history highly sensitive 
but nonspecific (Table 7-1). 19 
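
The low predictive value follows from the rarity of the disease. The 
Python sketch below first reproduces the observed figure from the 
screening program (16 of 367) and then, as a hypothetical illustration, 
applies the sensitivity and specificity reported by Calin et al at the 
primary care prevalence of about 0.3% quoted earlier in this chapter. The 
function name and the pairing of these particular figures are assumptions. 

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of disease given a positive result (Bayes' theorem)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Observed in the industrial screening program cited in the text.
observed_ppv = 16 / 367

# Hypothetical: screening-question accuracy applied at 0.3% prevalence.
expected_ppv = positive_predictive_value(0.95, 0.85, 0.003)

print(f"observed: {observed_ppv:.3f}")
print(f"expected at 0.3% prevalence: {expected_ppv:.3f}")
```

Even with a sensitivity of 0.95 and a specificity of 0.85, fewer than 1 in 
50 patients with a positive screen would have the disease at this 
prevalence, because false-positive results from the large unaffected group 
outnumber the true-positive results. 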

Reduced spinal mobility results from fusion of adjacent 
vertebrae in this condition. The Schober test, which measures 
distraction between 2 marks on the skin during forward 
flexion, is a commonly described method for quantifying 
reduced flexion. Although it is moderately reproducible, 23,28 
reduced spine flexion is not specific for inflammatory 
spondylopathies, being equally common in patients with 
chronic back pain or spine tumors. 29 Reduced chest expansion 
(using a strict criterion for abnormality, such as expansion 
< 2.5 cm) is highly specific (0.99) but insensitive in early 
ankylosing spondylitis (0.09), 19,30 so that predictive values are 
poor. 

Tests for sacroiliac joint tenderness (to discriminate ankylosing 
spondylitis from mechanical spine conditions) include 
a hip extension test, anteroposterior pelvic pressure, lateral 
pelvic compression, and direct pressure on the sacroiliac 
joints. Unfortunately, these tests are poorly reproducible 23,31 
and inaccurate in distinguishing ankylosing spondylitis from 
mechanical spine complaints. 32,33 Early ankylosing spondylitis 
is most often suspected from radiographs obtained because 
of persistent pain. 

Although spine flexion is of limited diagnostic value, it 
may be useful in planning or monitoring physical therapy in 
patients with low back pain of any cause. 34 Range of motion 
in multiple directions can be assessed with 2 inclinometers 
(used in the construction industry) with good precision. 28,34 
The technique is detailed elsewhere. 34 

IS THERE EVIDENCE OF NEUROLOGIC COMPROMISE? 

The spinal cord, cauda equina, and nerve roots are vulnerable 
to several disorders that cause back pain and sciatica. The 
most common of these is a herniated intervertebral disk, but 
other causes include nerve root entrapment in the root 
canals by bony and ligamentous hypertrophy, spinal stenosis, 
spinal or paraspinal infections, and neoplasms. Irritation of 
neurologic structures is manifested as motor, reflex, or sensory 
dysfunction in the lower extremities and (rarely) as 
bowel or bladder dysfunction. 

The first clue to nerve root irritation is usually sciatica, a 
sharp or burning pain radiating down the posterior or lateral 
aspect of the leg (usually to the foot or ankle), often associated 
with numbness or paresthesia. The pain is sometimes 
aggravated by coughing, sneezing, or the Valsalva maneuver. 
Among patients with low back pain alone (no sciatica or 
neurologic symptoms), the prevalence of neurologic impairments 
is so low that extensive neurologic evaluation is usually 
unnecessary. 

Lumbar Disk Herniations 

Sciatica has such a high sensitivity (0.95) that its absence 
makes a clinically important lumbar disk herniation unlikely. 17,35 
Using the accuracy of sciatica in Table 7-1 and a prevalence of 
surgically important disk herniations of 2%, we estimate the 
likelihood of disk herniation in a patient without sciatica to 
be 1 in 1000. Most patients have a long history of recurrent 
back pain before the onset of sciatica, but when a frank disk 
herniation occurs, leg pain usually overshadows the back 
pain. The peak incidence of herniated lumbar disks is in 
adults between the ages of 30 and 55 years. 17 
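
The estimate of about 1 in 1000 can be checked from the figures given: 
sciatica has a sensitivity of 0.95 and a specificity of 0.88 (Table 7-1), 
and the prevalence of surgically important disk herniation is taken as 2%. 
The Python sketch below computes the probability of herniation when 
sciatica is absent. The function name is an assumption introduced for 
illustration. 

```python
def probability_after_negative(sensitivity, specificity, prevalence):
    """Probability of disease when the finding is absent (1 - negative predictive value)."""
    missed_cases = (1.0 - sensitivity) * prevalence
    true_negatives = specificity * (1.0 - prevalence)
    return missed_cases / (missed_cases + true_negatives)


p_no_sciatica = probability_after_negative(0.95, 0.88, 0.02)

print(f"{p_no_sciatica:.4f}")
print(f"about 1 in {round(1 / p_no_sciatica)}")
```

The result is slightly above 0.001, that is, on the order of 1 in 1000 as 
stated in the text. 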
A symptomatic disk herniation tethers the affected nerve 
root, so pain results from stretching the nerve by straight-leg 
raising (SLR) from the supine position. This is performed by 
cupping the heel in 1 hand and keeping the knee fully 
extended with the other. The straight leg is slowly raised 
from the examining table until pain occurs. Tension is transmitted 
to the nerve roots once the leg is raised beyond 30 
degrees, but after 70 degrees, further movement of the nerve 
is negligible. 36 A typical positive SLR sign is one that reproduces 
the patient’s sciatica between 30 degrees and 60 degrees 
of leg elevation. 17,37,38 

A related test is the crossed SLR (CSLR) sign. This occurs 
when SLR is performed on the patient’s well leg and is found 
to elicit pain in the leg with sciatica. The precision of tests for 
SLR is shown in Table 7-2. 23,39-41 Visual estimation is reasonably 
accurate, but a goniometer or inclinometer improves 
interobserver agreement. 

Pain on ipsilateral SLR at 60 degrees is moderately sensitive 
for herniated lumbar disks but nonspecific, because limitation 
is often observed in the absence of disk herniations (Table 
7-3). 43-45 CSLR is less sensitive but highly specific. 17,44,45,48 Thus, 
a positive CSLR test result substantially increases the likelihood 
of a disk herniation, whereas a negative result is of limited 
value. The lower the angle of a positive SLR test, the more 
specific the test becomes and the larger the disk protrusion 
found at surgery. 46,49 

Straight-leg raising is most appropriate for testing the lower 
lumbar nerve roots (L5 and S1), where the majority of herniated 
disks occur. Irritation of higher lumbar roots is tested 
with the femoral nerve stretch test (flexing the knee with 
the patient prone), but the precision and accuracy of this test are 
unknown. 

Assessment of Motor, Reflex, and Sensory Function 

Ninety-eight percent of clinically important lumbar disk 
herniations occur at either the L4 to L5 or the L5 to S1 
intervertebral level, 17,44,46 causing neurologic impairments in 
the motor and sensory territories of the L5 and S1 nerve 
roots. Thus, the most common neurologic impairments are 
weakness of the ankle and great-toe dorsiflexors (L5), 
diminished ankle reflexes (S1), and sensory loss in the feet 
(L5 and S1). 17,44-46 In a patient with sciatica, the neurologic 
examination can focus on these functions. 

Ankle dorsiflexor strength is tested by having the 
supine patient dorsiflex the ankle against the examiner’s 
resistance. Inability to maintain dorsiflexion against the 
examiner should be considered weakness, and the healthy 
side should be checked for comparison. This method 
shows excellent precision (Table 7-2) and is more reproducible 
than the patient’s ability to heel stand. 23 Ankle 
dorsiflexor weakness rarely occurs in isolation and is 
nearly always associated with weak toe dorsiflexion, sensory 
deficits, or impaired reflexes. 50 For toe strength, the 
supine patient is instructed to maximally dorsiflex the 
great toe (“point your big toe at your nose” seems to work 
well) and resist the examiner’s effort to flex the toe with 2 
fingers. 

Ankle reflexes are more difficult to reproduce, and patient 
positioning may be important. The side-lying, prone, and 


Table 7-2 Reproducibility of Physical Examination Findings 

Category 
  Test                                              Unit of Measurement   Interobserver Agreement (Statistic)   Source, Year 

Tenderness 
  Bone tenderness                                   Yes/No                0.40 (κ)                              McCombe et al, 23 1989 
  Soft-tissue tenderness                            Yes/No                0.24 (κ)                              McCombe et al, 23 1989 
  Muscle spasm                                      Yes/No                “Discarded; too unreliable”           Waddell et al, 39 1982 

SLR 
  Ipsilateral SLR, inclinometer                     Degrees               0.78-0.97 (r)                         Hoehler and Tobis, 40 1982; Hsieh et al, 41 1983 
  Ipsilateral SLR, goniometer                       Degrees               0.69 (r)                              McCombe et al, 23 1989 
  SLR causes leg pain                               Yes/No                0.66 (κ)                              McCombe et al, 23 1989 
  Ipsilateral SLR < 75° by visual estimation        Yes/No                0.56 (κ)                              Waddell et al, 39 1982 
  CSLR causes pain                                  Yes/No                0.74 (κ)                              McCombe et al, 23 1989 

Neurologic examination 
  Ankle dorsiflexion weak                           Yes/No                1.00 (κ)                              McCombe et al, 23 1989 
  Great toe extensors weak                          Yes/No                0.65 (κ)                              McCombe et al, 23 1989 
  Ankle reflexes normal                             Yes/No                0.39-0.50 (κ)                         McCombe et al, 23 1989; Schwartz et al, 42 1990 
  Any sensory deficit                               Yes/No                0.68 (κ)                              McCombe et al, 23 1989 
  Calf wasting                                      Yes/No                0.80 (κ)                              McCombe et al, 23 1989 

Inappropriate signs 
  Superficial tenderness                            Yes/No                0.29 (κ)                              McCombe et al, 23 1989 
  Simulated rotation or axial loading causes pain   Yes/No                0.25 (κ)                              McCombe et al, 23 1989 
  SLR with distraction causes pain                  Yes/No                0.40 (κ)                              McCombe et al, 23 1989 
  Inexplicable pattern, neurologic examination      Yes/No                0.03 (κ)                              McCombe et al, 23 1989 
  Overreaction                                      Yes/No                0.29 (κ)                              McCombe et al, 23 1989 

Abbreviations: CSLR, crossed straight-leg raising; SLR, straight-leg raising. 
Table 7-3 Estimated Accuracy of Physical Examination for Lumbar Disk Herniation Among Patients With Sciatica 

Test                              Sensitivityᵃ   Specificityᵃ   Comments 
  Source, Year 

Ipsilateral SLR                   0.80           0.40           Positive test result: leg pain at < 60° 
  Kosteljanetz et al, 43 1984; Hakelius and Hindmarsh, 44 1972 
CSLR                              0.25           0.90           Positive test result: reproduction of contralateral pain 
  Spangfort, 17 1972; Hakelius and Hindmarsh, 44,45 1972 
Ankle dorsiflexion weakness       0.35           0.70           HNP usually at L4-5 (80%) 
  Spangfort, 17 1972; Hakelius and Hindmarsh, 44 1972 
Great toe extensor weakness       0.50           0.70           HNP usually at L5-S1 (60%) or L4-5 (30%) 
  Hakelius and Hindmarsh, 44 1972; Kortelainen et al, 46 1985 
Impaired ankle reflex             0.50           0.60           HNP usually at L5-S1; absent reflex increases specificity 
  Spangfort, 17 1972; Hakelius and Hindmarsh, 44 1972 
Sensory loss                      0.50           0.50           Area of loss poor predictor of HNP level 
  Kosteljanetz et al, 43 1984; Kortelainen et al, 46 1985 
Patella reflex                    0.50           NA             For upper lumbar HNP only 
  Aronson and Dunsmore, 47 1963 
Ankle plantar flexion weakness    0.06           0.95 
  Hakelius and Hindmarsh, 44 1972 
Quadriceps weakness               <0.01          0.99 
  Hakelius and Hindmarsh, 44 1972 

Abbreviations: CSLR, crossed straight-leg raising; HNP, herniated nucleus pulposus; SLR, straight-leg raising; NA, not available. 

ᵃ Sensitivity and specificity were calculated by the authors of the present report. Values represent rounded averages where multiple references were available. All results are from 
surgical case series. 


kneeling positions are probably best (rather than the sitting 
position), but we are unaware of comparative data. The foot 
is gently rocked until relaxation is obtained, and the calf 
muscles should be held under slight tension by dorsiflexing 
the foot. Estimated κ values for the precision of ankle reflexes 
range from 0.39 to 0.50. 23,48 Schwartz et al 42 found that a plantar 
tap is as good as an Achilles tendon tap (estimated κ = 
0.55). In this technique, the patient lies supine and the ball of 
the foot is tapped with the reflex hammer. The plantar tap 
was preferred by patients and could be elicited in 91% of 
patients younger than 65 years but in only 71% of patients 
older than 65 years. 

Ankle plantar flexion is an S1 function, but only severe 
impairments can be clinically detected, and sensitivity for 
disk herniation is low (Table 7-3). Toe walking appears to be 
an unreliable method of assessing plantar flexion strength (κ 
= 0). 23 Hamstring and hip extensor strength have been used 
to evaluate S1 root injuries, but their precision and accuracy 
are unknown. Muscle wasting indicates longstanding denervation 
or disease and may be detected visually. Good precision 
was noted for observations of anterior compartment 
and hamstring wasting in one study (Table 7-2). 23 

Sensory examination of the lower extremities takes 
time. Patients distinguish differences in pain intensity by 
pinprick more accurately than differences in touch or 
temperature, and sensory impairment from nerve root 
compression is most frequent in the distal extremes of the 
dermatomes. 51 Therefore, an efficient strategy is to check 
for symmetry of pain elicited by pinprick in the extremes 
of the L4, L5, and S1 dermatomes (the medial aspect, dorsum, 
and lateral aspect of the feet) (Figure 7-1). 

Higher lumbar nerve roots account for only about 2% of 
lumbar disk herniations. They are suspected when numbness 
or pain involves the anterior thigh more prominently 
than the calf (Figure 7-1). Testing includes knee reflexes, 
quadriceps strength, and psoas strength. 17,47,50 Quadriceps 
weakness is virtually always associated with impairment in 
the patella reflex. 50 
The accuracy of neurologic findings for the diagnosis of a 
herniated disk is only moderate (Table 7-3). Considering 
combinations is helpful, however, because a finding of 
impaired ankle reflexes or weak foot dorsiflexion would have 
a sensitivity of almost 90% for patients with surgically 
proven disk herniations. 17 Multiple findings related to SLR or 
neurologic examination increase the probability that a herniated 
disk will be found at surgery. 52 

Spinal Stenosis 

The mean age of patients at surgery for spinal stenosis is 55 
years, with an average symptom duration of 4 years. 18 The 
characteristic history is that of neurogenic claudication: pain 
in the legs and occasionally neurologic deficits that occur 
after walking. In contrast to arterial ischemic claudication, 
neurogenic claudication is more likely to occur on standing 
alone (without ambulation), may increase with cough or 
sneeze, and is associated with normal arterial pulses. 53 The 
sensitivity of neurogenic claudication is modest (about 
0.60), 18 but it is probably quite specific. 

Few data are available concerning the accuracy of physical 
examination because stenosis has been widely recognized 
only in recent years. Diagnostic criteria, indications for surgery, 
and the natural history are still being elucidated. 
Increased pain on spine extension is typical of stenosis 
(whereas flexion is usually most painful with herniated 
disks), but accuracy data are unavailable. The sensitivity of 
leg pain is about 85%; neurologic abnormalities, about 60%; 
and abnormal SLR, about 50%. 18,53 

Cauda Equina Syndrome 

A massive midline disk herniation may cause spinal cord or
cauda equina compression, requiring immediate surgical referral.
Fortunately, the cauda equina syndrome occurs in only 1%
to 2% of all patients with lumbar disk herniations who come to
surgery, 17 so its prevalence among all patients with low back
pain is about 0.0004. The most consistent finding is urinary
retention, with a sensitivity of 0.90. 54-56 The most common sensory
deficit occurs over the buttocks, posterior-superior thighs,
and perineal regions (“saddle anesthesia”), with a sensitivity of
about 0.75. 54-56 Anal sphincter tone is diminished in 60% to
80% of cases. 54-56 Assuming a specificity of about 95%, the predictive
value of a negative test result (no urinary retention)
would be almost 100%. Unilateral or bilateral sciatica, sensory
and motor deficits, and abnormal SLR results are all common,
with sensitivities of greater than 0.80. 54-56
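
The statement that a negative finding (no urinary retention) has a predictive value of almost 100% can be checked with Bayes' theorem. The following is a minimal sketch in Python that uses only the figures quoted above (prevalence 0.0004, sensitivity 0.90, assumed specificity 0.95); the function name is illustrative and does not come from the source.

```python
def negative_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Probability that disease is absent given a negative test result."""
    # Fraction of all patients who are disease-free and test negative.
    true_negatives = specificity * (1.0 - prevalence)
    # Fraction of all patients who have the disease but test negative.
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# Figures quoted in the text for urinary retention in cauda equina syndrome.
npv = negative_predictive_value(prevalence=0.0004,
                                sensitivity=0.90,
                                specificity=0.95)
print(f"Negative predictive value: {npv:.5f}")
```

Because the condition is so rare, the result stays very close to 1 across a wide range of plausible specificities; the rarity of the syndrome, more than the test itself, drives the high negative predictive value.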

Indications for Imaging Tests 

There is a growing consensus that radiographs are not necessary
for every patient with low back pain because of a low
yield of useful findings, potentially misleading results, substantial
gonadal irradiation, and common interpretive disagreements.
The Quebec Task Force on Spinal Disorders
suggested that early radiography was necessary only in the
face of neurologic deficits, age older than 50 years or younger
than 20 years, fever, trauma, or signs of neoplasm. 57 Table 7-1
indicates screening questions that can exclude neoplasm
according to patient medical history alone. 10

MRI and CT can be used even more selectively, usually for 
surgical planning. The finding of herniated disks and spinal 
stenosis in many asymptomatic persons 6,7 indicates that 
imaging results alone can be misleading, and valid decision 
making requires correlation with the medical history and 
physical examination. 58 

IS THERE EVIDENCE OF SOCIAL OR PSYCHOLOGICAL 
DISTRESS THAT MAY AMPLIFY OR PROLONG PAIN? 

Some features of patient medical history influence management
regardless of the exact spinal pathology. Chronic pain
or depression may be indications for the use of antidepressant
medication rather than opiates. Alcohol or drug abuse
influences the choice of medications and requires specific 
intervention. Disability compensation claims or litigation 
may affect initial evaluation and prognosis, and patients 
seeking compensation often respond poorly to a variety of 
treatments. 59 

Patients with chronic low back pain (> 3 months) present 
complex problems, and often, a pathoanatomic cause is not 
apparent. 60 Unlike acute pain, chronic pain is often not associated
with ongoing tissue injury, serves no biological usefulness,
and is not accompanied by the autonomic response of
sympathetic overactivity. Vegetative signs, such as sleep disturbance,
appetite disturbance, and irritability, appear, and
pain is often reinforced or perpetuated by social and psychological
factors. Back pain can affect employment, income,
family, and social roles, producing psychological distress. 60,61 
Resulting somatic amplification can serve the patient’s needs 
for economic survival and maintenance of self-esteem. 61 

In patients with chronic low back pain, the absence of systemic
disease and treatable anatomic abnormalities should
be confirmed by medical history, physical examination, and
review of diagnostic tests. Neurologic abnormalities often
prove to be longstanding and may persist after surgical interventions.
Evidence of psychological distress should be sought
because this may respond to direct intervention and improve
the likelihood of response to other treatments. The Minnesota
Multiphasic Personality Inventory is impractical in most
primary care settings, and shorter depression scales are useful
for screening. 62,63

Waddell et al 64 proposed 5 categories of inappropriate or nonorganic
signs that correlated with other indicators of psychological
distress: (1) inappropriate tenderness that is superficial or
widespread; (2) pain on simulated axial loading by pressing on
the top of the head, or simulated spine rotation (performed by
holding the patient’s arms to the side while rotating the hips,
ensuring that the shoulders and hips rotate together); (3) “distraction”
signs, such as inconsistent performance between SLR
in the seated position vs the supine position; (4) regional disturbances
in strength and sensation that do not correspond with
nerve root innervation patterns; and (5) overreaction during the
physical examination. The occurrence of any 1 sign was of limited
value, but positive findings in 3 of the 5 categories suggested
psychological distress. The precision of nonorganic signs was
reported by Waddell et al 64 to be high, but subsequent evaluation
found poor precision in the regional disturbance category
(Table 7-2). 23

SUMMARY AND RECOMMENDATIONS 

History 

1. A few key questions can raise or lower the probability of 
underlying systemic disease. The most useful items are age, 
history of cancer, unexplained weight loss, duration of pain, 
and responsiveness to previous therapy. 

2. Intravenous drug use or urinary infection raises the suspicion
of spinal infection.

3. Ankylosing spondylitis is suggested by the patient’s age 
and sex (most common in young men), but most clinical 
findings have limited accuracy. 

4. Failure of bed rest to relieve the pain is a sensitive finding 
for all these systemic conditions, although not specific. 

5. Neurologic involvement is suggested by symptoms of sciatica
or pseudoclaudication. Pain radiating distally (below
the knee) is more likely to represent a true radiculopathy 
than pain radiating only to the posterior thigh. A history 
of numbness or weakness in the legs further increases the 
likelihood of neurologic involvement. 

6. Inquiry should be made concerning symptoms of the 
cauda equina syndrome: bladder dysfunction (especially 
urinary retention) and saddle anesthesia in addition to 
sciatica and weakness. 

7. The psychosocial history helps to estimate prognosis and 
plan therapy. The most useful items are a history of failed 
treatments, substance abuse, and disability compensation. 
Brief screening questionnaires for depression may suggest 
important therapeutic opportunities. 

Physical Examination 

1. Fever suggests the possibility of spinal infection. Vertebral 
tenderness is a sensitive finding for infection but not specific. 

2. The search for soft-tissue tenderness is unlikely to provide 
reproducible data or demonstrably valid pathophysiologic 
inferences. 23,39 

3. Limited lumbar flexion is not highly sensitive or specific 
for ankylosing spondylitis or other diagnoses. However, 
limited spinal motion may be useful in planning physical 
therapy and monitoring response. 

4. In a patient with sciatica or possible neurogenic claudication,
SLR should be assessed bilaterally, preferably with an
inclinometer or goniometer. 

5. Neurologic examination emphasizes ankle dorsiflexion 
strength, great-toe dorsiflexion strength, ankle reflexes, 
and the sensory examination. A rapid screening sensory 
examination would test pinprick sensation in the medial, 
dorsal, and lateral aspects of the foot. 

6. For the patient with chronic pain, all the evaluations
described herein should be completed. Anatomically “inappropriate”
signs may be helpful in identifying psychological
distress as a result of or as an amplifier of low back symptoms.
The most reproducible of these signs are superficial
tenderness, distracted SLR, and the observation of patient
overreaction during the physical examination.
UPDATE: Low Back Pain



CLINICAL SCENARIO 


A physically active 61-year-old man presents with complaints
of low back pain and occasional pain in his left buttock
and upper thigh. His symptoms began approximately
3 weeks ago. In addition, increasing pain in his lower
extremities is preventing him from participating in his hobbies
and socializing with his usual group of friends. He has
no history of weight loss and no changes in bowel or bladder
habits. During the physical examination, the patient
reports thigh and back pain at 50 degrees during the
straight leg raise (SLR) test on the left but no radiation
below the knee. He has slight pain in the back of his right
leg with SLR testing to 75 degrees. When you test his quadriceps
strength, his left side seems a little weaker than the
right, but the testing is limited by his discomfort. His single-leg
sit-to-stand test result is normal. The ankle reflexes are
absent bilaterally. Given the results of this brief history and 
physical examination, are there other maneuvers you could 
perform? What diagnosis can you provide for this patient? 

UPDATED SUMMARY ON LOW BACK PAIN 

Original Review 

Deyo RA, Rainville J, Kent DL. What can the history and 
physical examination tell us about low back pain? JAMA. 
1992;268(6):760-765. 

UPDATED LITERATURE SEARCH 

We initially sought articles including the keywords “back pain,”
“herniated disk,” or “sciatica” and “specificity” using the “Clinical
Query” mechanism in MEDLINE. In addition, we filtered
for human, English-language articles, resulting in 190 citations
from 1992 to August 2004. We performed an additional search
including the various forms and combinations of the terms
“intervertebral disk displacement,” “characteristic,” “feature,”
“finding,” “marker,” “predictor,” “sign,” “test,” “variable,” “physical,”
“exam,” and “sensitivity.” This search added 28 unique
citations to our article pool. We also searched bibliographies
and personal files for additional articles. Two reviewers independently
examined the abstracts of the articles retrieved by
this search. Articles included in the update were selected by
consensus between the 2 reviewers; 6 articles were deemed relevant
to this update.

We included 2 systematic reviews 1,2 and 3 prospective
studies 3-5 that focused on the physical examination of individuals
with low back pain. We excluded 1 literature synthesis
when we could not replicate the data on our review of the
original references. 6 We considered articles related to detecting
lumbar radiculopathy or underlying systemic diseases
among patients with low back pain. We excluded articles on
neck pain or spinal stenosis from our review.

NEW FINDINGS 

• More than 90% of normal individuals younger than 60 years 
have bilateral ankle reflexes, but 5% have 1 absent ankle 
reflex and 5% have no ankle reflexes. Among those older 
than 60 years, only 60% have both ankle reflexes, 30% have 
no ankle reflexes, and 10% have an absent ankle reflex in 1 
lower extremity. These age-related deficits reduce the specificity
of a diminished ankle reflex as a test for L5 to S1 radiculopathy
in older patients.

• When a patient with low back pain is screened for cancer, it 
may be prudent to inquire about pain at night. Night pain is 
sensitive among patients with cancer as a cause of back pain, 
but not specific: sensitivity, 0.92; specificity, 0.46; positive likelihood
ratio (LR+), 1.7 (95% confidence interval [CI], 1.2-1.9);
and negative likelihood ratio (LR-), 0.17 (95% CI, 0.03-0.73).
Thus, the absence of night pain is helpful in reducing
the probability of cancer, but its presence is minimally helpful. 
The absence of night pain should be interpreted together with 
the absence of other important findings to identify patients at 
low risk of back pain secondary to malignancy. 

• The single-leg sit-to-stand test (described below) may be
the most reliable method for detecting quadriceps weakness
(κ = 0.85), which suggests upper lumbar (L3-L4) radiculopathy
in patients with low back pain. 6
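
The likelihood ratios quoted above for night pain follow directly from the reported sensitivity and specificity. The sketch below, in Python, shows the standard definitions applied to those two numbers (0.92 and 0.46); the function name is illustrative only, and the confidence intervals are not reproduced because the underlying counts are not given here.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from a test's sensitivity and specificity."""
    # LR+ = P(positive | disease) / P(positive | no disease)
    lr_positive = sensitivity / (1.0 - specificity)
    # LR- = P(negative | disease) / P(negative | no disease)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Night pain as a finding for cancer-related back pain (values from the text).
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.92, specificity=0.46)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

The LR+ close to 1 reflects the poor specificity (many patients without cancer also have night pain), whereas the low LR- reflects the high sensitivity.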

IMPROVEMENTS IN THE DATA PRESENTED IN THE 
ORIGINAL PUBLICATION 

No new findings substantially changed the results originally
reported in the Rational Clinical Examination series.
Table 7-4 Estimated Accuracy of Ipsilateral Straight-leg Raise Test for
Lumbar Disk Herniation

Source, Patient Population                                       LR+ (95% CI) or Range   LR- (95% CI) or Range
Jonsson and Stromqvist, 4 surgical series a (n = 300 patients)   2.0 (1.7-2.4)           0.21 (0.12-0.36)
van den Hoogen et al, 2 surgical series (n = 7 studies)          0.99-1.8                0.04-0.54
Deville et al, 1 surgical series (n = 10 studies)                1.1 (1.0-1.1)           0.34 (0.28-0.40)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a Patients with herniated disk were compared with patients with lateral or central stenosis.

Table 7-5 Estimated Accuracy of Crossed Straight-leg Raise Test for
Lumbar Disk Herniation

Source, Patient Population                                       LR+ (95% CI, when data available)   LR- (95% CI, when data available)
Jonsson and Stromqvist, 4 surgical series a (n = 300 patients)   5.8 (2.7-12)                        0.80 (0.72-0.90)
van den Hoogen et al, 2 surgical series b (n = 6 studies)        1.6-8.8                             0.59-0.90
Deville et al, 1 surgical series (n = 6 studies)                 2.2 (1.8-2.8)                       0.81 (0.77-0.87)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a Patients with herniated disk were compared with patients with lateral or central stenosis.
b Various literature estimates were not pooled in this study.


Table 7-6 Estimated Accuracy of Sit-to-Stand Test for Upper Lumbar 
Disk (L3 to L4) Herniation With Radiculopathy 

Source                                     LR+ (95% CI)    LR- (95% CI)
Rainville et al, 5 nonsurgical series a    26 (1.7-413)    0.35 (0.22-0.56)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a Patients with L3 to L4 radiculopathy were compared with patients with lower lumbar
radiculopathy (L5-S1).


Table 7-7 Presence of Achilles Tendon Reflex in Patients Without a
History of Low Back Pain, Sciatica, or Systemic Disease a

          Total      Both Present,   One Absent,    Both Absent,
Age, y    Patients   % (95% CI)      % (95% CI)     % (95% CI)
16-20     38         100 (92-100)    0 (0-8)        0 (0-8)
21-30     133        100 (98-100)    0 (0-2)        0 (0-2)
31-40     112        96 (93-100)     0.9 (0.8-3)    3 (0-6)
41-50     140        95 (90-98)      2.9 (0.1-6)    3 (0-6)
51-60     162        88 (83-93)      4 (1-6)        8 (4-12)
61-70     187        63 (56-70)      7 (3-10)       30 (23-60)
71-80     186        54 (47-61)      10 (5-14)      37 (30-43)
81-90     99         40 (31-50)      10 (4-16)      50 (40-59)
91-100    17         18 (0-36)       6 (0-17)       77 (66-87)

Abbreviation: CI, confidence interval.

a Frequency may not total 100% because of rounding.


CHANGES IN THE REFERENCE STANDARD 

The reference standard for a herniated disk causing radiculopathy
continues to be surgical findings or the combination
of clinical findings, imaging results, electrophysiology,
and clinical course. No major new diagnostic techniques
have been introduced. However, as suggested in the synthesis
of the literature on SLR, the choice of reference standard
(imaging vs surgical findings) may influence estimates of 
test performance. 

RESULTS OF LITERATURE REVIEW 

Univariate Results of Tests for Herniated Lumbar Disk 

The methods used in studying low back pain continue to be 
poor, leading to ambiguous results. As indicated in previous 
reviews, a clinical diagnosis is generally reached from multiple 
items of medical history and physical examination, with no 
single test sensitive and specific enough to make a definitive 
diagnosis. 

Since the original Rational Clinical Examination article on 
back pain, 1 systematic review and a new surgical series have 
addressed the sensitivity and specificity of the SLR and 
crossed straight leg raise (CSLR) tests. These studies result in 
estimates close to those cited in the original Rational Clinical 
Examination article (Tables 7-4 and 7-5). The review article
suggests a somewhat higher sensitivity of the SLR test (close 
to 0.90 rather than 0.80), whereas the surgical series reported 
somewhat greater specificity for the CSLR test (0.96 vs 0.90). 

The sit-to-stand test is the most reliable test (κ = 0.85)
for detecting quadriceps weakness, and it may discriminate
those with an L3 to L4 herniation from those with an L5 to
S1 lesion. 6 To perform the single-leg sit-to-stand test, the
patient attempts to rise from a chair by using only 1 leg.
The patient is allowed to place his or her hand in the examiner’s
for aid with balance, and a negative finding/normal
result is recorded if the patient is able to rise successfully
(LR+, 26 [95% CI, 1.7-413]; LR-, 0.35 [95% CI, 0.22-0.56])
(Table 7-6). With regard to reflexes, a large study assessed
whether absent Achilles reflexes occur in seemingly normal
older patients (Table 7-7). The absence of an ankle reflex
becomes increasingly common in individuals older than 60
years, suggesting that this finding is most meaningful at
younger ages.

EVIDENCE FROM GUIDELINES 

The 1994 guidelines on acute low back problems in adults 
prepared by the Agency for Health Care Policy and Research 
(now the US Agency for Healthcare Research and Quality) 
largely reiterated data from the original Rational Clinical 
Examination article. Guidelines on back pain from New 
Zealand, Australia, and Holland (published in 1995, 1996, 
and 2003, respectively) have no discussion on accuracy of 
the medical history and physical examination but recommend
clinical evaluation consistent with the evaluation
proposed here. 
CLINICAL SCENARIO—RESOLUTION


Although this patient has some symptoms with SLR and has
absent ankle reflexes, it is unlikely that he has neurologic deficits
related to his low back pain. Some 30% of patients this
age (>60 years) have absent ankle reflexes in the absence of
low back pathology. The absence of pain radiating below the
knee with SLR suggests that this patient’s pain is most likely
not the result of a lumbar radiculopathy. The absence of a
positive CSLR result reinforces this impression. It is sometimes
difficult to decide whether a patient is truly weak or
whether strength testing effort is reduced by pain. However,
this patient’s normal sit-to-stand test result confirms normal
strength. The combination of findings suggests that he does
not have a herniated disk, so ordering additional tests (eg,
electromyogram, nerve conduction, magnetic resonance
imaging [MRI]) is not necessary.

See next page for the “Make the Diagnosis” section. 


REFERENCES FOR THE UPDATE 

1. Deville WLJM, van der Windt DAWM, Dzaferagic A, Bezemer PD, 
Bouter LM. The test of Lasegue: systematic review of the accuracy in 
diagnosing herniated discs. Spine. 2000;25(9):1140-1147. a

2. van den Hoogen HM, Koes BW, van Eijk JT, Bouter LM. On the accuracy 
of history, physical examination, and erythrocyte sedimentation rate in 
diagnosing low back pain in general practice: a criteria-based review of 
the literature. Spine. 1995;20(3):318-327. a 

3. Bowditch MG, Sanderson P, Livesey JP. The significance of an absent 
ankle reflex. J Bone Joint Surg Br. 1996;78(2):276-279. a

4. Jonsson B, Stromqvist B. Symptoms and signs in degeneration of the 
lumbar spine: a prospective, consecutive study of 300 operated patients. 
J Bone Joint Surg Br. 1993;75(3):381-385. a 

5. Rainville J, Jouve C, Finno M, Limke J. Comparison of four tests of quadriceps
strength in L3 or L4 radiculopathies. Spine. 2003;28(21):2466-2471. a

6. Vroomen PC, de Krom MCTFM, Knottnerus JA. Diagnostic value of history
and physical examination in patients suspected of sciatica due to
disc herniation: a systematic review. J Neurol. 1999;246(10):899-906.


a For the Evidence to Support the Update on this topic, 
see http://www.JAMAevidence.com. 







CHAPTER 7 Update 


LOW BACK PAIN— MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

Because of the weak associations among symptoms, physical
findings, imaging results, and electromyograms, a majority of
patients with low back pain (approximately 85%) cannot be given a
definitive diagnosis. Among asymptomatic individuals, 20% to 30%
have evidence of a herniated disk on computed tomography
(CT) or MRI. However, only a small proportion (2%) of individuals
with low back pain eventually undergo surgery for disk herniation.
Thus, the prevalence of clinically important disk
herniations is low.

In the primary care setting, the prevalence of compression 
fracture and spondylolisthesis is small, at 4% and 3%, respectively,
in patients with low back pain. Fortunately, low back
pain as a result of spinal malignancy, ankylosing spondylitis, or 
spinal infection is rare. The prevalence of these conditions 
among patients with back pain is approximately 0.7%, 0.3%, 
and 0.01%, respectively. 

POPULATION FOR WHOM HERNIATED DISK WITH 
RADICULOPATHY SHOULD BE CONSIDERED 

A herniated disk with radiculopathy should be considered in 
any adult with back and leg pain. Herniated disks causing sciatica
are most common in middle-aged adults (30-55 years) and
are somewhat less common in older adults (Table 7-8). 


Table 7-8 Utility of the Clinical Examination for Herniated Disk or
Cancer Among Patients With Back Pain

                                                     LR+ (95% CI) or Range   LR- (95% CI) or Range
Sit-to-stand test for upper lumbar herniation        26 (1.7-413)            0.35 (0.22-0.56)
Nocturnal pain for cancer-induced back pain          1.7 (1.2-1.9)           0.17 (0.03-0.73)
Crossed straight-leg raise for disk herniation       1.6-5.8                 0.59-0.90
Ipsilateral straight-leg raise for disk herniation   0.99-2.0                0.04-0.50


Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.
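
Likelihood ratios such as those in Table 7-8 are applied by converting a pretest probability to odds, multiplying by the likelihood ratio, and converting back. The following Python sketch illustrates this with the approximate prevalence of spinal malignancy given under Prior Probability (0.7%) and the night-pain likelihood ratios reported in the New Findings (LR+ 1.7, LR- 0.17). Treating that prevalence as the pretest probability for an individual patient is an assumption made only for illustration, and the function name is not from the source.

```python
def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Update a pretest probability with a likelihood ratio via odds."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Assumed pretest probability: prevalence of spinal malignancy (about 0.7%).
pre_test = 0.007

with_night_pain = post_test_probability(pre_test, 1.7)      # finding present
without_night_pain = post_test_probability(pre_test, 0.17)  # finding absent

print(f"Night pain present: {with_night_pain:.2%}")
print(f"Night pain absent:  {without_night_pain:.2%}")
```

Under these assumptions the presence of night pain moves the probability of cancer only slightly, whereas its absence lowers an already small probability several-fold, which is consistent with the interpretation given in the New Findings.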


POPULATION FOR WHOM CANCER 
SHOULD BE CONSIDERED 

Although it accounts for less than 1% of cases of back
pain, cancer is the most common of the systemic causes. Cancer
should be considered as a possible cause of low back pain in
patients older than 50 years and in
patients with a history of cancer (especially prostate, lung, or
breast carcinoma). In addition, patients who fail to improve
after 4 to 6 weeks of conservative therapy should be evaluated 
for underlying systemic diseases such as cancer (Table 7-8). 

REFERENCE STANDARD TESTS 

For herniated disks, surgical findings may be a gold standard
for diagnosis, but back surgery should never be considered
just to confirm the absence of a disk herniation among patients
with a negative clinical and imaging examination result. For
patients who do not undergo surgery, CT or MRI demonstrating
a disk herniation with nerve root impingement
might be considered a gold standard. In addition, electromyography
may confirm nerve root involvement. However,
clinicians must realize that herniated disks on imaging
are common among asymptomatic individuals. Thus, the
imaging findings must be carefully correlated with clinical
history, physical examination, and the time course of illness.

For metastatic cancer or infection, biopsy will be the usual 
gold standard, but these are performed only in patients with 
suggestive clinical and imaging findings. Imaging and laboratory
test results (such as the erythrocyte sedimentation
rate), if negative, are usually sufficient to rule out cancer and 
infection as a cause of back pain. For compression fractures, 
the gold standard remains imaging. 














EVIDENCE TO SUPPORT THE UPDATE 


Low Back Pain 



TITLE The Significance of an Absent Ankle Reflex. 

AUTHORS Bowditch MG, Sanderson P, Livesey JP. 

CITATION J Bone Joint Surg [Br]. 1996;78(2):276-279.

QUESTION What is the prevalence of abnormal ankle 
reflexes in adults without pathologic causes of reflex loss? 

DESIGN Prospective. 

SETTING Orthopedic outpatient department in 2 hospitals. 

PATIENTS A total of 1074 patients (541 men, 533 
women), aged 16 to 99 years, without history of spinal disease,
low back pain, sciatica, diabetes mellitus, or neuropathic
or systemic medical disease.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Patients were examined in 3 positions: sitting with legs hanging
over edge of seat, kneeling with feet over edge, and lying
in supine and lateral positions. A reflex was considered
present if it was elicited in any of the positions and absent if it
was not. To determine interexaminer reliability, 50 patients
were examined separately by each of the 3 authors (κ = 0.94).

MAIN OUTCOME MEASURES 

The presence or absence of either 1 or both ankle reflexes was
noted, and data were displayed according to age range in
increments of 10 years. The prevalence and its 95% confidence
interval for each age group were calculated. The authors also
tested for a relationship between prevalence and age with the
χ² test. Finally, the results of each pair of consecutive groups
were compared to determine at what age the largest changes
in prevalence occurred.
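
As an illustration of the interval calculation described above, the sketch below computes a 95% confidence interval for a proportion with the normal (Wald) approximation. The summary does not state which interval method the study authors used, so the choice of method is an assumption; the example values (63% of 187 patients aged 61 to 70 years with both reflexes present) are taken from Table 7-9, and the function name is illustrative.

```python
import math


def wald_interval(proportion: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion.

    Assumption: the source does not name its interval method; this is
    one common choice, clipped to the range 0 to 1.
    """
    standard_error = math.sqrt(proportion * (1.0 - proportion) / n)
    lower = max(0.0, proportion - z * standard_error)
    upper = min(1.0, proportion + z * standard_error)
    return lower, upper


# Age group 61-70 years: both ankle reflexes present in 63% of 187 patients.
low, high = wald_interval(proportion=0.63, n=187)
print(f"95% CI: {low:.0%} to {high:.0%}")
```

For this row the approximation gives roughly 56% to 70%. The Wald interval is known to behave poorly for proportions near 0% or 100% and for small groups, so other rows of the table may not be reproduced by this method.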

MAIN RESULTS 

See Tables 7-9, 7-10, and 7-11.


Table 7-9 Presence of Achilles Tendon Reflex in Patients Without a
History of Low Back Pain, Sciatica, or Systemic Disease a

          Total      Both Present,   One Absent,    Both Absent,
Age, y    Patients   % (95% CI)      % (95% CI)     % (95% CI)
16-20     38         100 (92-100)    0 (0-8)        0 (0-8)
21-30     133        100 (98-100)    0 (0-2)        0 (0-2)
31-40     112        96 (93-100)     0.9 (0.8-3)    3 (0-6)
41-50     140        95 (90-98)      2.9 (0.1-6)    3 (0-6)
51-60     162        88 (83-93)      4 (1-6)        8 (4-12)
61-70     187        63 (56-70)      7 (3-10)       30 (23-60)
71-80     186        54 (47-61)      10 (5-14)      37 (30-43)
81-90     99         40 (31-50)      10 (4-16)      50 (40-59)
91-100    17         18 (0-36)       6 (0-17)       77 (66-87)

Abbreviation: CI, confidence interval.

a Frequency may not total to 100% because of rounding.


Table 7-10 Significant Changes in Prevalence Between Consecutive 
Age Groups 

Consecutive Age Groups, y P Value for Both Ankle Reflexes Absent 

51-60 vs 61-70 <.001 

71-80 vs 81-90 .04 


Table 7-11 A “Working” Guide 

Age, y Both Present, % Both Absent, % One Absent, % 

<60 >90 <5 <5 

>60 60 30 10 


CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Large number of participants in a prospective 
study with high interrater reliability. 
LIMITATIONS Examiners were not blinded to the patient’s
age, although it is hard to do a practical study in which the 
examiners would have no idea of the patient’s age. 

The prevalence of ankle reflexes decreases with age. The 
largest decrements occur when comparing individuals in 
their 50s with those in their 60s and individuals in their 70s 
with those in their 80s. When using ankle reflexes to examine 
a patient for lumbar radiculopathy, the absence of an ankle 
reflex will be more meaningful in a patient younger than 60 
years. Unilateral ankle reflex loss is far less common and is 
thus a more meaningful clinical sign, especially in younger 
patients. 

Reviewed by Ben Stern, MS, DPT 


TITLE The Test of Lasegue: Systematic Review of the 
Accuracy in Diagnosing Herniated Disks. 

AUTHORS Deville WLJM, van der Windt DAWM, 
Dzaferagic A, Bezemer PD, Bouter LM. 

CITATION Spine. 2000;25(9):1140-1147.

QUESTION How accurate are the straight leg raise
(SLR) and crossed straight leg raise (CSLR) tests at diagnosing
a herniated disk in patients with low back pain?

DATA SOURCES 

A MEDLINE and EMBASE search from an earlier review was
extended to include 1992 through 1997 (keywords: “radiculopathy,”
“backache,” “low back,” “Lasegue,” “straight leg raising,”
and “cross straight leg raising”). Bibliographies of
retrieved studies were also reviewed for relevant material.


STUDY SELECTION 

In total, 552 studies were retrieved; 15 met the inclusion criteria.
Studies were selected if they used surgery as the reference
standard, presented data on sensitivity or specificity,
and included more than 10 patients with disease. Review
articles were not included. The authors’ original review
(1995) included 19 studies, 12 of which were included in this
review. The extended search through 1997 yielded 12 additional
studies, of which 3 were retained for use.

DATA EXTRACTION 

Two reviewers independently rated each study in 16 categories,
including criteria related to internal and external validity
(reference and index application and quality, spectrum of
patients, setting, reproducibility, etc). The maximum possible
score was 17, with 6 points on internal validity and 11 on
external. In addition, information on disease prevalence at
the setting was collected.

MAIN RESULTS 

Of the 15 studies included in this review, 7 included patients
with previous disk surgery and 2 included patients with
bilateral radiculopathy, both of whom had previous disk surgery.
None of the studies occurred in a primary care setting.
Positive SLR cutoff point was mentioned in 6 of the studies
and ranged from less than 70 degrees (n = 3) to less than 90
degrees (n = 2). The addition of neck flexion or foot dorsiflexion
was not evaluated. The median internal and external validity
scores were 50% (range, 33%-66%) and 45% (range, 18%-72%),
respectively. Median total validity score was 47%
(range, 29%-65%), with 6 studies scoring 50% or better.

The authors included studies that were “sensitivity-only
studies” (ie, only diseased patients), along with studies of
diagnostic accuracy (patients with and without disk herniation).
The pooled sensitivity of SLR was 0.91 (95% confidence
interval [CI], 0.82-0.94) and pooled specificity was
0.26 (95% CI, 0.16-0.38). For the CSLR, pooled sensitivity
was 0.29 (95% CI, 0.24-0.34) and specificity was 0.88 (95%
CI, 0.86-0.90). See Table 7-12.

CONCLUSIONS 

LEVEL OF EVIDENCE Systematic review. 

STRENGTHS Appropriate study question, literature search, 
and evaluation for bias. 

LIMITATIONS The authors included sensitivity-only studies 
in their pooled estimates. However, they provide the data for 
all the studies that allow us to calculate the pooled likelihood 
ratios. 

These data suggest that the SLR and CSLR should be used in 
combination. Although they are similar in overall accuracy (as 
evidenced by similar diagnostic odds ratio), the SLR primarily 
has value when it is absent (lowering the likelihood of a disk 
herniation), whereas the CSLR primarily has value when it is 
present (increasing the likelihood of a disk herniation). 
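
The comparison of overall accuracy mentioned above rests on the diagnostic odds ratio, which equals LR+ divided by LR-. The short Python sketch below applies that definition to the pooled point estimates in Table 7-12; it is a rough illustration only, because the ratio of two rounded pooled estimates will not exactly match a formally pooled diagnostic odds ratio, and the function name is not from the source.

```python
def diagnostic_odds_ratio(lr_positive: float, lr_negative: float) -> float:
    """Diagnostic odds ratio expressed as LR+ divided by LR-."""
    return lr_positive / lr_negative


# Pooled point estimates from Table 7-12.
slr_dor = diagnostic_odds_ratio(lr_positive=1.1, lr_negative=0.34)
cslr_dor = diagnostic_odds_ratio(lr_positive=2.2, lr_negative=0.81)

print(f"SLR diagnostic odds ratio:  {slr_dor:.1f}")
print(f"CSLR diagnostic odds ratio: {cslr_dor:.1f}")
```

Both values come out near 3, which is consistent with the statement that the two tests have similar overall accuracy even though one is more useful when negative and the other when positive.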

Because all the studies included were surgical case series 
taken from hospitals and not from primary care facilities, an 
unusually high prevalence existed in these studies (86% for the 


Table 7-12 SLR and CSLR as a Test for Disk Herniation a

Test (n = No. of Studies)           LR+ (95% CI)     LR- (95% CI)
Straight leg raise (n = 10)         1.1 (1.0-1.1)    0.34 (0.28-0.40)
Crossed straight leg raise (n = 6)  2.2 (1.8-2.8)    0.81 (0.77-0.87)


Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a We calculated the pooled likelihood ratio with random-effects measures. We used
only studies that had sensitivity and specificity data. We excluded the outlier study
noted by the authors.
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SLR studies and 92% for the CSLR studies). The diagnostic 
odds ratio of the SLR decreased with designs of higher validity, 
homogeneity of case mix, and exclusion of patients with his¬ 
tory of disk surgery. Both findings need better validation in 
populations of patients with a lower prevalence of disk hernia¬ 
tion, such as those treated in primary care settings. 
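
To see why prevalence matters for interpreting these tests, the likelihood ratios from Table 7-12 can be applied to different pretest probabilities. The sketch below uses the standard odds form of Bayes' theorem; the 10% primary care prevalence is a purely hypothetical value chosen for illustration, not a figure from the review.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability through odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


LR_NEGATIVE_SLR = 0.34    # Table 7-12
LR_POSITIVE_CSLR = 2.2    # Table 7-12
HYPOTHETICAL_PRIMARY_CARE_PREVALENCE = 0.10  # assumption, not from the source

scenarios = [
    ("surgical series, negative SLR", 0.86, LR_NEGATIVE_SLR),
    ("surgical series, positive CSLR", 0.92, LR_POSITIVE_CSLR),
    ("hypothetical primary care, negative SLR",
     HYPOTHETICAL_PRIMARY_CARE_PREVALENCE, LR_NEGATIVE_SLR),
    ("hypothetical primary care, positive CSLR",
     HYPOTHETICAL_PRIMARY_CARE_PREVALENCE, LR_POSITIVE_CSLR),
]

for label, pretest, lr in scenarios:
    post = post_test_probability(pretest, lr)
    print(f"{label}: {pretest:.0%} -> {post:.0%}")
```

Under these assumptions, a negative SLR in the surgical series still leaves a probability of disk herniation of about 68%, whereas the same result at a 10% pretest probability lowers it to about 4%.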

Reviewed by Ben Stern, MS, DPT 


TITLE Symptoms and Signs in Degeneration of the 
Lumbar Spine. 

AUTHORS Jonsson B, Stromqvist B. 

CITATION J Bone Joint Surg [Br]. 1993;75(3):381-385.

QUESTION What are the frequencies of symptoms and 
neurologic disturbances among patients with spinal ste¬ 
nosis and lumbar disk herniation? 

DESIGN Prospective study of patients consecutively 
admitted for lumbar spine surgery. 

SETTING Inpatient surgery. 

PATIENTS Three hundred patients admitted for lum¬ 
bar disk or lumbar decompression surgery (100 disk her¬ 
niation, 100 lateral stenosis, and 100 central stenosis). 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Diagnosis was established with myelography, computed 
tomography, or magnetic resonance imaging and occasion¬ 
ally supplemented by nerve root block. 

MAIN OUTCOME MEASURES 

Sensitivity and specificity for a herniated lumbar disk: the
likelihood ratios (LRs) represent the likelihood of a herniated 
disk (as opposed to central stenosis). When a finding is 
abnormal, the associated positive LR (LR+) of more than 1.0 
makes a herniated disk more likely, whereas an LR+ of less 
than 1.0 makes central stenosis more likely. When a finding is 
normal, a negative LR (LR-) of more than 1.0 increases the 
likelihood of a disk herniation, whereas an LR- of less than 
1.0 increases the likelihood of central stenosis. 


MAIN RESULTS 

See Table 7-13.


Table 7-13 Likelihood of Disk Herniation Versus Central Stenosis

Finding | LR+ for Disk Herniation (95% CI) a | LR- for Disk Herniation (95% CI) b
SLR | 2.0 (1.7-2.4) | 0.21 (0.12-0.36)
CSLR | 5.8 (2.7-12) | 0.80 (0.72-0.90)
Patellar reflex | 0.40 (0.22-0.73) | 1.2 (1.1-1.4)
Ankle reflex | 0.96 (0.75-1.2) | 1.0 (0.82-1.30)
Sensory disturbance | 1.3 (1.1-1.6) | 0.70 (0.52-0.95)
No relief with rest | 1.1 (1.0-1.3) | 0.59 (0.35-1.0)

Abbreviations: CI, confidence interval; CSLR, crossed straight leg raise; LR+, positive
likelihood ratio; LR-, negative likelihood ratio; SLR, straight leg raise.

a LR+ greater than 1 favors disk herniation, whereas LR+ less than 1 favors central
stenosis.

b LR- greater than 1 favors disk herniation, whereas LR- less than 1 favors central
stenosis.

CONCLUSIONS 

LEVEL OF EVIDENCE Level 4. 

STRENGTHS Differential diagnostic test evaluated among 
patients with known disease status (herniated disk vs central 
spinal stenosis). 

LIMITATIONS The clinicians knew that all the patients had 
lesions. 

Patients without evidence of spinal pathology were not 
included in this study, so it is difficult to know whether the 
results generalize to patients who have not yet had an imag¬ 
ing study or surgery. However, the data suggest that a positive 
response to CSLR increases the likelihood of disk herniation 
rather than central stenosis. A normal conventional SLR 
response favored central stenosis over disk herniation, 
whereas abnormal patellar reflexes decreased the likelihood 
of disk herniation (perhaps a counterintuitive finding). 
Given the limitation imposed by the study population, in
which all patients had either disk herniation or central stenosis,
the other clinical results had limited or no ability to
distinguish between the 2 diagnoses.

Reviewed by Ben Stern, MS, DPT 


CHAPTER 7 Evidence to Support the Update 


TITLE Comparison of Four Tests of Quadriceps Strength 
in L3 or L4 Radiculopathies. 

AUTHORS Rainville J, Jouve C, Finno M, Limke J. 

CITATION Spine. 2003;28(21):2466-2471. 

QUESTION In adults with demonstrable L3 or L4 nerve
root compression via computed tomography (CT) or
magnetic resonance imaging (MRI), which of 4 tests of
quadriceps strength best reflects evidence of a lesion at L3
or L4 vs L5 to S1? In other words, which tests best distinguish
an upper lumbar radiculopathy from the far more
common lower lumbar radiculopathy?

DESIGN Prospective, nonconsecutive patients with 
uncertainty in the independence of the clinical findings. 

SETTING Outpatient physician office. 

PATIENTS One group of participants recruited from a 
hospital spine center if they had lumbar radiculopathy 
and radiographically demonstrated evidence of displaced 
or compressed L3 or L4 nerve root on symptomatic side 
(n = 33: L3, n = 10; L4, n = 23). Another group of patients 
with L5 or S1 nerve root compression evidence via CT or 
MRI was asked to participate in the study as a comparison 
group (n = 19: L5, n = 8; S1, n = 11). Patients with bilateral
radiculopathy, neurologic or muscular disease affecting
the lower extremity (LE), evidence of symptom
magnification, LE arthritis, cancer, cognitive dysfunction, 
and nonambulatory status were excluded. Average age of 
participants was 53 years, with an average duration of 
symptoms of approximately 2.8 months. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The CT or MRI results served as the reference standard. 

In addition to a routine physical examination, 4 tests of 
quadriceps strength were performed on each patient: (1) sin¬ 
gle-leg sit-to-stand, (2) step-up test, (3) knee-flexed manual 
muscle testing, and (4) knee-extended manual muscle test. 
For each maneuver, quadriceps strength was graded as
normal or abnormal. An abnormal grade was a positive result
suggesting an L3-L4 lesion; patients with a normal result
(negative likelihood ratio) were less likely to have an L3 to
L4 lesion.

To perform the single-leg sit-to-stand, the participant 
attempted to rise from a chair by using only 1 leg. The partic¬ 
ipant was allowed to place her or his hands in the examiner’s 
for aid with balance, and a score of normal was recorded if 
the participant was able to rise successfully. The step-up test 
was performed by asking the patient to step up on a 7-in 
stool (such as those built in to the end of an examining 
table). If the participant was able to step onto the stool suc¬ 
cessfully, a score of normal was recorded. The knee-flexed 
manual muscle test was performed in the supine position. 
The patient’s leg was held distally near the ankle while the hip 
was flexed to 90 degrees and the knee was flexed to end range. 
The participant was then asked to straighten the leg toward 
the end of the table. Ability to straighten the leg against max¬ 
imum resistance was recorded as normal. The knee-extended 
manual muscle test was also performed while the patient was 
supine. For this test, the examiner placed one hand above the 
participant’s distal ankle and the other forearm under the 
participant’s distal femur. The participant then straightened 
the knee, resulting in the heel’s rising off the table. After this, 
the examiner attempted to bend the knee and touch the heel 
to the table while the participant offered maximum resis¬ 
tance. Ability to maintain the knee in extension was recorded 
as normal. 

When available, a second examiner (blinded to the previ¬ 
ous results) performed the tests on the participants again (39 
of 53 participants). 

In addition, patients completed questionnaires including 
items related to quadriceps weakness. 

MAIN OUTCOME MEASURES 

Frequency of detection of L3 to L4 vs L5 to S1 disk herniation
as evidenced by imaging studies. The patients were evaluated
for frequency of quadriceps weakness in L5 and S1 radiculopathies.
In addition, κ values were used to determine interrater
reliability of the 4 tests.
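
Because κ corrects raw agreement for the agreement expected by chance, two maneuvers with similar percentage agreement can have very different κ values. The following is a minimal sketch of Cohen's κ for 2 examiners and a normal/abnormal grade; the counts are hypothetical and were chosen only to illustrate the effect, not taken from the study.

```python
def cohens_kappa(both_abnormal, first_only, second_only, both_normal):
    """Return (observed agreement, kappa) for a 2 x 2 table of two examiners' grades."""
    total = both_abnormal + first_only + second_only + both_normal
    observed = (both_abnormal + both_normal) / total

    # Marginal proportion of "abnormal" grades for each examiner
    first_abnormal = (both_abnormal + first_only) / total
    second_abnormal = (both_abnormal + second_only) / total

    # Agreement expected if the two examiners graded independently
    expected = (first_abnormal * second_abnormal
                + (1 - first_abnormal) * (1 - second_abnormal))
    return observed, (observed - expected) / (1 - expected)


# Hypothetical counts for 39 doubly examined patients
common_finding = cohens_kappa(both_abnormal=12, first_only=2, second_only=1, both_normal=24)
rare_finding = cohens_kappa(both_abnormal=1, first_only=3, second_only=2, both_normal=33)

for label, (agreement, kappa) in (("abnormal grade common", common_finding),
                                  ("abnormal grade rare", rare_finding)):
    print(f"{label}: agreement {agreement:.0%}, kappa {kappa:.2f}")
```

In this illustration raw agreement is about 92% and 87%, but κ is about 0.83 and 0.22, because almost all agreement in the second table is agreement on "normal" that chance alone would produce.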

MAIN RESULTS 

Thirty-three patients had an L3 to L4 lesion, whereas 19 had
L5 to S1 nerve compression (Table 7-14).


Table 7-14 Quadriceps Strength as an Indicator of an L3 to L4 Lesion Among Patients With Nerve Root Compression

Quadriceps Strength for Each Maneuver | κ (Interobserver Agreement) (%) | LR+ for an L3-L4 Lesion (95% CI) | LR- (95% CI)
Sit to stand | 0.85 (92) | 26 (1.7-413) | 0.35 (0.22-0.56)
Step up on stool | 0.83 (95) | 11 (0.69-182) | 0.74 (0.59-0.92)
Manual muscle test, knee flexed | 0.66 (84) | 4.0 (1.0-16) | 0.64 (0.46-0.90)
Manual muscle test, knee straight | 0.08 (87) | 4.1 (0.22-76) | 0.92 (0.80-1.0)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.
CONCLUSIONS

LEVEL OF EVIDENCE Level 4. 

STRENGTHS Differential diagnostic test evaluated among
patients with known disease status (L3 to L4 vs L5 to S1
nerve root compression). An evaluation of the interobserver
reliability was conducted.

LIMITATIONS It is not clear whether the authors were 
blinded to the level of nerve root compression. They did 
know that all patients had lesions. Height of chair was not 
specified for sit-to-stand test. 

Controls without evidence of spinal pathology were not 
included in this study; thus, it is difficult to generalize to 
patients who have not yet had an imaging study or surgery. 
However, the excellent agreement among observers on 
watching the patient go from sit to stand or step up on a stool 
suggests that this may be a better way of evaluating quadri¬ 
ceps weakness than manual muscle testing. 

Reviewed by Ben Stern, MS, DPT 


TITLE On the Accuracy of History, Physical Examina¬ 
tion, and Erythrocyte Sedimentation Rate in Diagnosing 
Low Back Pain in General Practice. 

AUTHORS Van den Hoogen HM, Koes BW, van Eijk JT, 
Bouter LM. 

CITATION Spine. 1995;20(3):318-327. 

QUESTION How accurate are the medical history, phys¬ 
ical examination, and erythrocyte sedimentation rate 
(ESR) in diagnosing various causes of low back pain? 

DESIGN Systematic review. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

MEDLINE search was done (1986-1992) using the terms “back¬ 
ache,” or “low back,” and “sciatica,” “cancer,” or “spondylitis,” 
and bibliographies of retrieved studies were reviewed. 

STUDY SELECTION 

Studies were selected if they presented data on sensitivity or 
specificity of items in the medical history, physical examina¬ 
tion, and ESR for radiculopathy, vertebral cancer or metasta¬ 
sis, and ankylosing spondylitis. Review articles and studies 
including fewer than 10 patients were excluded; 540 studies 
were retrieved, and 36 were included in this review (19 radi¬ 
culopathy, 9 vertebral cancer, and 8 ankylosing spondylitis). 


DATA EXTRACTION 

Studies were independently rated for methodology by 2 
reviewers, with differences in rating resolved by consensus. 
Ratings for each study consisted of scores in categories for 
index and reference test quality, reference test application, 
independence, clinical description, study population, sample 
size, and data presentation. Sensitivity and specificity were 
calculated for each diagnostic test. 

MAIN OUTCOME MEASURES 

The mean total quality score for all studies was 55 of 100 
(range, 20-85). The lowest scores fell in the categories of ref¬ 
erence and index test quality, independence, clinical descrip¬ 
tion, and study population. Only studies with scores greater 
than 55 were reviewed for diagnostic accuracy. 

MAIN RESULTS 

The data presented in Tables 7-15, 7-16, and 7-17 are the
findings not reported in the original Rational Clinical Examination
article on low back pain. 1

CONCLUSIONS 

LEVEL OF EVIDENCE Systematic review. 

STRENGTHS Comprehensive review of articles with predefined
selection criteria and a method for assessing quality.


Table 7-15 Likelihood Ratios for Diagnosing Radiculopathy

Finding (No. of Studies) | Reference Standard | LR+ (95% CI) or Range | LR- (95% CI) or Range
Sciatica (Knuttson 2 ; n = 205; patients with low back pain [21 with no radiculopathy]) | Operative findings | 0.92 (0.70-1.1) | 1.5 (0.5-4.4)
Paresthesia (n = 2) | Operative findings | 0.71-0.86 | 1.2-1.4
SLR (n = 7) | Operative findings | 0.99-1.8 | 0.04-0.54
CSLR (n = 6) | Operative findings | 1.6-8.8 | 0.59-0.90

Abbreviations: CI, confidence interval; CSLR, crossed straight leg raise; LR+, positive
likelihood ratio; LR-, negative likelihood ratio; SLR, straight leg raise.


Table 7-16 Likelihood Ratios for Diagnosing Vertebral Cancer

Finding (No. of Studies) | LR+ Range | LR- Range
Spinal tenderness (n = 3) | 0.38-3.6 | 0.26-1.4

Abbreviations: LR+, positive likelihood ratio; LR-, negative likelihood ratio.

Table 7-17 Likelihood Ratios for Diagnosing Ankylosing Spondylitis

Source | Finding | Sensitivity (95% CI) | Specificity (95% CI) | LR+ (95% CI) | LR- (95% CI)
Gran 3 (n = 449); patients with low back pain (27 with ankylosing spondylitis) | Out of bed at night | 0.65 (0.48-0.81) | 0.79 (0.75-0.83) | 3.2 (2.3-4.4) | 0.42 (0.25-0.72)
 | No relief lying down | 0.80 (0.63-0.92) | 0.49 (0.44-0.54) | 1.6 (1.3-2.0) | 0.40 (0.17-0.84)
 | Pain duration > 3 mo | 0.71 (0.52-0.84) | 0.54 (0.49-0.59) | 1.5 (1.2-2.0) | 0.55 (0.30-1.0)
 | Age at onset < 35 y | 0.92 (0.77-0.98) | 0.30 (0.26-0.35) | 1.3 (1.2-1.5) | 0.25 (0.06-0.94)
 | Morning stiffness | 0.63 (0.44-0.79) | 0.55 (0.51-0.60) | 1.4 (1.0-1.9) | 0.67 (0.41-1.1)
Mau et al 4 (n = 54); suspected of having ankylosing spondylitis (32 positive) | ESR raised | 0.69 | 0.68 | |

Abbreviations: CI, confidence interval; ESR, erythrocyte sedimentation rate; LR+, positive likelihood ratio; LR-, negative likelihood ratio.


LIMITATIONS Lack of specificity data of many included stud¬ 
ies. The majority (33/36) of the studies included only hospital- 
based patients, thus limiting ability to generalize results. 

None of the individual items in the medical history or physical
examination were sufficiently useful in diagnosing ankylosing
spondylitis, radiculopathy, or vertebral cancer. Rather than
using single tests, clinicians must instead rely on the diagnostic 
value of a combination of the available clinical data. 

Reviewed by Ben Stern, MS, DPT 
CASE 1 On annual examination of a 64-year-old woman, 
you observe an 8-mm mass in her right breast. She says 
she never noticed the mass before. Her screening mam¬ 
mogram result 7 months ago was normal. 

CASE 2 A 42-year-old woman comes to see you because 
she is upset. “I want a breast examination, doctor. My 
coworker was just diagnosed with breast cancer.” She prac¬ 
tices breast self-examination regularly. She has observed no 
changes in her breasts. 


WHY PERFORM A BREAST EXAMINATION? 


The clinical breast examination (CBE), like any part of the phys¬ 
ical examination, can be used either for screening (to detect 
breast cancer in asymptomatic women) or diagnosis (to evaluate 
breast complaints, primarily to rule out cancer). In primary 
care, screening CBEs are more commonly performed than diag¬ 
nostic CBEs. Of a total of 14 859 CBEs performed on a cohort of 
2400 women during a 10-year period, 73% were for screening 
and 27% were diagnostic 1 (Joann G. Elmore, MD, MPH, Harborview
Medical Center, Seattle, Washington, written communication,
November 1998). This review concentrates on the
screening CBE because most research has been directed to
screening rather than diagnostic CBE. Because the screening
CBE involves the search for cancer, there may be legal reasons, as 
well as medical reasons, for performing it well. Failure to diag¬ 
nose breast cancer is a leading reason for malpractice claims, 
and lawsuits against primary care clinicians account for half the 
indemnity payments made. 2 Clinicians who do not perform 
careful screening may be more liable. Also, some women are 
more willing to accept screening CBE than mammography, 3 in 
which case screening CBE is particularly important. 

Anatomic Basis of the Breast Examination 

The female breast consists of glandular and fibrous tissue and 
fat. Lobules of milk-producing glandular tissue radiate from 
the nipple, centrally supported by fibrous strands. Breast tis¬ 
sue, surrounded by superficial fascia, is attached to both the 
skin and the pectoral fascia by supporting ligaments. Fat sur¬ 
rounds the lobules of the breast, predominating in the super¬ 
ficial and peripheral portions. Breast tissue extends from the 
sternum medially to the midaxillary line laterally and from 
the clavicle superiorly to the “bra line” inferiorly, a rectangu¬ 
lar rather than a circular area. The normal breast does not 
have a homogeneous texture but usually is somewhat lumpy 
on palpation. 

Common distortions of the breast architecture include cysts, 
which are thought to arise from obstructed collecting ducts, 
and fibroadenomas, which are caused by an overgrowth of 
periductal stromal connective tissue within the lobules of the
breast. Other benign processes within the ductal system, such as
mammary duct ectasia and intraductal papilloma, may cause a
mass or nipple discharge. Most of these benign lesions
carry no increased risk of breast cancer. One pathologic lesion,
atypical hyperplasia, does increase risk by 3 to 5 times. 4-6 Each
of these benign processes may cause symptoms or signs that
mimic malignancy.

Breast cancer is an unrestrained proliferation of cells aris¬ 
ing in tissue of the ducts or lobules. Cancer arising from 
either type of tissue may be contained without spreading into 
surrounding stroma (ductal carcinoma in situ, and lobular 
carcinoma in situ) or may spread to contiguous tissues, 
through lymph channels, or hematogenously. Although ductal
carcinoma in situ is a precursor lesion to invasive cancer,
controversy surrounds its prognostic significance. 7,8 Lobular
carcinoma in situ is less common and is understood to be a 
marker for increased risk of development of invasive cancer, 
rather than a precursor lesion. 9 Invasive breast cancer carries 
a 15.3% 5-year mortality rate 10 ; advances in screening and 
treatment have contributed to a decrease in the mortality rate 
since 1989. 11,12 

Risk Factors for Breast Cancer 

Breast cancer is expected to occur in approximately 12% of 
American women during their lifetime. 13 Breast cancer risk 
in the general population is most affected by age and family 
history. The annual incidence at age 70 years (1 in 200) is 20 
times higher than that at age 30 years (1 in 4000) (Table 8-1). 14 
A woman with 2 first-degree relatives diagnosed as having 
breast cancer at an early age has a relative risk more than 4 
times that of a woman without such a family history. 15 
Other risk factors are related to estrogen exposure (age of 
menarche, first pregnancy and menopause, parity, and 
estrogen replacement therapy 15 ). Gail et al 16 have developed 
a model to estimate the breast cancer risk of individual 
women according to known risk factors. Among a few 
women, genetic mutations in the BRCA1 gene and, less 
commonly, BRCA2 gene confer a high risk of breast cancer 
(50%-80% during a lifetime) 17-19 ; women with these mutations
account for only 3% of all breast cancer cases. 20

Table 8-1 Incidence of Breast Cancer Within 1 Year for Women at a Given Age a

Age, y | Breast Cancer Incidence
30 | 1 in 4000
40 | 1 in 800
50 | 1 in 400
60 | 1 in 300
70 | 1 in 200
80 | 1 in 200

a Data are from the United States and include all ethnicities from 1973-1995. 14

Clinically, strong risk factors affect the likelihood that
any abnormality on CBE is cancer. For example, an abnormal
finding is more likely to be malignant in an older
woman than in a younger woman. The Canadian National
Breast Screening Study (NBSS) 21 reported the positive predictive
value for CBE to be twice as high in women from
50 through 59 years as in women from 40 through 49
years. In the Breast Cancer Detection Demonstration
Project (BCDDP), 22 the ratio of benign to malignant
biopsy results decreased from 16.4 among women from 35
through 39 years to 3.2 for women from 60 through 69
years.
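
The BCDDP ratios can be restated as the proportion of biopsies that were malignant, which is easier to compare with a positive predictive value. This is a small arithmetic sketch; the function name is ours, and it assumes every biopsy result was classified as either benign or malignant.

```python
def malignant_fraction(benign_to_malignant_ratio):
    """Proportion of biopsies that are malignant, given benign:malignant = ratio:1."""
    return 1.0 / (1.0 + benign_to_malignant_ratio)


younger = malignant_fraction(16.4)   # women aged 35 through 39 years
older = malignant_fraction(3.2)      # women aged 60 through 69 years

print(f"35-39 y: {younger:.1%} of biopsies malignant")
print(f"60-69 y: {older:.1%} of biopsies malignant")
print(f"ratio of the two proportions: {older / younger:.1f}")
```

On this reading, roughly 1 in 17 biopsies was malignant in the younger group compared with roughly 1 in 4 in the older group.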

METHODS 

We sought articles on effectiveness and test characteristics of 
the CBE. We identified potential English-language sources 
from the MEDLINE database for 1966 through 1997, using 
the search terms “physical examination,” “palpation,” “breast,” 
“breast diseases,” “diagnosis,” “diagnostic tests,” and “sensi¬ 
tivity and specificity.” We reviewed all potentially relevant 
articles and the reference lists of these articles. In addition, 
other articles known to us and their references were reviewed. 
We contacted investigators of several studies for further clari¬ 
fication and, in some cases, for unpublished data. All authors 
reviewed and agreed on the studies selected for inclusion in 
the pooled analysis. 

For information on the effectiveness of the CBE, we 
included all controlled trials and case-control studies in 
which CBE was at least a part of the screening modality. 

Data on CBE techniques included information from both 
clinical studies and studies using silicone models of the 
breast. The data synthesis on test characteristics of screening 
CBE in human populations used the following criteria: (1) CBE 
performed on asymptomatic population, (2) all screening 
outcomes reported (ie, total numbers of screens and positive 
screens), (3) breast cancer outcome determined for all 
screens, within a defined follow-up period, and (4) all breast 
cancers histologically confirmed. 

Summary measures for the sensitivity and specificity of the 
CBE and for likelihood ratios (LRs) of a positive or negative 
examination used published raw data from the reported tri¬ 
als that met our criteria. A random-effects model was used to 
generate conservative summary measures and confidence 
intervals (CIs). 23,24 
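
For readers unfamiliar with random-effects pooling, the idea can be sketched as follows. The text does not state which estimator was used; this example assumes the DerSimonian-Laird method applied to logit-transformed proportions, which is one common choice, and the study counts are hypothetical.

```python
import math


def pooled_proportion_random_effects(events, totals):
    """DerSimonian-Laird pooling of proportions on the logit scale.

    Returns (pooled proportion, lower 95% limit, upper 95% limit, tau squared).
    Needs at least 2 studies, each with 0 < events < total.
    """
    # Logit of each study proportion and its approximate variance
    effects = [math.log(e / (n - e)) for e, n in zip(events, totals)]
    variances = [1.0 / e + 1.0 / (n - e) for e, n in zip(events, totals)]

    # Fixed-effect (inverse-variance) estimate and Cochran's Q
    weights = [1.0 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))

    # Between-study variance, truncated at zero
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)

    # Random-effects weights add tau2 to every within-study variance,
    # which widens the interval when studies disagree
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))

    def inverse_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return (inverse_logit(pooled), inverse_logit(pooled - 1.96 * se),
            inverse_logit(pooled + 1.96 * se), tau2)


# Hypothetical studies: cancers detected by the test / all cancers
hypothetical_events = [30, 45, 60]
hypothetical_totals = [60, 70, 120]
estimate, lower, upper, tau2 = pooled_proportion_random_effects(
    hypothetical_events, hypothetical_totals)
print(f"pooled {estimate:.2f} (95% CI, {lower:.2f}-{upper:.2f}), tau^2 {tau2:.3f}")
```

When the studies are homogeneous the between-study variance is estimated as zero and the result equals the fixed-effect estimate; otherwise the interval widens, which is the sense in which the summary is conservative.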

EFFECTIVENESS OF CBE 

Determining the effectiveness of screening CBE is difficult 
because no clinical trial has compared CBE alone with no 
screening. One randomized trial and one case-control study 
compared the combination of screening CBE and mammog¬ 
raphy with no screening and demonstrated statistically sig¬ 
nificant decreased breast cancer mortality rates of 20% and 
71%, respectively, in women between the ages of 40 and 64 
years 25,26 (Table 8-2). These results, along with the evidence 
from randomized trials 34,35 and case-control studies 36,37 that 
screening mammography alone decreases breast cancer mortality
rates, make designing a clinical trial in which the control
group members receive no screening unethical. It is
unlikely that CBE alone will ever be compared with no
screening in a randomized trial; therefore, we must use less
direct evidence.

Table 8-2 Studies of Breast Cancer Screening That Included Clinical Breast Examination

Study | Years | Examiners | Age of Women at Entry, y | No. of Women, Intervention | No. of Women, Comparison | Screening Modality, Intervention | Screening Modality, Comparison | No. of Rounds | Years Followed Up | Mortality Reduction, RR (95% CI)

Trials Comparing Screening Group With an Unscreened Group

Randomized Controlled Trials
HIP of New York 25 | 1963-1966 | Surgeons | 40-64 | 30131 | 30565 | CBE yearly; M yearly | None | 4 | 18 | 0.77 (0.62-0.97)
Edinburgh randomized trial of breast screening 27 | 1979-1988 | Physicians, nurses | 45-64 | 22944 | 21344 | CBE yearly; M alternate years | None | 7 | 10 | 0.82 (0.61-1.1)

Nonrandomized Controlled Trial
UK Trial 28,29 a | 1979-1988 | Physicians, nurses | 45-64 | 45956 | 127109 | CBE yearly; M alternate years | None | 7 | 10 | 0.86 (0.73-1.0)

Case-Control Study
The DOM Project 30,31 | 1974-1981 | Medical assistants | 50-64 | 14796 Invited: 54 cases, 162 controls | ... b | CBE yearly; M yearly | None | 4 | 8 | 0.29 (0.14-0.62)

Trials Comparing 2 Screening Strategies
Canadian NBSS 1 32 | 1980-1988 | Nurses | 40-49 | 25214 | 25216 | CBE yearly; M yearly | CBE 1 time only | 5 | 7 | 1.4 (0.84-2.2)
Canadian NBSS 2 33 | 1980-1988 | Nurses | 50-59 | 19711 | 19694 | CBE yearly; M yearly | CBE yearly | 5 | 7 | 0.97 (0.62-1.5)

Abbreviations: CBE, clinical breast examination; CI, confidence interval; HIP, Health Insurance Plan; M, mammography; NBSS, National Breast Screening Study; RR, relative risk;
UK, United Kingdom.

a UK Trial includes data from the Edinburgh randomized trial.

b Ellipses indicate not applicable.

Meta-analyses of trials 25-27,34-38 demonstrated that CBE or
screening mammography decreases breast cancer mortality
rates by about one-fourth in women from 50 through 69
years 39 and by 18% in women in their 40s. 40 In several of these
studies, breast cancer was detected using a combination of
CBE and mammography 25-28 (Table 8-2). These studies that
compared a combination screening strategy with no screen¬ 
ing are the strongest scientific evidence for an effect of 
screening CBE. 

Other evidence comes from the randomized Canadian 
NBSS 2, 33 in which women from 50 through 59 years were 
offered either a standardized CBE alone or a CBE and mam¬ 
mography annually for 5 years. The 7-year breast cancer- 
specific mortality rate for women in these 2 groups was simi¬ 
lar, 33 suggesting that mammography may not offer mortality 
rate advantages over a careful screening CBE, at least for 
women in their 50s. 41 

Additional evidence comes from the Health Insurance
Plan (HIP) study, 42 conducted during mammography’s
infancy, in which most cancers were found by CBE. Mortality
reduction after 10 years in the HIP trial of 29% was similar
to a 30% reduction in the Swedish Two-County Trial, 43,44
which used mammography alone. The similarity in the percentage
of reduced mortality rates found in these 2 approaches,
along with the NBSS described above, argues for the effectiveness
of carefully conducted CBE.

Finally, we compared the sensitivity of CBE and mam¬ 
mography in the trials that used both methods. In most 
cases, mammography outperformed CBE (Table 8-3). 
However, the sensitivity of the combined method was 
greater than that of mammography alone because CBE 
detected cancers that had been missed by mammography. 
The proportion of cancers detected by CBE alone ranged 
from 3.4% in the Edinburgh trial 45 to 45% in the HIP 
study. 25 Proportions of breast cancers found by CBE but
missed by mammography in other studies 47-58 range from
5.2% 58 to 29%. 51 In one series, among women younger
than 35 years, 23% of cancers were reported to be silent on 
mammography. 56 
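
The comparison can be made concrete with the percentages in Table 8-3. The sketch below adds the "both" column to each single-modality column to obtain the share of screen-detected cancers that each method found; the function name is ours, and these shares are not sensitivities because interval cancers are not counted.

```python
# (study, % mammography only, % CBE only, % both), copied from Table 8-3
DETECTION = [
    ("HIP of New York", 33, 45, 22),
    ("Edinburgh", 26, 3, 71),
    ("Canadian NBSS 1", 40, 24, 36),
    ("Canadian NBSS 2", 53, 12, 35),
    ("BCDDP", 40, 9, 50),
    ("West London", 34, 31, 34),
]


def modality_shares(mammography_only, cbe_only, both):
    """Share of screen-detected cancers found by mammography and by CBE."""
    total = mammography_only + cbe_only + both   # not always 100 because of rounding
    by_mammography = (mammography_only + both) / total
    by_cbe = (cbe_only + both) / total
    return by_mammography, by_cbe


for study, m_only, c_only, both in DETECTION:
    by_m, by_c = modality_shares(m_only, c_only, both)
    print(f"{study}: mammography {by_m:.0%}, CBE {by_c:.0%}, "
          f"found by CBE alone {c_only}%")
```

Read this way, mammography found the larger share in 5 of the 6 studies (the HIP study is the exception), yet in every study some cancers were found by CBE alone.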

The value of detecting breast cancers by CBE that are not 
detected by mammography is not known. The results of 
randomized trials using both modalities did not demon¬ 
strate improved results over those using only mammogra¬ 
phy; however, the many other differences in the trials make 
comparisons difficult. The mortality rate in women in 
whom breast cancer is missed by mammography and 
detected by CBE was higher than that in women whose cancers
were detected by mammography. 25,32,33,59 However, these
women still may have benefited compared with women not
screened by CBE.

Table 8-3 Proportion of Cancers Detected by CBE and Mammography Screening

Study | Years | No. of Cancers | Detected by Mammography Only, % | Detected by CBE Only, % | Detected by Both, %

Randomized Controlled Trials
HIP of New York 25 | 1963-1966 | 132 | 33 | 45 | 22
Edinburgh randomized trial of breast screening 45 | 1978-198T | 88 | 26 | 3 | 71
Canadian NBSS 1 32 | 1980-1988 | 255 | 40 | 24 | 36
Canadian NBSS 2 33 | 1980-1988 | 325 | 53 | 12 | 35

Demonstration Projects
BCDDP 22 | 1973-1981 | 2045 | 40 | 9 | 50
West London 46 | 1973-1977 | 29 | 34 | 31 | 34

Abbreviations: BCDDP, Breast Cancer Detection Demonstration Project; CBE, clinical
breast examination; HIP, Health Insurance Plan; NBSS, National Breast Screening Study.

a Data are from prevalence screen only.

Bottom Line for Effectiveness 

The strongest evidence for breast cancer mortality rate 
reduction after screening CBE comes from studies in which 
both CBE and mammography were part of breast cancer 
screening. The individual contribution of CBE cannot be 
established. In every study, CBE contributed to cancer detec¬ 
tion independently of mammography. In one randomized 
trial, the 7-year breast cancer mortality rate was similar 
among women receiving a standardized CBE and women 
receiving both CBE and mammography. 

Test Characteristics 

Summarizing the precision and accuracy of CBE is difficult for 
several reasons. First, the examination is not well described in 
the majority of studies, and it is known that conduct of CBE 
varies widely. 60 Second, available studies included women dif¬ 
fering in age, history of symptoms (symptomatic and asymp¬ 
tomatic), and practice settings (primary care or surgical). 
Third, the reported test characteristics of CBE were deter¬ 
mined sometimes with and sometimes without accompanying 
mammography screening. The best standardized data come 
from studies of CBE on silicone models, but the applicability 
of these studies to women being screened is unknown. 

Precision of Examination 

Clinical breast examination, even when performed in large-scale
studies, has generally not been standardized; only 1 trial
(NBSS) reported any description of the examination technique. 61
The lack of attention to a standardized CBE technique
may partly account for the interobserver variation
found in studies among clinicians performing CBE.

Thomas et al 62 compared findings in 103 women screened
by 2 nurses and 2 surgeons independently. Agreement
between the 2 nurses for any breast abnormality had a κ of
0.22, whereas the 2 surgeons’ κ was 0.38. Chamberlain et al 63
studied agreement between a nurse and a physician performing
independent screening CBE, with a κ of 0.43. Boyd et al 64
reported that 4 surgeons found 37 to 74 of 100 women
screened to have abnormal findings; in only 25 women did
all 4 agree on the findings. The κ value for agreement
between any 2 of the 4 surgeons was between 0.34 and 0.59.
None of these studies described the CBE technique used by
examiners.

Precision varies by the particular physical finding. Ten sur¬ 
geons examining 242 women had varying indices of agree¬ 
ment (which reflects the chance of agreement using the 
method of Kendall and Stuart 65 ) for specific findings: the index 
of agreement for nipple discharge was 14%; skin findings such 
as dilated veins, 22%; peau d’orange, 24%; ulceration, 62%; 
and visibility of lesion, 68%. 66 For a lump (“saturated nodule”) 
the index of agreement was 59%. 

Bottom Line for Precision 

Clinicians using unstandardized CBE methods have demon¬ 
strated moderate degrees of agreement beyond that expected 
by chance. A standardized examination would likely improve 
precision. 

ACCURACY 

To determine its accuracy as a screening test, CBE must be 
compared with a criterion standard. Mammography cannot be 
that standard because cancers that are missed by mammography
can be found on CBE. Histology alone also cannot be the
standard because tissue will never be obtained from all women 
whose abnormalities are detected by CBE. Even less likely is 
the histologic examination of breasts that are normal on exam¬ 
ination to determine specificity. A compromise criterion stan¬ 
dard is to follow up all screened women for a defined period; 
women diagnosed as having breast cancer must have histologic 
proof, and all cases of breast cancer among women screened 
during the follow-up period must be counted. This admittedly 
imperfect standard nevertheless is so stringent that few studies 
of breast cancer screening 22,25,32,33,67,68 meet it.

We defined sensitivity as the number of women who had 
cancer found on CBE, divided by the sum of screen-detected 
cancers (found by CBE or mammography) and those interval 
cancers diagnosed in the year after screening. Specificity was 
defined as the number of women who had normal CBE 
results and did not develop breast cancer during follow-up, 
divided by all the women without cancer at the end of the 
follow-up period. 
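
The two definitions above translate directly into arithmetic. The following is a minimal sketch; the function names and all counts are hypothetical and serve only to show which women enter each numerator and denominator.

```python
def cbe_sensitivity(cbe_detected, screen_detected, interval_cancers):
    """Cancers found on CBE divided by all cancers counted by the standard.

    screen_detected: cancers found at screening by CBE or mammography.
    interval_cancers: cancers diagnosed in the year after screening.
    """
    return cbe_detected / (screen_detected + interval_cancers)


def cbe_specificity(normal_cbe_without_cancer, women_without_cancer):
    """Women with a normal CBE and no cancer during follow-up, divided by
    all women without cancer at the end of follow-up."""
    return normal_cbe_without_cancer / women_without_cancer


# Hypothetical screening cohort (illustrative counts only).
sensitivity = cbe_sensitivity(cbe_detected=54, screen_detected=80, interval_cancers=20)
specificity = cbe_specificity(normal_cbe_without_cancer=9400, women_without_cancer=10000)
print(f"sensitivity = {sensitivity:.0%}, specificity = {specificity:.0%}")
```

Note that interval cancers enlarge the denominator of sensitivity, so a study that fails to count them will overstate the sensitivity of CBE.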

The data show that sensitivity of CBE is far from perfect. 
Pooled data from human studies give an overall estimate for 
the sensitivity of the CBE of 54% (95% CI, 48%-60%) (Table 
8-4). Clinical breast examination sensitivity was higher than 
60% 32,33,67 when screening rounds included only physical 
examination but was lower when both CBE and mammography 
were used in the screening. This difference may reflect 
the enhanced case-finding capacity of mammography. However, 
2 of the 3 studies with higher sensitivity also were the 
only ones using a well-described and standardized method of 
CBE. 32,33 It is possible that CBE sensitivity was higher because 
of superior CBE technique. 

The same trials provide data on the specificity of the CBE. 
Individual trial specificity ranged from 86% to 99%, with a 
pooled estimated specificity of 94% (95% CI, 90%-97%). 

The combined data, pooled using a random-effects model 
to adjust for heterogeneity, indicate that the LR of a positive 
CBE result is 11 (95% CI, 5.8-19), whereas the LR of a negative 
test result is 0.47 (95% CI, 0.40-0.56). The positive LR is 
more discriminating than the negative LR, which is to say, a 
positive finding on examination conveys more information 
about an increased chance of cancer than does the finding of 
a benign examination offer certainty about the absence of 
breast cancer. This would be expected, given what we know 
about the frequent discovery by mammography of impalpa¬ 
ble cancers. 
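
For a single study, the LRs follow from sensitivity and specificity by the formulas given in the footnote to Table 8-4: LR+ = sensitivity/(1 - specificity) and LR- = (1 - sensitivity)/specificity. The sketch below applies them to the pooled sensitivity and specificity purely as an illustration; the function name is an assumption of this example.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Pooled estimates quoted in the text: sensitivity 54%, specificity 94%.
lr_pos, lr_neg = likelihood_ratios(0.54, 0.94)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

This direct calculation gives an LR+ near 9 and an LR- near 0.49, close to but not identical with the pooled values of 11 and 0.47. A difference of this kind is to be expected when the LRs are pooled across studies with a random-effects model rather than derived from the pooled sensitivity and specificity.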

Clinical breast examination is associated with a relatively 
high false-positive rate and an even higher false-negative rate. 
There are no data on the effect of the false-positive outcomes 
in terms of subsequent health care use or on women’s psychological 
status, both of which have been issues for false-positive 
mammography results. 1,69,70 

Lumps embedded in silicone breast models provide their 
own standard. Clinical breast examination sensitivity as measured 
in silicone models (40%-71%) was similar to that 
found in population studies. 60,71-75 On the other hand, specificity 
measured in models was lower than in population studies 
(41%-77%). 71-75 

Bottom Line for Accuracy 

The sensitivity of the CBE is approximately 54%. The specificity 
of the examination is about 94%. 

Examiner Factors 

Studies in humans and silicone models demonstrate several 
factors, pertaining to both examiner and woman, that influ¬ 
ence the accuracy of the CBE. 

Duration of the Examination 

Clinical breast examination duration correlated significantly 
with lump detection accuracy in experiments involving sili¬ 
cone breast models. In 5 studies, mean examination duration 
was always longer for examiners with higher sensitivity 
(Table 8-5). The highest recorded sensitivity in human stud¬ 
ies (69%) was achieved in the NBSS, in which examiners 
took between 5 and 10 minutes to complete examination of 
both breasts. 21 

Technique 

The use of correct CBE technique (a systematic search pat¬ 
tern, thoroughness, varying palpation pressure, 3 fingers, fin¬ 
ger pads, and circular motion) also correlated with better 
examination sensitivity in silicone models (Table 8-5). The 
number of correct techniques was greater among examiners 
with higher CBE sensitivity. 

Examiner Experience 

Experience with abnormal breast lumps may be important. 
Even after controlling for technique differences, medical resi¬ 
dents found more lumps in silicone models than lay women 
did before special training. 74 Almost none of the women had 


Table 8-4 Sensitivity and Specificity of Clinical Breast Examination in Human Studies a 

| Study | Years | Age, y | Screening Modality | No. of Rounds | CBE Sensitivity, % | CBE Specificity, % | LR+ (95% CI) b | LR- (95% CI) b |
|---|---|---|---|---|---|---|---|---|
| HIP of New York 25 | 1963-1966 | 40-64 | CBE and M | 4 | 49 | 99 | 46 (39-54) | 0.51 (0.44-0.59) |
| UK Trial 67,68 | 1979-1988 | 45-64 | CBE only | 3 | 64 | 95 | 14 (12-16) | 0.37 (0.29-0.48) |
| | | | CBE and M | 4 | 51 | ... c | ... | ... |
| Canadian NBSS 1 32 | 1980-1988 | 40-49 | CBE only | 1 | 69 | 86 | 4.8 (4.2-5.5) | 0.36 (0.27-0.49) |
| | | | CBE and M | 5 | 48 | 92 | 6.1 (5.4-6.8) | 0.57 (0.50-0.63) |
| NBSS 2 33 | 1980-1988 | 50-59 | CBE only | 5 | 63 | 94 | 11 (9.6-12) | 0.39 (0.33-0.46) |
| | | | CBE and M | 5 | 40 | 94 | 7.2 (6.3-8.2) | 0.63 (0.58-0.69) |
| BCDDP 59 | 1973-1981 | 35-74 | CBE and M | 5 | 52 | ... | ... | ... |
| West London 45,d | 1973-1977 | >40 | CBE and M | 4 | 56 | 89 | ... | ... |
| Pooled result (95% CI) | | | | | 54 (48-60) | 94 (90-97) | 11 (5.8-19) | 0.47 (0.40-0.56) |

Abbreviations: BCDDP, Breast Cancer Detection Demonstration Project; CBE, clinical breast examination; CI, confidence interval; HIP, Health Insurance Plan; LR+, positive likelihood ratio; LR-, negative likelihood ratio; M, mammography; NBSS, National Breast Screening Study. 

a Case definition includes all cancers found at screening (by either method) and interval cancers found within 12 months of screening, except where noted otherwise. 
b An LR is the probability that persons with a disease have a particular test result divided by the probability that persons without the disease have that result. The LR+ is determined by dividing the sensitivity by the probability of an abnormal CBE result among women without breast cancer (1 - specificity). The LR- is calculated as (1 - sensitivity)/specificity. 
c Ellipses indicate not applicable. 
d Specificity data based on first round only, with 6 months’ follow-up. 

Table 8-5 The Relationship Between Clinical Breast Examination Sensitivity and Duration or Techniques Used on Silicone Models a 

| Study Participants | No. of Participants | Median Sensitivity, % | Mean Duration, min: Sensitivity < Group Median | Mean Duration, min: Sensitivity ≥ Group Median | Mean No. of Correct Techniques Used b: Sensitivity < Group Median | Mean No. of Correct Techniques Used b: Sensitivity ≥ Group Median |
|---|---|---|---|---|---|---|
| Women patients 71 | 260 | 44 | 1.5 | 1.9 | 2.9 | 3.7 |
| Medical students 76 | 151 | 100 | 2.3 | 2.8 | 2.7 | 3.7 |
| Medical residents 72 | 60 | 61 | 1.7 | 2.5 | 2.9 | 3.4 |
| Practicing physicians c | 60 | 55 | 1.9 | 2.4 | 2.3 | 2.7 |
| Total d | 531 | | 1.8 | 2.3 | 2.8 | 3.6 |

a In each study, examiners were divided into 2 groups: those with examination sensitivity at or above the group median and those with sensitivity below the group median. Mean values for duration and numbers of correct techniques used are presented for these 2 groups. 
b Of a total of 6 correct techniques: systematic search pattern, thorough examination, varying palpation pressure, 3 fingers, pads of fingers, and small circular motion. 
c Russell Harris, MD, MPH, University of North Carolina at Chapel Hill, written communication, February 1999. 
d P < .001 for pooled differences in both duration and number of techniques. 


ever felt either a real or simulated breast lump before the 
testing session, whereas 77% of the physicians had. Among 
the residents, previous experience also predicted higher 
sensitivity. After practice with silicone models containing 
embedded lumps, the women’s abilities approached that of 
physicians. 71 However, 2 other studies found no differences 
in sensitivity across categories thought to correlate with 
experience. 60,77 

Bottom Line for Examiner Influence on Accuracy 

Spending adequate time on the CBE and using the proper 
techniques improve breast lump detection. 

Patient Factors 

Age 

On average, younger women have denser breasts that make 
lump detection more difficult, whereas in older women, the 
breast becomes more fatty, making lump detection easier. 78 In 
one referral population, examiners’ sensitivity was 86% among 
women aged 20 through 49 years and 96% among women aged 
50 years and older. 59 Silicone models simulating postmeno¬ 
pausal breast tissue improved sensitivity over that in models 
simulating premenopausal breast tissue (64% vs 51%). 75 Two 
large trials came to a different conclusion, albeit among women 
in narrowly defined age ranges. The BCDDP found CBE sensi¬ 
tivity of 53% among women between 40 and 49 years and 48% 
among women between 50 and 59 years. 22 The NBSS 79 reported 
higher CBE sensitivity in women aged 40 through 49 years 
(68%) compared with those aged 50 through 59 years (63%), 
among women receiving both mammography and CBE. Fur¬ 
ther study is needed on this issue. 

Breast Characteristics 

Clinical breast examination sensitivity is slightly lower in 
women with larger breasts. 80 Women’s breasts also vary in the 
amount of background glandular nodularity that is a normal 
characteristic of breast tissue. 81 Many women have ill-defined 
fibrocystic changes that make their breasts feel particularly 
lumpy; anecdotally, clinicians (and women) find it more diffi¬ 
cult to detect breast cancer in lumpy breasts. 


Cancer Characteristics 

Breast cancers vary in size, hardness, mobility, and location 
in the breast. Clinical breast examination sensitivity probably 
varies according to these characteristics of cancers. Prognosis 
generally follows cancer size at diagnosis, so it is important to 
determine the accuracy of CBE for small cancers, that is, 2 
cm or less. In the BCDDP, sensitivity for noninfiltrating can¬ 
cers was 35%; for infiltrating cancers smaller than 1 cm, 36%; 
and for infiltrating cancers at least 1 cm, 52%. 22 

To date, most information about CBE accuracy by lump 
characteristic comes from experiments carried out on silicone 
breast models with embedded lumps varying in size, hardness, 
and placement. These experiments found sensitivity increased 
with lump size (from 14% for 3-mm lumps to 79% for 1-cm 
lumps) and hardness (from 42% for 20-durometer lumps to 
72% for 60-durometer lumps). Durometers are a measure of 
hardness; 20 durometers corresponds to a soft to medium¬ 
hardness grape, whereas a 60-durometer mass is almost as 
hard as calcified bone. Medium or deep placement of the lump 
in a model did not alter sensitivity. 59,72,74 

Bottom Line for Patient Effects on Accuracy 

A woman’s age and the size and lumpiness of her breasts may 
affect the ability of examiners to detect cancer. Size and hard¬ 
ness of breast cancers also affect CBE sensitivity. 

Suggested Approach 

Many physical diagnosis textbooks give directions for carrying 
out a breast examination. 82-85 They all involve palpation and 
inspection, but research has stressed palpation. The approach 
outlined below is derived from a review of the research literature 
and owes much to the work of Baines, 3,86 Baines et al, 21 Baines and 
Miller, 79 and others 87-91 because of their work in standardizing the 
examination. Our recommendation incorporates practices from 
the Mammacare Method because its components have been validated 
in independent investigations of CBE technique. 71,72,92 

Palpation 

Variables important in palpating the breast correctly are patient 
position; breast boundaries; examination pattern; finger position, 
movement, and pressure; and duration of the examination. 

Patient Position 

Clinical breast examination requires flattening breast tissue 
against the patient’s chest; she should be supine during the 
examination. The importance of maneuvers to flatten the breast 
depends on breast size; they are particularly useful in women 
with large breasts. To flatten the lateral part of the breast, have 
the patient roll onto her contralateral hip, rotate her shoulders 
back into a supine position, and place her ipsilateral hand on her 
forehead (Figure 8-1). To flatten the medial part of the breast, 
the woman should lie flat on her back and move her elbow up 
until it is level with her shoulder (Figure 8-1). 

Breast Boundaries 

Breast tissue extends laterally toward the axilla and superi¬ 
orly toward the clavicle. To be sure that all breast tissue is 
examined, it is best to cover a rectangular area bordered by 
the clavicle superiorly, the midsternum medially, the midaxillary 
line laterally, and the bra line inferiorly. 

Examiner Pattern 

Palpation begins in the axilla and extends in a straight line 
down the midaxillary line to the bra line (Figure 8-1). The 
fingers then move medially, and palpation continues up the 
chest in a straight line to the clavicle. The entire breast is cov¬ 
ered in this manner, going up and down between the clavicle 
and the bra line. To examine all breast tissue, rows should be 
overlapping. This vertical strip pattern (or lawnmower tech¬ 
nique) was found to be more thorough than concentric cir¬ 
cles or a radial spoke pattern. 92 In one study, two-fifths of 
physicians used no discernible pattern at all. 60 


Figure 8-1 Position of Patient and Direction of Palpation for the 
Clinical Breast Examination 

The figure shows the positioning of the patient for examining the (A) medial 
and (B) lateral portions of the breast. See “Suggested Approach” section for 
complete description. 


Fingers 

Most texts scarcely describe what the fingers should do 
during palpation, an ironic situation because the fingers 
must detect and differentiate abnormal lumps in breast 
tissue. Behavioral psychologists have shown that the finger 
can detect a soft (20-durometer) 2-mm lump in simulated 
breast tissue when specific techniques are used. 88,90,93 These 
researchers developed a breast palpation technique (the 
Mammacare Method) combining the vertical strip pattern 
and specific finger techniques, taught using discrimina¬ 
tion skill practice (with the use of silicone breast models) 
to enhance lump detection. Their method is described 
below. 

The 3 middle fingers are held together, with the metacarpal-phalangeal 
joint slightly flexed. The pads (not tips) of the 
fingers (Figure 8-2) are the examining surface. (Confusion 
regarding the definition of the finger pad exists even among 
experienced examiners. 86 ) Each area is palpated by making 
small circles as if following the edge of a dime (Figure 8-2). 
At each spot, 3 circles with different pressures (light, 
medium, and deep) are made to ensure palpation of all 
levels of tissue (Figure 8-3). 

Figure 8-2 Palpation Technique 

Pads of the index, third, and fourth fingers make small circular motions (A), 
as if tracing the outer edge of a dime. A vertical strip pattern (B) ensures an 
examination of the entire breast. 

Figure 8-3 Levels of Pressure for Palpation of Breast Tissue Shown 
in a Sagittal View of the Right Breast 

The examiner should make 3 circles with the finger pads, increasing the level 
of pressure (light, medium, and deep) with each circle. 

Duration 

A careful examination of an average-sized breast (brassiere 
size B) takes at least 3 minutes (6 minutes for both breasts). 
This is much longer than the average 1.8 minutes physicians 
spent in one study examining both breasts and giving 
instructions for breast self-examination. 94 If it seems awk¬ 
ward to spend this amount of time, clinicians should discuss 
with patients the time needed to do a complete examination 
and discuss the procedure during the examination. 

Other Issues 

Palpation of the supraclavicular and axillary regions to detect 
adenopathy is a standard part of the CBE, though untested. 
Breast cancer was found in a minority of women with iso¬ 
lated axillary lymphadenopathy and normal CBE results in 2 
series (12% and 29%, respectively). 95,96 

Palpation of the nipple area is performed in the same man¬ 
ner as the rest of the breast. Although some texts call for 
squeezing the nipple to express discharge, 44,82,83,97 among 448 
women complaining of nipple discharge, expression of fluid 


was not a useful prognostic sign for cancer. Of the women 
with otherwise normal CBE findings, 3 (2%) of the 151 
women with spontaneous discharges were diagnosed as hav¬ 
ing cancer, whereas none (0%) of the 178 women with dis¬ 
charges only apparent by expression were diagnosed as 
having cancer. 98 

Inspection 

The importance of inspection is unproved. Most commonly, 
directions for inspection suggest that the woman face the 
examiner with her arms at her side. The breasts are then 
inspected for nipple abnormalities, dimpling, and retraction 
or tethering of the skin. No adequate data support recom¬ 
mendations of some authorities 61,99,100 to examine women in a 
variety of other positions, such as raising her hands over her 
head, putting her hands on her hips and bearing down (to 
contract the pectoral muscles), or leaning forward to allow 
the breasts to hang out from the chest. 

In a series of 296 breast cancers found on breast examina¬ 
tion, 101 96% were discovered on palpation, only 1% by retrac¬ 
tion alone, and another 3% by visible nipple abnormalities. 
The women’s position when these visual cues were elicited 
was not reported. Inspection and positioning the patient for 
inspection take time. Given these facts and given the press of 
time, we suggest that in asymptomatic women clinicians 
should concentrate on careful breast palpation, all the while, 
of course, using their eyes. If the patient is symptomatic, or if 
an abnormality is discovered during palpation of an asymp¬ 
tomatic patient, careful inspection should be added. 

Bottom Line of the Suggested Approach 

Use a vertical strip pattern to cover all the breast tissue. Make 
circular motions with the pads of the middle 3 fingers and 
examine each breast area with 3 different pressures. Spend at 
least 3 minutes on each breast. 

Teaching the Technique 

What is the evidence that using the Mammacare Method 
improves lump detection abilities and that the technique can 
be taught? 

In one study, 20 lay women taught according to the Mam¬ 
macare Method doubled their detection of known breast 
lumps in other volunteer women, although they also increased 
the number of false-positive detections after training. 89 Three 
randomized trials using silicone breast models evaluated train¬ 
ing of internal medicine residents, graduate nurses, medical 
students, and female patients. 71-73 All showed that training 
improved CBE sensitivity when measured on silicone models. 
Pooling the results, the training improved sensitivity by 13 
percentage points (95% CI, 10%-16%) from 46% to 59%, 
whereas the specificity declined nonsignificantly by a mean of 
4 points (95% CI, -8.9 to 0.7) from 61% to 57%. 

Does the effect of teaching persist? In one study, 91 
patients were taught the Mammacare Method and, 1 year 
later, were able to find more lumps in silicone breast models 
than women either taught the traditional (circular) CBE pat¬ 
tern or not taught at all. 71 Similar results occurred in ran¬ 
domized studies using silicone models with medical students 
and nurses, 72,76 with the effect persisting from 4 to 6 months. 
In most cases, sensitivity improved without adverse effects 
on specificity. However, among medical residents, higher 
sensitivity was at the expense of specificity in silicone model 
testing. A 6-month medical record review of patients cared 
for by these physicians did not demonstrate any deteriora¬ 
tion in CBE specificity in patients. 72 

Are Lumps Ever Normal? 

Normal breasts are often lumpy; the clinician’s job is to dis¬ 
tinguish normal from abnormal (cancerous) lumps. Can¬ 
cers classically are characterized as hard, fixed, and irregular, 
whereas benign breast lumps are the opposite: soft or cystic, 
movable, and regular. However, many cancers do not con¬ 
form to the classic picture, and benign masses can mimic 
cancers. LRs for the presence of these signs (calculated from 
HIP data, 102 after Mushlin 103 ) are unimpressive, except for 
fixed lesions (LR, 2.4) and lumps greater than 2 cm (LR, 
1.9); none of the LRs fall in the range considered discrimi¬ 
nating (Table 8-6). Table 8-6 also shows the resulting suc¬ 
cession of probabilities if a 64-year-old woman had a mass 
on CBE and if the mass had the listed positive findings. (It 
is assumed that the findings are independent, although 
there is not information about the independence of the 
findings.) In 2400 women undergoing 10905 screening 
CBEs in a community setting during a 10-year period, an 
abnormal CBE result was associated with an LR of 2.1 
(Joann G. Elmore, MD, MPH, Harborview Medical Center, 
Seattle, Washington, written communication, June 1998). A 
positive screening CBE result in an average-risk woman 
conveys less risk of cancer than does a woman presenting 
with a breast lump (LR, 55) 104 or an abnormal screening 
mammogram result (LR, 26). 105 

Because the characteristics of cancerous lumps overlap 
with those of noncancerous lumps, clinicians rarely diagnose 
breast cancer with CBE. Careful CBE can locate abnormalities. 
Further evaluation with other tests is then required. 106-108 

THE BOTTOM LINE 

Screening CBEs should be conducted for women who are at 
risk for breast cancer and for whom breast cancer screening 
has been shown effective. Presently, this includes women 
older than 40 years. A well-conducted CBE can detect at least 
50% of asymptomatic cancers and may contribute to mortal¬ 
ity rate reduction in women screened. 

Resolution of Scenarios 

The discovery of a breast mass in a 64-year-old patient con¬ 
veys an increased risk of cancer. Her pretest probability of 
invasive cancer in the coming year is 0.35% (347 cases per 
100000 women 14 ). Your finding on CBE gives a posttest prob¬ 
ability of 0.73% (Table 8-6). If the mass is greater than 2 cm 
and has all the other malignant characteristics, the probabil¬ 
ity of cancer increases to 8.8% (Table 8-6). 
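
The succession of probabilities in Table 8-6 can be reproduced by converting the pretest probability to odds, multiplying by each LR in turn, and converting back. The sketch below assumes, as the table does, that the findings contribute independently; the function names are this example's own, and the LRs are those listed in Table 8-6.

```python
def probability_to_odds(probability):
    return probability / (1 - probability)


def odds_to_probability(odds):
    return odds / (1 + odds)


def successive_posteriors(prior_probability, likelihood_ratios):
    """Apply each LR to the running odds; return the posterior probabilities."""
    odds = probability_to_odds(prior_probability)
    posteriors = []
    for lr in likelihood_ratios:
        odds *= lr  # posterior odds = prior odds x LR
        posteriors.append(odds_to_probability(odds))
    return posteriors


# Findings and positive LRs as listed in Table 8-6 (independence assumed).
FINDINGS = [("Mass", 2.1), ("Fixed", 2.4), ("Hard", 1.6),
            ("Irregular", 1.8), (">2-cm lump", 1.9)]

posteriors = successive_posteriors(0.0035, [lr for _, lr in FINDINGS])
for (name, lr), p in zip(FINDINGS, posteriors):
    print(f"{name:<12} LR {lr:.1f} -> {100 * p:.2f}%")
```

Starting from a pretest probability of 0.35%, the first step gives about 0.73% and the last about 8.8%, in line with the table. If the findings are in fact correlated, multiplying the LRs in this way would overstate the final probability.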

The 42-year-old woman with no breast symptoms has a 
pretest probability of breast cancer of 0.12%, or 119 per 
100 000. 14 A normal CBE result would decrease her risk of 


Table 8-6 Breast Cancer Probabilities in a 64-Year-Old Woman 
Assessed After Each of a Succession of Positive Findings a 

| Prior Probability of Breast Cancer, % | Prior Odds | Finding | LR+ b | Successive Posterior Odds c | Successive Posterior Probability, % |
|---|---|---|---|---|---|
| 0.35 | 0.0035 | Mass | 2.1 | 0.007 | 0.73 |
| | | Fixed | 2.4 | 0.018 | 1.7 |
| | | Hard | 1.6 | 0.028 | 2.8 |
| | | Irregular | 1.8 | 0.051 | 4.9 |
| | | >2-cm Lump | 1.9 | 0.097 | 8.8 |

Abbreviation: LR+, positive likelihood ratio. 

a The effect of a particular finding is expressed in the following way: prior odds × 
likelihood ratio (LR) = posterior odds. Probabilities and odds are interconverted 
according to these formulae: prior odds = prior probability/(1 - prior probability); 
posterior probability = posterior odds/(1 + posterior odds). 
b LRs are calculated from data on cases diagnosed through June 1970 in the Health 
Insurance Plan Breast Cancer Screening Study, 102 after Mushlin. 103 
c The LR for each positive finding is applied to the posterior odds from the line above, 
using an assumption that the findings contribute independently to the odds of breast 
cancer. 


breast cancer to 0.11%, but with such a low baseline risk, the 
difference is hard to appreciate. An explanation of her low 
pretest probability may suffice; however, the psychological 
reassurance she may gain from a CBE could increase the 
value of this maneuver. 

Priorities for Research 

Standardization of CBE is sorely needed. Numerous studies 
suggest that the Mammacare Method improves the perfor¬ 
mance characteristics of CBE on silicone models; further 
work should be done to determine whether the Mammacare 
technique (or other standardized methods) can improve CBE 
sensitivity and specificity in patient populations. The contri¬ 
bution of visual inspection has been found to be associated 
with better outcomes in women who use it as part of breast 
self-examination. 109 This should be investigated as to its con¬ 
tribution to the CBE. 

Screening CBE may be particularly useful in women older 
than 70 years because fatty changes in the breast make lump 
detection easier, and older women do not accept mammog¬ 
raphy as readily as younger women. 110 Comparison of test 
characteristics of standardized CBE with mammography in 
older women is needed. At the other end of the age spectrum, 
because mammography misses substantial numbers of breast 
cancers in women younger than 50 years, studies are needed 
to determine whether standardized CBE can contribute to 
decreasing breast cancer mortality rates in this age group. 

The cost-effectiveness of CBE screening deserves study if it 
is to be compared with other maneuvers available for breast 
cancer screening and compared with other primary care 
maneuvers that it may displace in a 15-minute visit. Similarly, 
cost-effectiveness of programs to teach providers how 
to perform the examination should be evaluated. 

Although some argue that the CBE adds nothing to regular 
mammography screening, an overall view of the evidence 
suggests that a carefully performed CBE detects 
cancers that are potentially curable. If research confirms 
that CBE is as effective as mammography in reducing 
breast cancer mortality rates for older women, then physicians 
will want to perform CBE regularly and perform it well. 

UPDATE: Breast Cancer 



Prepared by Mary B. Barton, MD 
Reviewed by Kathryn A. Myers, MD, EdM 


CLINICAL SCENARIO 


A 55-year-old woman without a family history of breast or 
ovarian cancer and without a personal history of mantle 
radiation, suggesting average risk of breast cancer, comes 
to your office requesting a magnetic resonance imaging 
(MRI) for screening for breast cancer. Will the findings on 
a clinical breast examination (CBE) affect the likelihood of 
breast carcinoma? 

UPDATED SUMMARY ON BREAST CANCER 

Original Review 

Barton MB, Harris R, Fletcher SW. Does this patient have 
breast cancer? the screening clinical breast examination: 
should it be done? how? JAMA. 1999;282(13):1270-1280. 

UPDATED LITERATURE SEARCH 

We searched the PubMed database for the period October 
1998 to September 2004, using the terms “breast” and 
“palpation,” in combination with the original search strat¬ 
egy, including the terms “physical exam,” “professional 
competence,” “medical history taking,” “sensitivity,” “spec¬ 
ificity,” “observer variation,” “reproducibility of results,” 
“diagnostic tests,” and “Bayes theorem.” The search was 
limited to articles in English and indexed as human stud¬ 
ies. Seventy-five articles were identified, and their abstracts 
were reviewed. Seventeen potentially eligible articles were 
retrieved according to their abstracts, and the articles and 
their reference lists were reviewed. In addition, the titles of 
43 articles that had referenced the original review were 
reviewed and, of these, the abstracts of an additional 9 
were considered. A total of 23 articles were read for 
salience and quality. For the purpose of updating the 
information synthesis on the characteristics of CBE in 
human screening populations, only 1 article contained 
data for both the sensitivity and specificity of the CBE, 1 
and an additional article provided data on sensitivity 
only. 2 No studies have been published with relevant infor¬ 
mation on the effectiveness of CBE during this interval. 


Several articles with information relevant to the teaching 
of the CBE are included in this summary review. 

NEW FINDINGS 

• Although finding a breast lesion on clinical examination 
increases the likelihood of cancer (likelihood ratio [LR], 
approximately 9), in community-based settings, the posi¬ 
tive predictive value was low (2.9%-4.3%). 

• The maximum expected sensitivity in asymptomatic 
women in current general community practice is 36%. 

• About 5% of all breast cancer cases were detected by CBE 
alone. 

Details of the Update 

Although no major advances in knowledge about the patho¬ 
physiology of breast cancer have been made, the public level 
of concern and the controversy around breast cancer screen¬ 
ing continue to be high. In the last 5 years, there have been 
scientific debates on the utility of mammography 3 and newspaper 
exposés on the variability in quality of mammography 
reading in the United States. 4 The publication of negative 
data from 2 trials of breast self-examination 5,6 resulted in a 
repeated “insufficient evidence” recommendation from the 
US Preventive Services Task Force (USPSTF) 7 and led to 
downgrading the recommendation of teaching this practice 
to “not recommended” by the Canadian Task Force. 8 
National Health Interview Survey data indicate that use of 
the CBE has decreased during the last 10 years, whereas sub¬ 
stantially more women reported recent mammography in 
2000 than in 1990 (Table 8-7). 9 


Table 8-7 Mammography Screening Is Increasing as Clinical Breast 
Examination Is Decreasing 

| Age, y | % Reported CBE in 1990 | % Reported CBE in 2000 | % Reported Mammography in 1990 | % Reported Mammography in 2000 |
|---|---|---|---|---|
| 40-49 | 83 | 76 | 55 | 65 |
| 50-64 | 78 | 79 | 56 | 79 |
| 65+ | 71 | 68 | 43 | 68 |

Abbreviation: CBE, clinical breast examination. 

At the same time, new studies of MRI for breast cancer 
screening in high-risk populations 10-14 have generated public 
interest in this technology. MRI in an average-risk popula¬ 
tion has not been studied and would likely not be feasible 
because of the low positive predictive value that would 
accompany the use of such a highly sensitive test in a popula¬ 
tion with a low prevalence of breast cancer. 
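
The dependence of positive predictive value on prevalence can be illustrated numerically. This is a minimal sketch: the sensitivity, specificity, and prevalence values are hypothetical inputs chosen for illustration and are not measured performance figures for MRI or CBE.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of disease given a positive test result."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical test: sensitivity 95%, specificity 90%.
ppv_low_prevalence = positive_predictive_value(0.95, 0.90, prevalence=0.003)
ppv_high_prevalence = positive_predictive_value(0.95, 0.90, prevalence=0.03)
print(f"PPV at 0.3% prevalence: {ppv_low_prevalence:.1%}")
print(f"PPV at 3% prevalence:   {ppv_high_prevalence:.1%}")
```

With the same hypothetical test, a 10-fold drop in prevalence lowers the positive predictive value from roughly 23% to under 3%, so most positive results in the low-prevalence group would be false positives.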

A new large series of CBEs has been published by Bobo et 
al 1 (see Table 8-8). This series of 752081 CBEs reported an 
overall sensitivity of 59%; 5.1% of all cancers diagnosed were 
found only by CBE (ie, were found in women with an abnormal 
CBE result and a normal mammogram result). The sensitivity 
must be viewed with 3 caveats. First, many women 
presenting for examination in the Bobo et al 1 study did so 
because of concern due to patient-observed palpable findings 
(discovered on self-examination or by accident) or skin or 
nipple changes; the sensitivity of the CBE in these women 
was 85% vs 36% for women without symptoms (ie, a true 
screening population). Second, although women with an 
abnormal CBE or mammogram result were followed care¬ 
fully to the resolution of the finding, there was no systematic 
follow-up of women with normal examination results. Only 
about 25% of these women returned the following year; for 
this reason, the sensitivity estimate of the screening CBE 
must be seen as an upper limit of the true sensitivity in this 
population. Third, the technique for CBE was not standard¬ 
ized across the many study sites, nor were any efforts at 
ensuring the quality of the examination described. Because 
of these limitations, we did not revise the LR estimates for 
the clinical examination in detecting breast cancer during 
screening evaluations. These caveats aside, the main finding 
of this study, that CBE in the community could contribute to 
breast cancer detection, is supported and is important from 
an effectiveness point of view. 

Oestreicher et al 2 reported on 468 women with breast can¬ 
cer who had taken part in a managed care organization’s 
breast cancer screening program. In that program, CBE 
detected 35% of tumors diagnosed within 1 year of screen¬ 
ing, and 5.8% of the cancers were diagnosed by CBE and not 
detected by mammography. Factors significantly associated 
in a multivariable model with lower sensitivity of the CBE 
were age younger than 50 years or older than 80 years and 
increased body weight (defined as > 135 lb [61.2 kg]). Better 
sensitivity was associated with Asian race (compared with 
white) and tumors greater than 1 cm. Although this sample is 
small compared with that in the Bobo et al 1 study, there are 
striking similarities in both the sensitivity of the screening 
CBE and the proportion of cancers found only by CBE. 

Table 8-8 Clinical Breast Examination Characteristics Change When 
Women Have Breast Symptoms 1 

Finding                 LR+ (95% CI)     LR- (95% CI) 
Asymptomatic women      9.5 (8.9-10)     0.66 (0.64-0.69) 
Symptomatic women^a     2.5 (2.4-2.6)    0.22 (0.20-0.25) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 

^a Symptoms in the breast that cause a woman to present to a physician include pain, 
finding of a lump, nipple discharge, or other change in the nipple or the skin, each of 
which is associated with an increased risk of breast cancer. 

Costanza et al 15 described the results of a trial using standard¬ 
ized patients to teach CBE skills to practicing clinicians; those 
completing a 5-hour training session had improved performance 
on each of 7 separate components of CBE technique. Vetto et al 16 
provided CBE training with silicone models to 205 practicing 
primary care physicians and found in a pretest-posttest design 
that lump detections increased significantly (proportion finding 
from 3 to 5 of 5 lumps went from 59% to 94%; P < .001) and 
false-positive detections decreased significantly (27% with 2 or 
more before training, and 15% after training, P < .004). 

IMPROVEMENTS IN THE DATA PRESENTED IN THE 
ORIGINAL PUBLICATION 

None. 

CHANGES IN THE REFERENCE STANDARD 

None. 

RESULTS OF LITERATURE REVIEW 

See Table 8-8. 

EVIDENCE FROM GUIDELINES 

Recent guidelines regarding the CBE remain as they were in 
1999: an “I” recommendation (ie, the USPSTF concludes 
that the evidence is insufficient to recommend for or against 
routinely providing the service) from the USPSTF 17 and con¬ 
sensus-based recommendations for annual screening from 
the American Cancer Society (every 3 years for women aged 
20-39 years and annually thereafter) 18 and the American Col¬ 
lege of Obstetricians and Gynecologists (annually for all 
women). 19 Breast self-examination by patients is now “not 
recommended” by the USPSTF and Canadian Task Force. 


CLINICAL SCENARIO—RESOLUTION 


A 55-year-old woman at average risk of breast cancer 
should have a careful medical history taken to elicit symp¬ 
toms, be advised of the benefits and risks of screening 
mammography, 20 and be offered a CBE. She should not be 
offered MRI for screening according to the data available 
at this time. If a CBE is performed, the LRs would suggest 
the following according to the findings of the examina¬ 
tion: with a baseline risk of cancer of 1 in 350 (or 2.8 per 
1000) in the coming year, a normal examination result 
(LR-, 0.47) suggests a decrease in her risk to 1 in 744. An 
abnormal examination result (LR+, 11) would suggest an 
increase in risk to 1 in 33, and she should be referred for 
further investigations and treatment. 
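
The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, assuming the standard conversion from probability to odds and back; the function name `posttest_probability` is a hypothetical helper and does not come from the source.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


baseline = 1 / 350  # baseline 1-year risk quoted in the scenario

after_normal_cbe = posttest_probability(baseline, 0.47)  # LR- for a normal CBE
after_abnormal_cbe = posttest_probability(baseline, 11)  # LR+ for an abnormal CBE

print(f"Normal CBE:   about 1 in {1 / after_normal_cbe:.0f}")    # about 1 in 744
print(f"Abnormal CBE: about 1 in {1 / after_abnormal_cbe:.0f}")  # about 1 in 33
```

Because the baseline risk is so low, the odds and the probability are nearly identical here, which is why multiplying the baseline risk directly by the LR gives almost the same answer.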
BREAST CANCER: MAKE THE DIAGNOSIS 

The risk of breast cancer increases as a function of age. 
The lifetime risk for US women is 12%. The annual risk is 
shown in Table 8-9. 

Table 8-9 Breast Cancer Risk Increases With Age 21 

Age, y    Incidence, % 

Table 8-10 Detecting the Likelihood of Breast Cancer 

Finding^a    LR+ (95% CI)    LR- (95% CI) 
CBE          11 (5.8-19)     0.47 (0.40-0.56) 

Abbreviations: CBE, clinical breast examination; CI, confidence interval; LR+, positive 
likelihood ratio; LR-, negative likelihood ratio. 

^a Pooled results based on 7 studies. 22-28 
POPULATION FOR WHOM A SCREENING CBE 
SHOULD BE CONSIDERED 

• Women who would be considered for mammography 
screening (eg, women 40 years and older) should be 
offered a CBE (Table 8-10). 

• Women with a positive family history for breast cancer 
may benefit from breast cancer screening starting at a 
younger age. 
EVIDENCE TO SUPPORT THE UPDATE: Breast Cancer 



TITLE Findings From 752081 Clinical Breast Examina¬ 
tions Reported to a National Screening Program From 
1995 Through 1998. 

AUTHORS Bobo JK, Lee NC, Thames SF. 

CITATION J Natl Cancer Inst. 2000;92(12):971-976. 

QUESTION What are the sensitivity, specificity, and 
positive predictive value of clinical breast examinations 
(CBE) performed in community settings? 

DESIGN A national program designed to provide cancer 
screening to low-income women paid for examinations 
performed in a variety of settings. Records provided by 
those providers included documentation of CBE findings, 
as well as results of diagnostic evaluations for women with 
abnormal CBE or mammogram findings. Complete fol¬ 
low-up and ascertainment of interval cancers were not 
available for all women. 

SETTING United States: facilities including university and 
community-based hospitals and clinics, health department 
clinics, mobile mammography units, and private-practice 
offices. 

PATIENTS A total of 564708 adult women who pre¬ 
sented for 752081 breast examinations. Of the examina¬ 
tions, 87815 were done on women who were known to 
have breast symptoms at the examination; 589048 were 
performed on asymptomatic women. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

CBE technique was not dictated or described. Concurrent mam¬ 
mography was provided in nearly all CBEs. Interval cancers 
could be determined only for patients with more than 1 screening 
record in the study period (~25% of the population). 

MAIN OUTCOME MEASURES 

Sensitivity, specificity, and positive predictive value. 

MAIN RESULTS 

See Table 8-11. 

CONCLUSIONS 

LEVEL OF EVIDENCE Analysis of a large database. 

STRENGTHS This study contains valuable data on current 
practice outside of research settings. This national undertak¬ 
ing to provide screening services to low-income women had 
the forethought to require documentation of findings in a 
consistent manner. It is the largest such report of a multi¬ 
center database of nonresearch clinical examinations. 

LIMITATIONS The data include more than 750000 CBEs, 
but the number done in asymptomatic women is lower. The 
sensitivity and the likelihood ratios associated with the exam¬ 
ination differ, depending on whether a woman has symptoms 
or not. Symptoms in the breast that bring a woman to present 
to a physician include pain, finding of a lump, nipple 
discharge, or other change in the nipple or the skin, each of 
which is associated with an increased risk of breast cancer. 1 
Although 87% of the examinations in the series were done 
on asymptomatic women, 47% of the cancers detected were 
found in women who came to the program with symptoms. 

Table 8-11 Clinical Breast Examination Characteristics Change When Women Have Breast Symptoms 

Clinical Breast          Sensitivity,   Specificity,   Positive Predictive     LR+ (95% CI)     LR- (95% CI) 
Examination              %              %              Value, % (95% CI) 
Asymptomatic patients    36             96             2.9 (2.6-3.1)^a         9.5 (8.9-10)     0.66 (0.64-0.69) 
Symptomatic patients     85             73             5.6 (5.3-5.9)^a         2.5 (2.4-2.6)    0.22 (0.20-0.25) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio. 
^a Calculated from data provided in the report. 

With regard to screening CBE in the asymptomatic popu¬ 
lation, several points are worthy of note. First, the measured 
sensitivity of the CBE must be seen as an upper limit to the 
true sensitivity because there was no systematic follow-up of 
women who had normal examination results, and one must 
allow that interval cancers occurred in the group of women 
lost to follow-up, which are not recorded. Second, the lack of 
standardized procedures used in the performance of the CBE 
causes one to wonder for this study, as for most of the screen¬ 
ing studies reviewed in the original article, whether the per¬ 
formance characteristics of the CBE would improve with 
trained examiners following a standard protocol. 

REFERENCE FOR THE EVIDENCE 

1. Barton MB, Elmore JG, Fletcher SW. Breast symptoms among women 
enrolled in a health maintenance organization: frequency, evaluation, 
and outcome. Ann Intern Med. 1999;130(8):651-657. 

Reviewed by Mary B. Barton, MD 


TITLE Predictors of Sensitivity of Clinical Breast Exami¬ 
nation. 

AUTHORS Oestreicher N, White E, Lehman CD, Mandelson MT, 
Porter PL, Taplin SH. 

CITATION Breast Cancer Res Treat. 2002;76(1):73-81. 

QUESTION What factors influence the sensitivity of the 
clinical breast examination (CBE) in screening for breast 
cancer? 

DESIGN Analysis of data linkage between a breast can¬ 
cer screening program involving both CBE and mammog¬ 
raphy and a population-based cancer registry. 

SETTING Breast cancer screening program of a large 
health maintenance organization in Washington State. 

PATIENTS Women who had undergone screening and 
who were diagnosed with a first breast cancer within 12 
months of the screening examination were potentially eli¬ 
gible to be included (n = 474). Four of these women were 
excluded because of the presence of breast implants and 1 
each because of symptoms at the screening visit and at the 
request of the patient. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

CBE technique is not described, but the training of examin¬ 
ers is described and the authors state that examiners were 
CBE-certified by the American Cancer Society. Concurrent 
mammography was provided in all CBEs, but in most cases, 
the results of the mammogram were not available to the 
examiner. Breast cancer diagnoses were determined from the 
Surveillance, Epidemiology and End-Results cancer registry 
of Seattle-Puget Sound. 

MAIN OUTCOME MEASURE 

Sensitivity of CBE. 

MAIN RESULTS 

The sensitivity of the breast examination was 0.35 (95% CI 
0.31-0.39). The authors found in multivariable analyses that 
CBE sensitivity was significantly higher for women with 
larger tumors at diagnosis, for Asian women compared with 
white women, and for women with normal body mass index 
or weight compared with women with increased body mass 
index. The sensitivity of CBE was lower in women at 
extremes of age (ie, 40-49 years or > 80 years) compared with 
that in women aged 50 to 59 years. 
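
The reported interval can be approximately reproduced from the sensitivity and the number of women analyzed (468). The sketch below assumes a simple normal-approximation (Wald) interval; the method the authors actually used is not stated here, and `wald_interval` is a hypothetical helper name.

```python
import math
from typing import Tuple


def wald_interval(proportion: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    standard_error = math.sqrt(proportion * (1.0 - proportion) / n)
    return proportion - z * standard_error, proportion + z * standard_error


# Sensitivity of 0.35 among 468 women with breast cancer (values from the text).
low, high = wald_interval(0.35, 468)
print(f"95% CI: {low:.2f} to {high:.2f}")  # 95% CI: 0.31 to 0.39
```

The Wald interval behaves poorly for proportions near 0 or 1 or for small samples; with a proportion of 0.35 and several hundred patients it is a reasonable approximation.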

CONCLUSIONS 

LEVEL OF EVIDENCE Level 4 (“sensitivity-only” study). 

STRENGTHS This study used a comprehensive breast can¬ 
cer screening program in a stable managed-care population 
and linked these data to a population-based cancer registry 
to ascertain cancer outcomes of women screened. 

LIMITATIONS Although the data were somewhat old (all 
cancers diagnosed 8 or more years before the date of publica¬ 
tion), the technique of CBE had not changed during that 
time. Because of the nature of the analysis, the study could 
confidently assess sensitivity of CBE only among women in 
whom cancers were diagnosed and could not assess the spec¬ 
ificity or the positive predictive value of CBE. 

The authors observed that their study is one of effective¬ 
ness, not efficacy, in comparing their findings with those of 
the Canadian NBSS studies. Although this may be true, one 
imagines that, even in an actual clinical setting, the use of 
standardized best practice procedures in the performance of 
the CBE could not hurt, and might help the performance 
characteristics of the CBE. 

Reviewed by Mary B. Barton, MD 
CASE 1 A 50-year-old man undergoes a general physical 
examination for his insurance policy. A left-sided, focal, 
systolic carotid bruit is identified. There is no history of 
stroke or transient ischemic attack (TIA). 

CASE 2 A 50-year-old man undergoes a preoperative 
examination the evening before he is to undergo coronary 
artery bypass surgery. A bruit identical to that found in 
the first patient is heard. There is no history of cerebrovas¬ 
cular symptoms. 

CASE 3 A 50-year-old man presents to the emergency 
department with a history of a transient (less than 1 hour) 
slurring of speech and right-arm weakness. There is no 
history of cerebrovascular disease, and the physical exami¬ 
nation reveals a focal, left-sided, systolic carotid bruit. 


THE IMPORTANCE OF CLINICAL EXAMINATION 


The clinical significance of the identical-sounding bruits is 
vastly different in these patients. In each of them, the coup¬ 
ling of a thoughtful history with a competent physical exami¬ 
nation will lead to different prognostic predictions and 
differing courses of appropriate clinical action. 

THE CAROTID ARTERY AS A CAUSE 
FOR BRUITS IN THE NECK 

The right common carotid artery arises from the brachio¬ 
cephalic artery (the first branch of the aortic arch), and the 
left arises directly from the aortic arch. The common carotid 
arteries run upward and backward through the neck, from 
the sternoclavicular joint to the upper border of the thyroid 
cartilage, where they divide into the external and internal 
carotid arteries (Figure 9-1). The external carotid artery 
terminates in the substance of the parotid gland, where it 
divides into the superficial temporal and maxillary arteries. 
The internal carotid artery ascends to the base of the skull 
and enters the cranium through the carotid canal in the tem¬ 
poral bone. 

Although bruits of the carotid artery have been reported in 
approximately 20% of children younger than 15 years, they 
occur in about 1% of healthy adults. 1 Carotid bruits can be 
heard in states of increased vascular flow such as thyrotoxico¬ 
sis, anemia, and arteriovenous fistulas. A relatively common 
example of the latter occurs with the creation of a forearm 
fistula in patients receiving hemodialysis. 2 In a convenience 
sample of 15 long-term hemodialysis patients, Messert et al 2 
found bilateral carotid bruits in 5 patients and a unilateral 
bruit in 6 patients. The bruit was usually louder on the side 
of the fistula and was often associated with a subclavian bruit 
(in 13 of 15 patients). Carotid artery stenosis, typically 
caused by atherosclerosis, is the underlying condition to be 
considered when one hears a carotid bruit, and the accuracy 
of this sign is discussed below. However, a bruit may be heard 
over the bifurcation of the carotid artery when the associated 
angiogram shows either a normal or a completely occluded 
internal carotid artery; in these cases, the bruit may arise 
from a stenosed external carotid artery. 3 

Figure 9-1 Anatomy of the Right Carotid Artery 

Carotid bruits are heard best in the polygonal area (shaded in blue). This area is 
bounded superiorly by the angle of the jaw, inferiorly by the upper border of the 
thyroid cartilage, and posteriorly by the sternocleidomastoid muscle. 

HOW TO HEAR CAROTID BRUITS 

In a quiet room, with the patient relaxed, it is conventional 
to use the bell of the stethoscope and to listen for carotid 
bruits over an area beginning from just behind the upper 
end of the thyroid cartilage to just below the angle of the 
jaw (Figure 9-1). 4,5 No method of auscultation has been 
demonstrated to be superior to another. Most carotid bruits 
are heard only in systole, but some are heard in both systole 
and diastole, the significance of which is unclear, given the 
poor clinical agreement on the assessment of the duration 
of carotid bruits. 6 

Carotid bruits make up but a portion of all neck bruits. 
Systolic heart murmurs transmitted to the neck usually can 
be differentiated from carotid bruits because they are louder 
over the precordium than over the neck. 


Venous hums, caused by flow in the internal jugular vein, 
have been reported to occur in approximately 25% of young 
adults. 7 They are easily distinguishable from carotid bruits, 
being most prominent in diastole, with the patient sitting 
and the head turned away from the side of auscultation. 
Venous hums are rarely heard with the patient lying down 
and are always abolished either by the compression of the 
ipsilateral internal jugular vein cephalad to the stethoscope 
or by Valsalva maneuver. 8,9 

PRECISION OF AUSCULTATION FOR CAROTID BRUITS 

Among 55 patients examined independently by 2 neurolo¬ 
gists (both of whom had normal audiogram results), the 
agreement beyond chance for the presence of a bruit was 
substantial, with a κ of 0.67. However, agreement regarding 
the intensity, pitch, or duration of the bruit was only 
fair (κ < 0.40). 6 
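
For readers unfamiliar with the statistic, κ expresses agreement beyond chance as (observed agreement - expected agreement) / (1 - expected agreement). The sketch below illustrates the calculation for 2 examiners and a present/absent finding. The counts are hypothetical, chosen only so that they total 55 patients; they are not the data from the study cited above, and `cohen_kappa` is an illustrative helper name.

```python
def cohen_kappa(both_present: int, only_first: int, only_second: int, both_absent: int) -> float:
    """Cohen's kappa for 2 examiners rating a finding as present or absent."""
    total = both_present + only_first + only_second + both_absent
    observed = (both_present + both_absent) / total
    first_yes = (both_present + only_first) / total
    second_yes = (both_present + only_second) / total
    # Agreement expected by chance if the 2 examiners rated independently.
    expected = first_yes * second_yes + (1 - first_yes) * (1 - second_yes)
    # Undefined (division by zero) when expected agreement equals 1.
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL counts for 55 patients (not taken from the cited study).
kappa = cohen_kappa(both_present=10, only_first=4, only_second=4, both_absent=37)
print(f"kappa = {kappa:.2f}")
```

Note that raw agreement in this hypothetical table is 47 of 55 (85%), yet κ is considerably lower because most of that agreement would be expected by chance when bruits are uncommon.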

THE IMPORTANCE OF CAROTID BRUITS IN 
DIFFERENT CLINICAL PRESENTATIONS 

Case 1: The Asymptomatic Ambulatory Bruit 

How Often Should We Expect to Find an 
Asymptomatic Carotid Bruit? 

In a community-based study, Heyman et al 10 found the prev¬ 
alence of asymptomatic cervical bruits (bruits heard in the 
supraclavicular area or anterior to the sternocleidomastoid 
muscle) to increase with age, from 2.3% in the age group of 
45 to 54 years to a high of 8.2% in the age group of 75 years 
or older. Bruits were more common in women and hyperten¬ 
sive patients. 

If No Bruit Is Found at This Examination, What Are 
the Chances of Developing a Bruit De Novo 
During the Following Years? 

The incidence of de novo bruits also increases with age. Wolf 
et al 11 estimated that of a cohort of 100 adults aged 65 years 
or older, approximately 1% per year (7 during the next 8 
years) will develop a new carotid bruit, a rate twice that of 
individuals aged 45 to 54 years. 

What Are the Prognostic Implications of Discovering an 
Asymptomatic Carotid Bruit During a General Physical 
Examination in a 50-Year-Old Man? 

Asymptomatic carotid bruits are associated with increased 
incidence of both cerebrovascular and cardiac events in this 
age group. For example, Wiebers et al 12 conducted a 5-year 
prospective, population-based study of 2 unmatched but 
generally similar cohorts, one of which had carotid bruits 
(566 individuals) and one of which did not (428 individuals). 
The average annual stroke rates were 3 times as high in 
patients with bruits (1.5%) compared with those without 
(0.5%), and similar ratios were also found for TIAs (0.9% vs 
0.2%). Most strokes and TIAs occurred on the same side as 
the bruit. The prognosis was not different for the various 
types of carotid bruits (diffuse vs localized, isolated systolic 
vs systolic and diastolic). In a second prospective, population-based 
cohort, Heyman et al 10 followed up 1620 asymptomatic 
adults aged 45 years or older for 6 years and again 
found a higher incidence of strokes in patients with cervical 
bruits (odds ratio [OR], 4.2). The association appeared 
stronger in men (OR, 7.5) than in women (OR, 1.6). Heyman 
et al 10 also found a 3.4-fold higher risk of death from 
ischemic heart disease in men with asymptomatic cervical 
bruits (90% confidence interval [CI], 1.1-11), and a 1.9-fold 
higher risk in women (90% CI, 0.7-5). 

A randomized trial of carotid endarterectomy in asymp¬ 
tomatic carotid stenoses of at least 50% reported a decrease 
in TIAs after surgery. 13 However, there was no decrease in dis¬ 
abling or fatal stroke after surgery, and most clinicians would 
not refer such patients for angiography. 

In the elderly (older than 75 years), there may not be an 
increased risk of stroke with asymptomatic carotid bruits. 
Among nursing home residents, the 3-year incidence of 
TIA or stroke was 10% when a bruit was present and 9% 
when it was absent, a relative risk of only 1.1 (95% CI, 
0.45-2.7). 14 

Case 2: The Asymptomatic Preoperative Bruit 

How Often Should We Expect to Find an Asymptomatic 
Carotid Bruit on Routine Preoperative Assessment? 

The prevalences reported in the 4 surgical cohort studies that 
assessed for the presence of bruits preoperatively range from 
a low of 6% (Ivey et al 15 ) to a high of 16% (Evans and 
Cooperman 16 ), with an overall average of approximately 
10%. These figures are significantly higher than those in the 
general population (average, 4.4%), probably because 3 of 
the 4 surgical series were patients undergoing major vascular 
procedures, in which the prevalence of systemic atherosclero¬ 
sis is increased. 

Are Patients With Asymptomatic Preoperative Bruits at 
Higher Risk of Perioperative Stroke? 

As shown in Table 9-1, only Barnes et al 17 of the 4 studies 15-18 
found an increased incidence of permanent neurologic complications 
after surgery among patients with preoperative 
asymptomatic carotid bruits. When combined with the other 
3 studies, the difference becomes a nonsignificant trend 
favoring fewer strokes among patients with carotid bruits 
(pooled rate difference, 19 -0.6% [95% CI, -1.6% to 0.4%]; 
pooled OR, 20 0.94 [95% CI, 0.22-3.9]). 


On the other hand, Ivey et al 15 found an increase (11% vs 
2%; P < .001) in transient, nonfocal neurologic abnormalities 
(such as intellectual and behavioral changes) in patients with 
asymptomatic bruits who underwent cardiac procedures. 

Case 3: The Symptomatic Bruit 

Should Further Diagnostic or Therapeutic Procedures Be 
Carried Out in Patients With Symptomatic Carotid Bruits? 

Two randomized controlled trials 21,22 demonstrated that 
carotid endarterectomies markedly decrease mortality and 
stroke in patients with symptomatic, high-grade (70%-99%) 
carotid stenosis. Accordingly, the onus is on the physician to 
rule in or rule out high-grade carotid stenosis in all patients 
with anterior-circulation TIAs or minor strokes, regardless of 
bruits. 

Does the Presence or Absence of a Carotid Bruit 
Accurately Reflect the Degree of Underlying 
Carotid Artery Stenosis in Symptomatic Patients? 

The relationship between carotid bruits in patients with cere¬ 
brovascular symptoms and angiographically determined 
carotid stenoses is summarized in Table 9-2. 23-26 The 2 studies 
that reported data specifically about high-grade stenoses 
found an association with carotid bruits. 25,26 The likelihood 
ratios for high-grade carotid stenoses were 3.2 and 1.6 when 
bruits were present and 0.3 and 0.6 when bruits were absent. 
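
These likelihood ratios follow directly from the sensitivities and specificities listed in Table 9-2 for the 2 high-grade stenosis studies. A minimal sketch of the calculation, assuming the usual definitions (LR+ = sensitivity / (1 - specificity); LR- = (1 - sensitivity) / specificity); the function name is illustrative only:

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) for a dichotomous finding."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity and specificity as listed in Table 9-2.
studies = {
    "Hankey and Warlow": (0.76, 0.76),
    "NASCET": (0.62, 0.61),
}

for name, (sens, spec) in studies.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ {lr_pos:.1f}, LR- {lr_neg:.1f}")
# Hankey and Warlow: LR+ 3.2, LR- 0.3
# NASCET: LR+ 1.6, LR- 0.6
```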

Unfortunately, however, this relationship is not strong 
enough for the clinician to be able to use the presence of a 
bruit to rule in, or the absence of a bruit to rule out, high- 
grade carotid stenosis. For example, in the North American 
Symptomatic Carotid Endarterectomy Trial (NASCET), 21 
more than one-third of patients with high-grade stenoses 
had no detectable bruits, and the presence of a focal carotid 
bruit increased the probability of underlying high-grade 
(70%-99%) carotid stenosis by only 11%, from a preexami¬ 
nation probability of 52% to a postexamination probability 
of 63%. Furthermore, the NASCET also showed that no 
other bruits (supraclavicular, ophthalmic, or contralateral) 
added to the accuracy of the finding. 
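
The NASCET figures can be checked with the odds form of Bayes' theorem, using the pretest probability of 52% and the LR+ of 1.6 from Table 9-2. The worked steps below are an illustrative reconstruction, not a calculation printed in the trial report:

```latex
\[
\begin{aligned}
\text{pretest odds} &= \frac{0.52}{1 - 0.52} \approx 1.08 \\
\text{posttest odds} &= 1.08 \times 1.6 \approx 1.73 \\
\text{posttest probability} &= \frac{1.73}{1 + 1.73} \approx 0.63
\end{aligned}
\]
```

A likelihood ratio this close to 1 moves an intermediate pretest probability very little, which is why the bruit adds only about 11 percentage points.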

Table 9-1 Risk of Perioperative Stroke in Patients With Preoperative Carotid Bruits 

                                                        No. of Patients With Perioperative Stroke/Total No. (%) 
Studies                   Types of Patients             Patients With Bruits    Patients Without Bruits    P^a 
Barnes et al 17           Coronary artery bypass        2/44 (4.5)              3/405 (0.7)                .02 
                          graft and vascular surgery 
Evans and Cooperman 16    Major vascular surgery        0/92 (0)                4/496 (0.8)                .39 
Ivey et al 15             Cardiac surgeries             0/82 (0)                9/1339 (0.7)               .46 
Ropper et al 18           All elective surgeries for    0/82 (0)                4/592 (0.7)                .46 
                          those >55 y 

^a Using the χ² test. 

Table 9-2 Ability of Carotid Bruits to Indicate Various Degrees of Angiographic Carotid Stenosis in Patients With Symptoms 

                                                           Degree of 
                                                           Stenosis 
Studies                        Types of Patients           Predicted, %   Sensitivity   Specificity   LR+   Pretest P   Posttest P 
Ingall et al 23                Various symptoms            50-99          0.37          0.94          5.7   .25         .65 
Ziegler et al 24               TIA                         >50            0.29          0.88          2.4   .08         .17 
Hankey and Warlow 25           Anterior-circulation TIA    75-99          0.76          0.76          3.2   .16         .37 
North American Symptomatic     Anterior-circulation TIA    70-99          0.62          0.61          1.6   .52         .63 
Carotid Endarterectomy Trial 
Collaborators 26 

Abbreviations: LR+, positive likelihood ratio; TIA, transient ischemic attack. 

THE BOTTOM LINE 

1. Asymptomatic carotid bruits are relatively common. 
Their prevalence increases with age. They are associated 
with a long-term increase in cerebrovascular and cardiac 
events, except perhaps in individuals older than 75 years. 

2. Asymptomatic preoperative bruits are not predictive of 
increased risk of perioperative stroke. However, they may 
be harbingers of transient postoperative cognitive and 
behavioral abnormalities. 

3. Although the presence of a carotid bruit in a patient with 
carotid-territory cerebrovascular symptoms increases the 
probability that the underlying stenosis is high grade (and 
therefore amenable to endarterectomy), the accuracy of 
this physical finding is low. Accordingly, the presence of a 
carotid bruit cannot be used to rule in, nor can its absence 
be used to rule out, surgically amenable carotid artery ste¬ 
nosis in symptomatic patients. 
CLINICAL SCENARIO 


A 65-year-old woman returns to your office to review 
her home blood pressure recordings. She frequently has 
a systolic pressure of approximately 130 to 145 mm Hg. 
While examining her, you slip the stethoscope onto her 
neck and hear a focal, unilateral bruit. The patient 
notices a change in your facial expression while you are 
listening to her neck, so she asks, “Did you hear some¬ 
thing?” You tell her you heard a “squeaky noise” and 
then immediately wonder whether she (or you) needed 
that information. You realize you need to know whether 
the presence of a bruit suggests that the patient might 
have a carotid stenosis severe enough to warrant a sur¬ 
gical evaluation. She reminds you that her father, after 
being healthy all his life, had a stroke when he was 72 
years old. 


UPDATED SUMMARY ON THE CLINICAL EXAMINATION 
FOR CAROTID BRUITS 

Original Review 

Sauve JS, Laupacis A, Ostbye T, Feagan B, Sackett DL. Does 
this patient have a clinically important carotid bruit? JAMA. 
1993;270(23):2843-2845. 

UPDATED LITERATURE SEARCH 

Much has been written on carotid disease, particularly the 
role of carotid endarterectomy for individuals with symp¬ 
toms of a stroke, transient ischemic attack (TIA), or tran¬ 
sient monocular blindness. However, there are also new 
data for individuals who are asymptomatic. We focused our 
updated literature review on the role of the carotid bruit in 
detecting patients who have an ipsilateral carotid stenotic 
lesion because these patients might benefit from carotid 
endarterectomy. 

Our literature search included the years 1992 through 
July 2004 and combined the text words “bruit and carotid” 
with “asymptomatic and carotid” to yield 85 English-lan¬ 
guage articles. We excluded case reports and then reviewed 


the abstracts of 76 articles to identify 24 promising articles. 
When possible, electronic copies of the articles were 
obtained and searched for the text word “bruit.” Articles 
were retained when they were prospective studies of adults 
that included both sensitivity and specificity data of level 3 
quality or greater. We also retained articles that were stud¬ 
ies of the positive predictive value of carotid bruits at a 
threshold of at least 70% stenosis because these are the 
patients who will likely benefit from endarterectomy. The 
reference lists for each article were reviewed, yielding 1 
additional study. The reference list of the original Rational 
Clinical Examination article was reviewed, and previously 
cited literature was obtained to assess whether the data 
could be reanalyzed. Eight articles were ultimately included 
in this update. 

NEW FINDINGS 

Symptomatic Patients 

The presence of a carotid bruit increases the likelihood of a 
70% to 99% carotid stenosis (likelihood ratio [LR], 3). How¬ 
ever, newer studies confirm that the absence of a bruit is not 
sufficient to prove that the carotids are normal. 

Asymptomatic Patients 

Newer studies allow us to deduce that the presence of a bruit 
in asymptomatic patients appreciably increases the likelihood 
of carotid stenosis (LR, 4-10). 

IMPROVEMENTS IN THE DATA PRESENTED IN THE 
ORIGINAL PUBLICATION 

Confidence intervals were added by retrieving original refer¬ 
ences and extracting the results. We also reviewed cited stud¬ 
ies to assess whether they had information about the 
predictive value for bruits. 

CHANGES IN THE REFERENCE STANDARD 

A meta-analysis of noninvasive carotid artery tests showed 
that carotid duplex, carotid Doppler, and magnetic resonance 
angiography had excellent sensitivity (89%-94%) 
and specificity (85%-92%) for detecting carotid stenosis of 
70% occlusion or greater, compared with carotid angiogra¬ 
phy. 1 This makes them useful as the next screening test after 
the clinical examination, when more information is 
required. 

RESULTS OF LITERATURE REVIEW 

Symptomatic Patients 

Studies now address the role of carotid bruits in predicting 
carotid stenosis for a broader array of both symptomatic 
and asymptomatic patients (Table 9-3). For symptomatic 
patients, the studies were done in enough detail to allow us 
to estimate the sensitivity and specificity of the carotid 
bruit for predicting a surgical stenosis. These studies dem¬ 
onstrate much better specificity than sensitivity and that 
the presence of a carotid bruit increases the likelihood of a 
stenotic lesion in symptomatic patients. However, the 
newer information confirms that the absence of a bruit in 
symptomatic patients is not adequate to prove that the 
carotids are normal. 


Table 9-3 Results for Predicting a Significant Carotid Stenosis 
of 70% to 99% 

Study                                           Stenosis, %   LR+ (95% CI)               LR- (95% CI) 
Mead et al 6 (symptomatic, referred for         70-99         5.5 (4.1-7.2)              0.48 (0.38-0.60) 
  neurology evaluation) 
Hankey and Warlow 2 (symptomatic,^a             75-99         3.2 (95% CI illegible)     0.31 (0.18-0.50) 
  referred for neurology evaluation for 
  endarterectomy) 
Sauve et al 3 (symptomatic, enrolled in         70-99         1.6 (1.4-1.8)              0.61 (0.54-0.68) 
  endarterectomy trial) 
Magyar et al 7 (57% had symptoms;               70-99         6.0 (3.2-10)               0.48 (0.25-0.74) 
  referred to neurology clinic) 
Hill et al 8 (asymptomatic before cardiac       >80           8.6 (4.3-15)               0.24 (0.07-0.60) 
  surgery) 
de Virgilio et al 9 (asymptomatic referred      50-99         4.2 (2.3-7.2)              0.55 (0.34-0.77) 
  for peripheral vascular disease evaluation) 

Summary LRs 
Symptomatic (n = 3 studies,^b                                 3.0 (1.3-7.1)              0.49 (0.36-0.67) 
  2292 patients) 
Asymptomatic (n = 2 studies,^c                                6.0 (2.6-14)               0.45 (0.22-0.92) 
  275 patients) 
Asymptomatic deduced from the                                 4.0-10                     Uncertain 
  positive predictive values (Table 9-4) 

Abbreviations: CI, confidence interval; LR, likelihood ratio; LR+, positive likelihood 
ratio; LR-, negative likelihood ratio. 

^a All patients were symptomatic. After the arteriogram, 298 carotid arteries were considered 
"symptomatic" and 124 arteries were considered "asymptomatic." Retrospectively, 
bruits were heard in 95 of 298 symptomatic arteries and 41 of 124 asymptomatic arteries. 
Data are not provided to allow calculation of separate sensitivity and specificity. 
^b Mead et al, 6 Hankey and Warlow, 2 and Sauve et al. 3 
^c Hill et al 8 and de Virgilio et al. 9 


An interesting finding is revealed by looking at 2 of the 
studies cited in the original Rational Clinical Examination 
article. These studies 2,3 included the most selective 
populations of patients for whom the sensitivity and specificity 
were reported. The study by Hankey and Warlow 2 included only 
those symptomatic patients who were suitable candidates for 
endarterectomy. The North American Symptomatic Carotid 
Endarterectomy Trial 4 included only patients with carotid 
stenosis, and then only those who were randomized to 
endarterectomy versus medical therapy. These studies exhibit 
verification bias, which typically creates underestimates of 
the specificity and value of hearing a carotid bruit. Thus, it 
should not be surprising that these studies also have the 
lowest positive likelihood ratios (LR+) of those we reviewed 
(Table 9-3). On the other hand, this reappraisal confirms that 
the absence of a bruit does not have enough diagnostic power 
to rule out an important stenotic lesion in the symptomatic 
patient. 

Asymptomatic Patients 

By asymptomatic, we mean asymptomatic for cerebrovascular 
disease. Two studies, one in preoperative cardiac patients 
and the other in peripheral vascular disease patients, allow us 
to calculate both the sensitivity and specificity of the carotid 
bruit. Although the studies used slightly different thresholds 
to characterize patients as having carotid stenosis, the 
predictive values for bruit are statistically similar among the 
asymptomatic studies (Table 9-3). However, we can also look 
at the predictive value for the presence of a bruit and 
determine whether it varies (Table 9-4), which is useful because 
the studies that allowed us to calculate sensitivity and 
specificity may not generalize to an age-matched general medical 
patient. The positive predictive value for symptomatic 
patients is approximately 50% and about half that (22%) for 
patients with no cerebrovascular symptoms. Because we 
know the predictive value, we can make inferences about the 
LR+. This follows from the equation: 

Posterior odds = Prior odds × LR 

From epidemiologic studies, the prevalence should range 
from approximately 0.5% for patients aged 50 years or older 
to approximately 10% for patients aged 80 years or older. 5 
These values establish a range of reasonable prior odds. The 
data from the positive predictive value studies allow us to 
develop a range for the posterior odds. We can then solve for 
the LR for both symptomatic patients (50% posterior 
probability) and asymptomatic patients (22% posterior 
probability) (Figure 9-2). 
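
Solving the odds equation for the LR can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' calculation: the function names `odds` and `implied_lr` are hypothetical, and the prior probabilities in the loop are assumed values picked from the 3% to 8% range discussed for asymptomatic patients.

```python
def odds(probability):
    """Convert a probability (0 < p < 1) to odds."""
    return probability / (1.0 - probability)


def implied_lr(prior_probability, posterior_probability):
    """Solve Posterior odds = Prior odds x LR for the LR."""
    return odds(posterior_probability) / odds(prior_probability)


# The posterior probability (22%) is the summary positive predictive
# value for asymptomatic patients; the priors are assumed for
# illustration and are not taken from any single study.
for prior in (0.03, 0.05, 0.08):
    lr = implied_lr(prior, 0.22)
    print(f"prior {prior:.0%}: implied LR+ about {lr:.1f}")
```

With these assumed priors the implied LR+ falls roughly between 3 and 9, and it gets smaller as the prior probability rises.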

The likelihood ratio for a bruit to predict significant 
carotid stenosis varies with the prior probability. Figure 9-2 
shows that as the prior probability of stenosis increases 
(x-axis), the importance of a carotid bruit becomes less. If your 
population of asymptomatic patients is recognizably similar 
to those who were included in the baseline summary 
estimate from Table 9-4, then you would use the asymptomatic 
probability line and see that across a reasonable range of 
prior probabilities (about 3%-8% on the x-axis) for carotid 
stenosis, finding a bruit has a useful LR of 4 to 10 (from the 
y-axis). Fortunately, we can feel more confident about this 
because the results are similar to the summary LRs for 
asymptomatic patients from Table 9-3. 


Table 9-4 Results for the Positive Predictive Value for the Presence of a Bruit in Predicting Ipsilateral Carotid Stenosis 

Study                                                  Degree of     Stenosis/      Positive Predictive
                                                       Stenosis, %   All Patients   Value, % (95% CI)

Sauve et al 3 (symptomatic, part of                    70-99         420/667        63 (60-68)
  endarterectomy trial)
Mead et al 6 (symptomatic, referred to neurologist)    70-99         54/119         45 (36-55)
Hankey and Warlow 2 (symptomatic, referred to          75-99         35/95          37 (28-47)
  neurologist for evaluation of endarterectomy)
Hill et al 8 (asymptomatic before cardiac surgery)     >80           7/23           30 (16-51)
Chambers and Norris 10 (asymptomatic referred          >75                          23 (19-26)
  for bruit)
Lewis et al 11 (asymptomatic referred for bruit)       80-99                        21 (18-24)

Summary Predictive Value

Symptomatic (n = 3 studies, 868 patients)(a)           70-99                        50 (35-64)
Asymptomatic (n = 3 studies, 1303 patients)(b)         >75                          22 (20-24)

Abbreviation: CI, confidence interval. 

(a) Studies combined from Mead et al, 6 Sauve et al, 3 and Hankey and Warlow. 2 

(b) Studies combined from Hill et al, 8 Chambers and Norris, 10 and Lewis et al. 11 The results are homogeneous (P = .42). 

EVIDENCE FROM GUIDELINES 

Symptomatic patients with TIAs who are surgical candidates 
should be evaluated for carotid stenosis, whether or not they 
have a bruit. 12 

The US Preventive Services Task Force reviewed screening 
for asymptomatic carotid artery stenosis in 1996 and found 
insufficient evidence to make a recommendation about 
listening for carotid bruits. 5 The Task Force observed that the 
annual incidence of stroke unheralded by any TIA symptoms 
ipsilateral to a bruit is 1% to 3%. The interpretation of data 
presented in this update was not available to the Task Force 
and has not been incorporated into the 1996 
recommendations. There are still no data that assess the effect 
of screening for an asymptomatic bruit, confirming stenosis, 
and then performing an endarterectomy on patients with 
surgically significant lesions. The Canadian Task Force 
recommended that clinicians not listen for carotid bruits in 
asymptomatic patients. 13 There does seem to be consensus that 
the presence of an asymptomatic bruit is a marker of 
atherosclerotic risk. 


Figure 9-2 Likelihood Ratio of Carotid Bruit as a Function of 
Symptoms and Prior Probability of Stenosis 

The likelihood ratio of a carotid bruit in predicting carotid stenosis depends on 
whether the patient is symptomatic or asymptomatic and on the prior probability 
of carotid stenosis. However, for both groups of patients the positive likelihood 
ratio decreases in value as the prior probability of carotid stenosis increases. 


CLINICAL SCENARIO—RESOLUTION 


You listened for a bruit with the plan that you would 
emphasize risk-reduction strategies for your hypertensive 
patient, but now, she has asked you to use the findings to 
help decide whether to assess her for carotid stenosis. A 
variety of studies suggest that the LR+ for carotid stenosis 
when a bruit is heard is 4 to 10. Let us say you estimate 
that her prior probability of carotid stenosis is 
approximately 3%, which agrees with epidemiologic data. 
Finding a bruit increases her probability of carotid stenosis to 
approximately 11%, but it might be as high as 20%. 
Hence, you probably have identified a patient at higher 
risk of carotid stenosis. The issue is not whether you can 
identify stenosis with ultrasonography, but whether you 
should. Studies of diagnostic tests give you only the 
likelihood of the target disorder. You will need to review the 
natural history of patients with asymptomatic carotid 
stenosis to help this patient decide whether to pursue further 
testing. 
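
The arithmetic behind the scenario can be sketched as follows. This is a minimal illustration, assuming the 3% prior probability and the LR+ range of 4 to 10 quoted in the scenario; the function name `post_test_probability` is hypothetical and chosen only for this example.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Apply Posterior odds = Prior odds x LR, then convert back to a probability."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Assumed inputs taken from the scenario: prior probability 3%,
# LR+ anywhere from 4 to 10 when a bruit is heard.
pretest = 0.03
for lr in (4, 8, 10):
    post = post_test_probability(pretest, lr)
    print(f"LR+ {lr}: post-test probability about {post:.0%}")
```

Under these assumptions an LR+ of 4 gives about 11%, an LR+ near 8 gives about 20%, and an LR+ of 10 gives about 24%, so the upper estimate depends on which value within the range is applied.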
CHAPTER 9 Update 


CAROTID STENOSIS—MAKE THE DIAGNOSIS 


It is hard for physicians to resist auscultating the neck. 
Perhaps no physical finding in adults causes as much confusion 
as the presence of the carotid bruit in asymptomatic patients. 
Most clinical research suggests that there is a clear benefit to 
carotid endarterectomy for patients with symptoms and a 
benefit (although likely small) for asymptomatic patients. 

PRIOR PROBABILITY FOR CAROTID STENOSIS 

Symptomatic Patients 

Prior Probability 

After ruling out patients for whom endarterectomy would 
not be considered, 10% to 30% will have surgically 
amenable carotid stenosis. There is variability in the estimates 
of the remaining patients who will prove to have surgically 
correctable carotid stenosis. The variability depends on the 
patient population, criteria for determining surgical risk, 
and the threshold for defining an “important” stenosis. 

Asymptomatic Patients 

Prior Probability 

For patients 60 years or older, there is a 1% to 10% probability 
of carotid stenosis. 

The prevalence of carotid stenosis increases from 
approximately 0.5% for patients 50 years of age to approximately 
10% by age 90 years. 14 For patients older than 65 years, 5% 
to 7% of women and 7% to 10% of men will have a carotid 
stenosis of 50% or higher. For more significant degrees of 
stenosis, 2 prospective, population-based samples show that 
1% to 2.3% of women and 1% to 4.1% of men older than 60 
years will have a stenosis of 75% to 99%. 15,16 


POPULATION FOR WHOM THE CAROTID 
BRUIT MIGHT BE AUSCULTATED 

• Patients with cerebrovascular symptoms compatible with 
a nondebilitating stroke or TIA 

• Older patients, as part of an assessment for cardiovascular risk 

DETECTING THE LIKELIHOOD OF CAROTID STENOSIS 

The presence of a carotid bruit does increase the likelihood 
of an important stenotic lesion, but the absence of a bruit 
(especially in patients with atherosclerotic risk factors) does 
not rule out carotid stenosis (see Tables 9-5 and 9-6). 


Table 9-5 Do Carotid Bruits Predict Stenosis in Symptomatic Patients? 

                       LR for Carotid Stenosis, 70%-99% (95% CI)
Ipsilateral bruit      3.0 (1.3-7.1)
No ipsilateral bruit   0.49 (0.36-0.67)

Abbreviations: CI, confidence interval; LR, likelihood ratio. 

Table 9-6 Do Carotid Bruits Increase the Likelihood of Carotid 
Stenosis in Asymptomatic Patients? 

                       LR for Carotid Stenosis, 70%-99%
Ipsilateral bruit      4.0-10
No ipsilateral bruit   Uncertain

Abbreviation: LR, likelihood ratio. 
EVIDENCE TO SUPPORT THE UPDATE 


Carotid Bruit 



TITLE Outcome in Patients With Asymptomatic Neck 
Bruits. 

AUTHORS Chambers BR, Norris JW. 

CITATION N Engl J Med. 1986;315(14):860-865. 

QUESTION Does a bruit predict the presence or absence 
of carotid stenosis? 

DESIGN Baseline data collected as part of a prospective 
cohort of patients enrolled in a study of asymptomatic 
neck bruits. 

SETTING Single site, stroke unit in Toronto. 

PATIENTS Among 659 patients referred for Doppler 
ultrasonography, 500 were asymptomatic and were 
enrolled in a prospective cohort. 

The patients include those in whom physicians might 
consider the presence of carotid stenosis. They had a 
mean age of 64 years, 74% were men, 58% had hypertension, 
58% had heart disease, 57% had peripheral vascular 
disease, 13% had diabetes, 73% had smoking history or 
currently smoked, and 35% had hypercholesterolemia. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Patients were examined at enrollment. Carotid Doppler 
ultrasonography was performed without knowledge of the 
auscultatory findings. The ultrasonographers had 
demonstrated proficiency when their findings were compared with 
angiography. 

MAIN OUTCOME MEASURE 

Positive predictive value at different degrees of stenosis. 


MAIN RESULTS 

See Table 9-7. 


Table 9-7 The Predictive Value of a Carotid Bruit for Identifying Various 
Levels of Carotid Stenosis 

Stenosis, No.              Positive Predictive Value of a
(Degree of Stenosis, %)    Carotid Bruit (95% CI)

113 (>75)                  23 (19-26)
157 (30-74)                31 (27-36)
230 (0-29)                 46 (42-50)

Abbreviation: CI, confidence interval. 


CONCLUSIONS 

LEVEL OF EVIDENCE Positive predictive value studies. 

STRENGTHS Prospective with careful screening and 
confirmed proficiency of ultrasonographers. 

LIMITATIONS The results generalize only to populations 
with the same prevalence of carotid stenosis among patients 
with carotid bruits. No patient who lacked a carotid bruit 
was included, so the sensitivity and specificity cannot be 
determined. 

This study included a large cohort of asymptomatic 
patients, evaluated solely because they had a bruit. The 
cohort seems typical of a group of patients at risk for 
cerebrovascular or atherosclerotic disease. To apply these data to 
your own patients, you would need to know whether the 
study patients were similar to your patients because the 
predictive value is affected by the prevalence of disease. 

Reviewed by David L. Simel, MD, MHS 
TITLE Do Carotid Bruits Predict Disease of the Internal 
Carotid Arteries? 

AUTHORS Davies KN, Humphrey PRD. 

CITATION Postgrad Med J. 1994;70(824):433-435. 

QUESTION Do bruits identify patients with carotid 
stenosis? 

DESIGN Prospective, consecutive patients. 

SETTING Single site, cerebrovascular clinic in the 
United Kingdom. 

PATIENTS All patients were referred for evaluation. 
The underlying prevalence of cardiovascular risk factors is 
not described. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The presence of a bruit was taken from the referral note but 
was not confirmed at study entry. The history was confirmed 
in regard to symptoms. 

MAIN OUTCOME MEASURE 

Carotid stenosis of 70% to 99%. 

MAIN RESULTS 

See Table 9-8. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 4. 

STRENGTHS Pragmatic study from the perspective of a 
vascular laboratory that would take the information from the 
referral note. 

LIMITATIONS The presence of a bruit was not confirmed in 
a standardized manner. It is not stated whether the 
ultrasonography was done blinded to the clinical findings. 

Although interesting from the perspective of clinicians in a 
vascular laboratory, the presence or absence of a bruit was not 
systematically confirmed by the study clinicians. The data were 
taken from the referral requests, which may not have been 
consistently thorough. Thus, it is likely that some patients 
recorded as not having a bruit may have actually had a cervical 
bruit and vice versa. 


Table 9-8 Likelihood Ratio of a Carotid Bruit for Carotid Stenosis of at 
Least 70% 

Test    Sensitivity   Specificity   LR+ (95% CI)    LR- (95% CI)
Bruit   0.57          0.70          1.9 (1.4-2.6)   0.61 (0.41-0.83)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 
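
The likelihood ratios in Table 9-8 follow directly from the reported sensitivity and specificity. The short sketch below shows the standard relationship, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity; the function name `likelihood_ratios` is hypothetical, and the sketch uses only the point estimates, so it does not reproduce the confidence intervals.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous finding such as a bruit."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Point estimates from Table 9-8 (bruit vs stenosis of at least 70%).
lr_pos, lr_neg = likelihood_ratios(0.57, 0.70)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

Rounded to the precision used in the table, the results agree with the reported LR+ of 1.9 and LR- of 0.61.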

Reviewed by David L. Simel, MD, MHS 


TITLE Asymptomatic Carotid Artery Stenosis Screening 
in Patients With Lower Extremity Atherosclerosis: A 
Prospective Study. 

AUTHORS de Virgilio C, Toose K, Arnell T, Lewis RJ, 
Donayre CE, Baker JD, Melany M, White RA. 

CITATION Ann Vasc Surg. 1997;11(4):374-377. 

QUESTION Does a bruit predict ipsilateral carotid 
stenosis among patients with peripheral vascular disease 
who have no cerebrovascular symptoms? 

DESIGN Prospective. 

SETTING Vascular surgery clinic, West Los Angeles 
Veterans Affairs medical center. 

PATIENTS Men (n = 89) who were referred for surgical 
evaluation for peripheral vascular disease. Patients were 
excluded if they had any symptoms of cerebrovascular disease. 
Ninety percent of the patients had typical claudication, 88% 
were smokers, 60% had hypertension, and 42% had diabetes. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Auscultation of the carotids and a carotid duplex 
ultrasonography were performed on each carotid by a radiologist 
blinded to the clinical status of the patient. 

MAIN OUTCOME MEASURE 

Presence of carotid stenosis greater than 50%. Data are 
presented for numbers of arteries imaged (n = 178). 

MAIN RESULTS 

See Table 9-9. Of 89 patients, 18 had a bruit (in 14 of 18, the 
bruit was bilateral). Of 32 carotid arteries with bruits, 13 had 
a stenosis of at least 50%. This study used a threshold value 
different from those used by other studies on the sensitivity 
and specificity for a carotid bruit. However, traditionally we 
like to think of the screening test as having the same 
sensitivity and specificity independent of the prevalence. Likelihood 
ratios (LRs) for this study are similar to those among 
asymptomatic cardiac surgery patients. 

Table 9-9 Likelihood Ratios for a Carotid Bruit to Predict a Carotid 
Stenosis of at Least 50% 

Test    Sensitivity   Specificity   LR+ (95% CI)    LR- (95% CI)
Bruit   0.52          0.88          4.2 (2.3-7.2)   0.55 (0.34-0.77)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 2. 

STRENGTHS Prospective study. 

LIMITATIONS Small sample size. The study used a lower 
carotid stenosis threshold (50%) than other studies for 
reporting the association with bruits. The number of arteries 
with a carotid stenosis of greater than 75% in this study was 
small (6.7%; 12 of 178). 

This is a small but sound study. The population studied 
seems typical of male patients with claudication. It is not 
clear whether the patients were consecutive patients or just 
those for whom peripheral vascular surgery was considered. 
Nonetheless, we can derive some information about the 
predictive value in patients with claudication. 

By reporting the data at a lower threshold for defining 
disease (50% as opposed to 75%), there should be 
proportionally more patients with disease as opposed to “normal.” This 
would not necessarily affect the sensitivity and specificity if 
the importance of a bruit is independent of the prevalence of 
disease. In fact, traditionally Bayesian analysis predicts that 
the sensitivity and specificity will not change with the 
prevalence of disease. Despite using a different threshold for 
defining carotid stenosis, the LRs were almost identical to most of 
the studies using a 70% to 75% cut point. Unfortunately, we 
cannot combine these data with studies using a different cut 
point for assessing the predictive value. 

Reviewed by David L. Simel, MD, MHS 


TITLE Prospective Evaluation of Carotid Bruit as a 
Predictor of First Stroke in Type 2 Diabetes: The Fremantle 
Diabetes Study. 

AUTHORS Gillett M, Davis WA, Jackson D, Bruce DG, 
Davis TME. 

CITATION Stroke. 2003;34(9):2145-2151. 

QUESTION Among patients with diabetes who are 
asymptomatic for cerebrovascular ischemia, does the 
presence of a carotid bruit identify those who will have 
stroke? 

DESIGN Prospective, observational study of the natural 
history of diabetes. Patients had a baseline assessment and 
then yearly follow-up (recruitment, 1993-1996; follow-up, 
until 2000) or until they had a qualifying event. The 
mean follow-up was 6.5 ± 2.2 years. 

SETTING Community based in Fremantle, Western 
Australia. 

PATIENTS Patients in a defined region of Australia were 
recruited from the community to participate in the 
Fremantle Diabetes Study. The current study includes 1181 
patients from the registry who had no history of 
cerebrovascular disease at recruitment into the study. 
Fifty-three patients had bruits compared with 1128 patients 
without bruits. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The presence of a carotid bruit was assessed at entry into the 
observational study. The absence of preexisting 
cerebrovascular disease was inferred from the lack of patient symptoms 
or history of an event. At annual follow-up, qualifying events 
were determined from patients' self-reported strokes or 
transient ischemic attack (TIA) symptoms, or a neurologic 
examination. Details of admissions for stroke or death were 
reviewed. It is not clear whether the assessment of a qualifying 
event was made with the knowledge of a baseline bruit. Deaths 
were reviewed without knowledge of carotid bruit status. 


MAIN OUTCOME MEASURE 

TIA or stroke. 


MAIN RESULTS 

See Table 9-10. Eighteen patients with bruits had strokes (18 of 
53; 34%) vs 116 strokes in patients without bruits at entry (116 
of 1128; 10%). Of the 18 patients with bruits and stroke, 
complete clinical data were available for 10 patients and revealed 
that 9 of 10 patients had a stroke ipsilateral to the bruit. 
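
The whole-study likelihood ratios can be checked from the counts just given. The sketch below is illustrative only: it treats the baseline bruit as the "test" and stroke by the end of follow-up as the "disease," ignores the timing of events, and uses a hypothetical function name, `lrs_from_counts`. The cells not stated directly (patients without stroke) are obtained by subtraction from the group totals of 53 and 1128.

```python
def lrs_from_counts(bruit_stroke, bruit_no_stroke, no_bruit_stroke, no_bruit_no_stroke):
    """Return (LR+, LR-) from the four cells of a 2 x 2 table."""
    # Proportion of patients with stroke who had a bruit at baseline.
    sensitivity = bruit_stroke / (bruit_stroke + no_bruit_stroke)
    # Proportion of patients without stroke who had no bruit at baseline.
    specificity = no_bruit_no_stroke / (no_bruit_no_stroke + bruit_no_stroke)
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# 18 of 53 patients with bruits and 116 of 1128 without bruits had strokes.
lr_pos, lr_neg = lrs_from_counts(18, 53 - 18, 116, 1128 - 116)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

The point estimates agree with the entry-to-end-of-study row of Table 9-10 (LR+ 4.0, LR- 0.90); the confidence intervals are not reproduced here.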

Table 9-10 Likelihood Ratios That a Bruit Predicts a Subsequent Stroke 

Test    Outcome                                               LR+ (95% CI)    LR- (95% CI)
Bruit   Stroke in the first 2 y after entry into the study    6.6 (3.6-12)    0.78 (0.64-0.89)
Bruit   Stroke from entry to end of study                     4.0 (2.3-6.8)   0.90 (0.82-0.95)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 


The patients with bruits were older on entry into the study 
compared with those without bruits (mean age, 71 vs 63 
years; P < .001), had a longer history of diabetes (5.0 vs 3.8 
years; P = .009), had a higher blood pressure (mean systolic, 
164 vs 149 mm Hg; P < .001) that more frequently led to 
blood pressure treatment (76% vs 47%; P < .001), and had 
less adiposity (waist circumference, 96 vs 100 cm; P = .004). 
At entry, there was a low frequency of aspirin therapy (26% 
of those with bruits vs 19% of those without bruits). Of 
the 4.9% of patients with atrial fibrillation, only 17% without 
bruits were taking warfarin, whereas none of the patients 
with bruits were taking warfarin (P >.99). During follow-up, 
25 patients underwent carotid endarterectomy; all but 3 had 
qualifying endpoint symptoms. 

On proportional hazards modeling, there was a difference 
in the effect of risk factors for the first 2 years of enrollment 
compared with the duration of the study. From baseline to 
year 2, the important risk factors for a stroke were the 
presence of a carotid bruit (hazard ratio [HR], 6.1; 95% 
confidence interval [CI], 3.1-12), age (HR, 1.5 for each 10-year 
increase), and diastolic blood pressure (HR, 1.4 for each 
1-mm Hg increase). However, after 2 years, the influence of a 
carotid bruit at baseline lost statistical significance. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 4. 

STRENGTHS Community-based study of patients who are 
asymptomatic for cerebrovascular disease but who have a 
risk factor for atherosclerotic disease (diabetes). The 
prevalence of bruits in these asymptomatic patients with diabetes 
(4.5%) is approximately what we would expect in a general, 
community population. 

LIMITATIONS The assessment of previous outcomes at 
baseline or during follow-up (stroke or TIA) relied on patient 
self-report or the follow-up examination. Thus, not all patients 
with events were hospitalized or examined when they had 
their TIA or stroke. The clinicians would have been aware that 
the patients had bruits (or not) when assessing outcomes. 

Using a diagnostic test to establish prognosis can lead to 
errors when the prognosis depends on whether there were 
interventions. In this particular study, there may not have 
been large differences in interventions between the 2 groups 
even though there was no standardized approach to care. 
Some statisticians would take the opportunity to do a 
propensity analysis to sort this out further and determine 
whether a bruit was associated with any treatments. 

Given these caveats, can we use these data? The notion that 
the carotid bruit may lose “importance” over time does make 
sense but needs to be confirmed in other studies and in 
patients with different atherosclerotic risk factors. An 
alternative explanation may be that the stroke risk was higher 
early in the study because the patients were not at currently 
recommended levels of systolic blood pressure control. 
Obviously, these data could apply only to patients with diabetes 
who already have other risk factors for stroke and 
atherosclerotic disease. What they seem to suggest is that carotid 
bruits, at the least, are important “by the company they keep.” 


TITLE Symptomatic Carotid Ischaemic Events: Safest 
and Most Cost Effective Way of Selecting Patients for 
Angiography Before Carotid Endarterectomy. 

AUTHORS Hankey GJ, Warlow CP. 

CITATION BMJ. 1990;300(6738):1485-1491. 

QUESTION Among patients considered for 
endarterectomy after a symptomatic cerebrovascular event, does a 
carotid bruit predict those who will have carotid stenosis? 

DESIGN Consecutive patients under evaluation for a 
carotid endarterectomy who were referred to a neurologist. 

SETTING Single site, Western General Hospital in 
Edinburgh, Scotland. 

PATIENTS Four hundred eighty-five consecutive patients 
were referred for evaluation. Because a decision was made 
not to pursue possible endarterectomy, 189 patients were 
excluded, leaving 296 patients for analysis. Of the 296 
patients, 32% had a bruit, and 70% were men with a mean 
age of 61 years. The excluded patients also had a prevalence 
of 32% bruits, and 60% were men with a mean age of 70 
years. The investigators state that the decision to pursue 
possible surgery was independent of the presence of a bruit. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Each patient was clinically evaluated by the neurologist. The 
reference standard was carotid arteriography. 

MAIN OUTCOME MEASURE 

Carotid stenosis of 75% to 99%. 

MAIN RESULTS 

See Table 9-11. 


Table 9-11 Likelihood Ratios of Bruit for Carotid Stenosis of at Least 75% 

Test                Sensitivity   Specificity   LR+ (95% CI)    LR- (95% CI)
Ipsilateral bruit   0.76          0.76          3.2 (2.4-4.2)   0.31 (0.18-0.50)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Carotid arteriogram was the reference 
standard test. 

LIMITATIONS The study population includes only patients 
for whom surgery was considered an option. It is unclear 
whether the presence of a bruit affected the decision to 
pursue ultrasonography, but the proportion of patients with 
bruits was the same between included and excluded groups. 

This population of patients is most similar to that reported 
from the North American Symptomatic Carotid 
Endarterectomy Trial (NASCET). 1,2 However, the NASCET report on 
bruits included only patients who were randomized to 
endarterectomy instead of medical treatment. The study 
reviewed here includes patients a step before that. Thus, it is 
less selective because it included patients for whom surgery 
was being considered rather than only those for whom 
endarterectomy was planned. The study was affected by 
verification bias. However, the percentage of patients with bruits 
was identical (32%) in the included and excluded groups. If 
the authors are correct that the presence of a bruit did not 
affect the decision to use arteriography, then the effect of 
verification bias is negligible. The data also allow us to calculate 
the predictive value of a bruit for different threshold levels 
for stenosis. 


REFERENCES FOR THE EVIDENCE 

1. Sauve JS, Thorpe KE, Sackett DL, et al. Can bruits distinguish 
high-grade from moderate symptomatic carotid stenosis? Ann Intern Med. 
1994;120(8):633-637. 

2. North American Symptomatic Carotid Endarterectomy Trial (NASCET) 
Steering Committee. North American Symptomatic Carotid 
Endarterectomy Trial. Stroke. 1991;22(6):711-720. 


TITLE The Utility of Selective Screening for Carotid 
Stenosis in Cardiac Surgery Patients. 

AUTHORS Hill AB, Obrand D, Steinmetz OK. 

CITATION J Cardiovasc Surg. 1999;40(6):829-836. 

QUESTION Among patients scheduled for cardiac 
surgery, does a carotid bruit identify those with carotid stenosis? 

DESIGN Prospective, consecutive patients. 

SETTING Single site, McGill University, Montreal. 

PATIENTS Two hundred consecutive patients who were 
scheduled for elective cardiac surgery (196 for coronary 
bypass grafting). Most of the patients were asymptomatic 
for carotid artery disease (n = 186). The data are given so 
that the results for patients with asymptomatic carotid 
bruits (n = 23) can be extracted. 

The distribution of patient characteristics suggests that 
they were typical of those undergoing coronary bypass 
grafting. Fifty percent were older than 65 years, half of all 
patients were smokers, 22% had diabetes mellitus, 31% had 
hyperlipidemia, and 20% had peripheral vascular disease. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The patients were examined before the carotid 
ultrasonography. The ultrasonography was done by vascular technicians 
who had proved their proficiency compared with angiography. 

MAIN OUTCOME MEASURES 

Carotid stenosis of 80% or more by duplex ultrasonography. All 
patients with a positive duplex result also had arteriography. 

MAIN RESULTS 

See Table 9-12. In a logistic model with many clinical variables, 
the neurologic history (odds ratio [OR], 14; 95% confidence 
interval [CI], 2.9-73) and a carotid bruit (OR, 28; 95% CI, 
6.6-123) were the only variables that were important. 


Table 9-12 Likelihood Ratio for a Carotid Bruit to Predict Stenosis of at 
Least 80% 

Test                        Sensitivity   Specificity   LR+ (95% CI)   LR- (95% CI)
Asymptomatic carotid bruit  0.78          0.91          8.6 (4.3-15)   0.24 (0.07-0.60)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 2. 

STRENGTHS Prospective consecutive enrollment of patients, 
primarily those asymptomatic for carotid artery disease. 
Although the study patients were all scheduled for cardiac 
surgery, the population included patients for whom carotid artery 
stenosis might be considered. It is one of the few studies that 
contain specificity data for a population of patients who are 
asymptomatic for cerebrovascular disease. A logistic regression 
was done to determine whether carotid bruits were important 
after controlling for other clinical variables. 

LIMITATIONS Small sample size. 

Despite the small sample size compared with studies of 
symptomatic patients, this is an important study. The 
prevalence of carotid disease (defined as >80%) was 4.8% for 
individuals who were asymptomatic for neurologic symptoms vs 
36% for those with symptoms. The positive predictive value 
for finding an asymptomatic bruit was 30%. 
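
The 30% predictive value is what Bayes' theorem gives when the 4.8% prevalence is combined with the sensitivity (0.78) and specificity (0.91) from Table 9-12. The sketch below illustrates that relationship; the function name `positive_predictive_value` is hypothetical, and the 1% and 10% prevalences in the loop are assumed values included only to show how strongly the predictive value depends on prevalence.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability of disease given a positive finding (Bayes' theorem)."""
    true_positive_rate = prevalence * sensitivity
    false_positive_rate = (1.0 - prevalence) * (1.0 - specificity)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# 0.048 is the prevalence among asymptomatic patients in this study;
# 0.01 and 0.10 are assumed values for comparison only.
for prevalence in (0.01, 0.048, 0.10):
    ppv = positive_predictive_value(prevalence, 0.78, 0.91)
    print(f"prevalence {prevalence:.1%}: PPV about {ppv:.0%}")
```

At the study prevalence the sketch returns about 30%, matching the reported value; at the lower and higher assumed prevalences the same bruit would carry a much lower or higher predictive value.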

The prevalence of carotid stenosis in this study is 
approximately what could be expected for an age-matched 
population of patients with atherosclerotic disease. Although more 
studies with specificity data for the bruit in asymptomatic 
patients are needed, these results may generalize to those 
with atherosclerotic disease. 

Reviewed by David L. Simel, MD, MHS 


TITLE Predictive Power of Duplex Ultrasonography in 
Asymptomatic Carotid Disease. 

AUTHORS Lewis R, Abrahamowicz M, Côté R, Battista 
RN. 

CITATION Ann Intern Med. 1997;127(1):13-20. 

QUESTION What is the prevalence of carotid stenosis in 
a large cohort of asymptomatic patients? 

DESIGN Prospective natural history study and 
randomized trial of aspirin vs placebo, begun in 1988. 1 

SETTING Multicenter. 

PATIENTS General practitioners and specialists referred 
patients from community and teaching hospital settings for 
evaluation of carotid stenosis. Patients were excluded if they 
had cerebrovascular symptoms, valvular heart disease, recent 
myocardial infarction, and a variety of other conditions that 
would have affected outcomes in the randomized trial. Seven 
hundred fourteen patients were enrolled, with the focus of 
this review being only the baseline evaluation. 

The patient population showed a typical prevalence of 
atherosclerotic risk factors: mean age, 65 years; 
hypertension, 47%; heart disease, 39%; hyperlipidemia, 50%; 
diabetes, 20%; and current smokers, 35%. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

A neurologist evaluated all patients to confirm a bruit. 
Ultrasonography was performed, although obviously the 
radiologist was aware of the presence of a bruit. 

MAIN OUTCOME MEASURE 

The predictive value of a carotid bruit for identifying various 
levels of carotid stenosis. 


MAIN RESULTS 

See Table 9-13. 


Table 9-13 Predictive Value of a Carotid Bruit for Identifying Various
Levels of Carotid Stenosis

Stenosis, No.               Positive Predictive Value
(Degree of Stenosis, %)     of a Bruit, % (95% CI)

37 (100)                     5 (4-7)
113 (80-99)                 16 (13-19)
207 (50-79)                 29 (26-32)
113 (16-49)                 16 (13-19)
180 (1-15)                  25 (22-29)
64 (Normal)                  9 (7-11)

Abbreviation: CI, confidence interval.
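As a check on Table 9-13, each positive predictive value appears to be the share of the 714 enrolled patients (all of whom had a bruit) who fell into that stenosis category. The following is a minimal sketch of that arithmetic; the variable names are illustrative, and the confidence intervals are not recomputed because the summary does not state which method was used.

```python
# Counts per stenosis category, as listed in Table 9-13.
counts = {
    "100": 37,
    "80-99": 113,
    "50-79": 207,
    "16-49": 113,
    "1-15": 180,
    "Normal": 64,
}

# Every enrolled patient had a bruit, so the total is the denominator.
total = sum(counts.values())

# Positive predictive value of a bruit for each category, in percent.
ppv_percent = {category: round(100 * n / total) for category, n in counts.items()}

for category, ppv in ppv_percent.items():
    print(f"{category:>7}: {ppv}%")
```

Summing the two highest categories (100% and 80%-99%) gives the 21% quoted later for stenosis >80%.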


CONCLUSIONS 

LEVEL OF EVIDENCE Positive predictive value study. 

STRENGTHS A typical population of patients referred for
ultrasonography. However, what makes this study unique
is that all of the patients were asymptomatic for cerebrovascular
disease. The ultrasonographers validated their
proficiency.

LIMITATIONS The results generalize only to populations 
with the same prevalence of carotid stenosis among patients 
with carotid bruits. No patient who lacked a carotid bruit 
was included, so the sensitivity and specificity cannot be 
determined. 
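The dependence of a positive predictive value on prevalence follows from Bayes' theorem. The sketch below is illustrative only: the sensitivity and specificity (0.56 and 0.91) are borrowed as plausible values for a bruit, since this study could not estimate them, and the prevalences are hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return the probability of disease given a positive finding."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Hypothetical prevalences; the test characteristics are held fixed.
for prevalence in (0.05, 0.20, 0.50):
    ppv = positive_predictive_value(0.56, 0.91, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}")
```

With the test characteristics held constant, the predictive value rises steeply as prevalence rises, which is why the results transfer only to populations with a similar prevalence of stenosis.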

The study population and trial design are similar to those 
of an earlier study. 2 Furthermore, the patients in the 2 studies 
are similar in terms of their risk factors for atherosclerotic 
disease, which is important because the positive predictive 
value of a test depends on the prevalence of disease. The 2 
studies had almost identical positive predictive values for 
carotid stenosis (21% in this study for stenosis >80% vs 23% 
in the earlier study that used a cut point of 75%).

REFERENCES FOR THE EVIDENCE

1. Asymptomatic Cervical Bruit Study Group. Natural history and
effectiveness of aspirin in asymptomatic patients with cervical bruits. Arch
Neurol. 1991;48(7):683-686.

2. Chambers BR, Norris JW. Outcome in patients with asymptomatic neck 
bruits. N Engl J Med. 1986;315(14):860-865. 

Reviewed by David L. Simel, MD, MHS


TITLE Carotid Artery Auscultation—Anachronism or 
Useful Screening Procedure? 

AUTHORS Magyar MT, Nam E, Csiba L, Ritter MA, 
Ringelstein EB, Droste DW. 

CITATION Neurol Res. 2002;24(7):705-708. 

QUESTION Among patients referred for carotid ultrasonographic
studies, does the presence of a bruit predict
carotid stenosis of 70% to 99%?

DESIGN Prospective, consecutive patients referred for 
ultrasonography. 

SETTING Single site. Inpatients and outpatients of a 
neurology department at a university hospital (Germany) 
who were referred for carotid ultrasonography. 

PATIENTS A total of 145 patients, of whom 43% had 
no history of cerebrovascular event (“asymptomatic”). 

The sample reflects a referred population of patients at risk 
for atherosclerotic vascular disease (hypertension, 43%; 
hyperlipidemia, 35%; smokers, 24%; angina, 19%; previous 
myocardial infarction, 18%; claudication, 12%; and diabetes, 
12%), although other patients were referred for lower-risk
conditions (vertigo, dizziness, and psychosomatic symptoms).
A total of 273 carotid arteries were evaluated.


Table 9-14 Likelihood Ratios of Bruit for Carotid Stenosis of at Least
70%ᵃ

Test     Sensitivity   Specificity   LR+ (95% CI)    LR- (95% CI)

Bruit    0.56          0.91          6.0 (3.2-10)    0.48 (0.25-0.74)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

ᵃData are not broken out for symptomatic vs asymptomatic patients.
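The likelihood ratios in a table of this kind follow from sensitivity and specificity by the conventional definitions. A minimal sketch, with a function name chosen here for illustration:

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

# Rounded sensitivity and specificity for a bruit, from Table 9-14.
lr_pos, lr_neg = likelihood_ratios(0.56, 0.91)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

From the rounded inputs this gives an LR+ of about 6.2 and an LR- of about 0.48. The small difference from the tabulated LR+ of 6.0 presumably reflects rounding of the published sensitivity and specificity.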

MAIN RESULTS 

See Table 9-14. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

LIMITATIONS Relatively small sample size, referred population.

STRENGTHS Includes a mixture of patients with and without
cerebrovascular symptoms. Auscultation was done without
knowledge of ultrasonographic results.

The study enrolled consecutive referred patients and
includes a population of patients with and without symptoms.
With patients at various risk levels of cerebrovascular
disease, the results ought to overlap with other populations
of asymptomatic patients and symptomatic patients; the
confidence intervals for the likelihood ratios are similar to
those of most other studies.

Reviewed by David L. Simel, MD, MHS


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

A single physician blinded to the patient’s medical history and 
the ultrasonographic results conducted the carotid auscultation. 
A different physician performed the carotid ultrasonography. 


MAIN OUTCOME MEASURE 

Carotid stenosis of 70% to 99%. 




TITLE Can Simple Clinical Features Be Used to Identify
Patients With Severe Carotid Stenosis on Doppler Ultrasound?

AUTHORS Mead GE, Wardlaw JM, Lewis SC, McDowall
M, Dennis MS.

CITATION J Neurol Neurosurg Psychiatry. 1999;66(1):16-19.

QUESTION Do carotid bruits, or a combination of clinical
findings, predict the presence of significant carotid
stenosis in symptomatic patients?

DESIGN Prospective. 

SETTING British hospital and neurovascular clinic. 

PATIENTS A total of 726 patients with an acute stroke, 
transient ischemic attack, or retinal stroke entered into the 
Lothian Stroke Registry. All patients had ultrasonography, 
independent of whether or not a bruit was detected. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

All patients were examined by a stroke physician or research 
registrar. Carotid Doppler ultrasonography was performed 
by one of 2 neuroradiologists who had excellent agreement
with a subset of patients referred to angiography (κ = 0.7-0.8).
The ultrasonographers were blinded to the clinical data.

MAIN OUTCOME MEASURES 

Stenosis of 70% to 99% by ultrasonography vs a nonsurgical 
stenosis (<70% or complete occlusion). Data were evaluated 
for univariate predictors and in a logistic model to assess for
combinations of findings that might predict surgically
correctable carotid stenosis.


MAIN RESULTS 

See Table 9-15. For the logistic model evaluating the combination
of findings, the presence of an ipsilateral bruit (odds
ratio, 11; 95% confidence interval [CI], 7.0-19) overwhelms
other significant findings, making the likelihood ratio (LR)
for 2 or more findings similar to that for an ipsilateral bruit
alone.

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Prospective design for a large number of
symptomatic patients. All patients underwent ultrasonography
and were included in the analysis, even for lower degrees
of carotid stenosis.


Table 9-15 Likelihood Ratios of Bruit for Carotid Stenosis of
at Least 70%

Test                            Sensitivity   Specificity   LR+ (95% CI)     LR- (95% CI)

Ipsilateral bruit               0.56          0.90          5.5 (4.1-7.2)    0.48 (0.38-0.60)
Peripheral vascular diseaseᵃ    0.28          0.84          1.7 (1.2-2.4)    0.86 (0.74-0.96)
Diabetes                        0.16          0.91          1.7 (1.0-2.8)    0.93 (0.83-1.0)

Combination of Findings (Ipsilateral Bruit, Diabetes,
Previous TIA, Not a Lacunar Event)

≥2 Findings                                                 4.8 (3.5-6.5)
0-1 Finding                                                                  0.57 (0.46-0.69)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio; TIA, transient ischemic attack.

ᵃDefined as absence of both foot pulses or femoral bruits, history of intermittent
claudication, or a history of peripheral vascular surgery.

LIMITATIONS The case definition for a “lacunar” event was 
not described and is important only when the results of the 
logistic model are applied. 

This is a high-quality study that appears to have less verification
bias than the report on bruits from the North American
Symptomatic Carotid Endarterectomy Trial. 1 As
expected, when there is less verification bias, the specificity is
much better and accounts for the higher positive LR. In this
study, which showed a 13% prevalence, a clinician would
have to screen 3 patients to detect 1 with a bruit indicating a
70% to 99% stenosis (number needed to screen, 95% CI, 2-5
patients). In probability terms, finding a carotid bruit
increases the probability of a surgical carotid stenotic lesion
from 13% to 46%. However, what if the patient has no bruit?
The LR may not be good enough for most clinicians in that
the probability of a 70% to 99% lesion only decreases from
13% to 7%. These are the types of data that lead prudent
physicians to infer that it is acceptable to listen to every
symptomatic patient’s carotid arteries for bruits, but the
results ought to be ignored if the patient is a suitable candidate
for surgery. In other words, clinicians should not use the
absence of a bruit to “rule out” carotid stenosis in symptomatic
patients who would otherwise be amenable to
endarterectomy.
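The probability shifts described above can be reproduced by converting the pretest probability to odds, multiplying by the likelihood ratio, and converting back. A minimal sketch, assuming the 13% prevalence and the Table 9-15 likelihood ratios for an ipsilateral bruit (the function name is illustrative):

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)

# Prevalence of 70%-99% stenosis in the study, and the LRs for a bruit.
with_bruit = posttest_probability(0.13, 5.5)
without_bruit = posttest_probability(0.13, 0.48)

print(f"Bruit present: {with_bruit:.0%}")
print(f"Bruit absent:  {without_bruit:.0%}")
```

The rounded inputs give roughly 45% with a bruit and 7% without, in line with the 46% and 7% quoted; the one-point difference presumably reflects rounding of the published values.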

REFERENCE FOR THE EVIDENCE 

1. Sauve JS, Thorpe KE, Sackett DL, et al. Can bruits distinguish
high-grade from moderate symptomatic carotid stenosis? Ann Intern Med.
1994;120(8):633-637.

Reviewed by David L. Simel, MD, MHS


TITLE Can Bruits Distinguish High-Grade From Moderate
Symptomatic Carotid Stenosis?

AUTHORS Sauve JS, Thorpe KE, Sackett DL, et al. 

CITATION Ann Intern Med. 1994;120(8):633-637. 

QUESTION Does the presence of a carotid bruit in a
patient with a recent transient ischemic attack or nondisabling
stroke predict whether the patient will have high-grade
carotid stenosis (70%-99%) vs a less significant stenotic
lesion (30%-69%)?

DESIGN Analysis of data collected prospectively as part
of the North American Symptomatic Carotid Endarterectomy
Trial of carotid endarterectomy. 1

SETTING Multicenter study at hospitals that qualified
by proving their excellence in multidisciplinary care of
patients with cerebrovascular disease.

PATIENTS Patients had to have a qualifying cerebrovascular
event and be appropriate candidates for carotid
endarterectomy. The analytic set for this study includes
1268 patients of 4526 patients screened. Of the 1268
patients, 667 (53%) had a carotid stenosis of 70% to 99%.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The original data were collected prospectively at enrollment. 
The bruits were described as focal or diffuse and ipsilateral or 
contralateral. The examiners could have known the angiogram
results before their evaluation. The reference standard
was applied to all patients included in the final analysis. All
patients had duplex ultrasonography of the carotid arteries
and a carotid arteriogram, performed by neuroradiologists.
The angiograms were reviewed by a data coordinating
center.

MAIN OUTCOME MEASURE 

Significant carotid stenosis (70%-99%) vs nonsurgical 
carotid stenosis (30%-69%). 

MAIN RESULTS 

See Table 9-16. The analysis focuses on focal ipsilateral bruits.
Insufficient data were provided to assess the confidence intervals
around diffuse ipsilateral bruits or contralateral bruits. A variety
of risk factors collected during the history did not distinguish
between high- and low-grade stenosis: hypertension, diabetes,
hyperlipidemia, smoking, claudication, angina pectoris, myocardial
infarction, heart failure, valvular heart disease, and atrial
fibrillation, among others, were not useful.


Table 9-16 Likelihood Ratios of Bruit for Carotid Stenosis of at Least 70%

Test                       Sensitivity   Specificity   LR+ (95% CI)     LR- (95% CI)

Focal, ipsilateral bruit   0.63          0.61          1.6 (1.4-1.8)    0.64 (0.54-0.68)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.


CONCLUSIONS 

LEVEL OF EVIDENCE Level 4 for answering the diagnostic 
accuracy questions. 

STRENGTHS The study had a large sample size from a
well-designed, randomized, controlled clinical trial.

LIMITATIONS Clinicians may have had access to the results 
of the ultrasonography or angiography. Verification bias 
exists in that patients without stenosis were excluded, so 
users of these data must understand the population before 
generalizing the results. 

The parent study from which these data were obtained
was a well-designed clinical trial. However, the trial was
not designed to assess the diagnostic power of carotid
bruits. Nonetheless, it is appropriate to see what we can
learn from such a rich data set. Understanding the study
question is critical to understanding the results. The study
exhibits verification bias for answering the diagnostic
question of whether carotid bruits identify significant
carotid stenosis in symptomatic patients. Verification bias
typically, but not always, leads to overestimates of sensitivity
(ie, a too optimistic negative likelihood ratio [LR-])
and underestimates of specificity (ie, a too pessimistic positive
likelihood ratio [LR+]). The effect can be dramatic,
such that if all patients were included (including those
without any carotid stenosis), the LR+ most certainly
would have been higher for the presence of a carotid bruit.
However, it is certain that the absence of an ipsilateral
bruit in a patient with a recent carotid artery distributed
cerebrovascular event does little to rule out the presence
of a significant carotid stenosis.

This study is also a bit different from other studies in that 
the comparison group did not consist of all patients, but only 
those with moderate stenosis. By excluding patients with 
lesser degrees of stenosis, the presence of a bruit would lose 
some of its discriminatory power and both the LR+ and LR- 
would look worse in comparison. 

A second form of bias may also be in play in this study:
expectation bias. Expectation bias occurs when the examiner
has a preset belief about the presence of a finding. For
example, if the examiner knows that the patient has a
high-grade stenosis, the examiner may expect to hear a bruit (or
vice versa). It is difficult to assess the effect of expectation
bias on the LRs because it could make the values change
in either direction.

REFERENCE FOR THE EVIDENCE

1. North American Symptomatic Carotid Endarterectomy Trial (NASCET)
Steering Committee. North American Symptomatic Carotid Endarterectomy
Trial. Stroke. 1991;22(6):711-720.

Reviewed by David L. Simel, MD, MHS 


TITLE The Prognostic Significance of Asymptomatic 
Carotid Bruits in the Elderly. 

AUTHORS Shorr RI, Johnson KC, Wan JY, et al. 

CITATION J Gen Intern Med. 1998;13(2):86-90.

QUESTION Does the presence of a carotid bruit predict 
subsequent stroke in older patients with hypertension? 

DESIGN Prospective, observational study among 
patients enrolled in the Systolic Hypertension in the 
Elderly Program (SHEP). 1 The mean follow-up was 4.5 
years. 

SETTING Multicenter trial in the United States. 

PATIENTS Patients were aged 60 years or older, with
isolated systolic hypertension, and formed part of a randomized
clinical trial. The patients in this trial had no evidence
of previous cerebrovascular disease, atrial
fibrillation, insulin or warfarin use, coronary disease, or
dementia.

In the SHEP trial, 4736 patients were enrolled, with 294 
excluded from this analysis because they had a previous 
stroke, transient ischemic attack (TIA), or myocardial 
infarction. Thus, the analysis consists of 4442 patients, of 
whom 284 had an asymptomatic carotid bruit and 4158 
had no bruit. Of those patients with bruits, 44% (n = 124) 
had bilateral carotid bruits. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

All patients were evaluated before study entry for carotid 
bruits. An adjudication committee reviewed medical records 
for all persons who developed symptoms suggestive of a 
stroke or TIA. 

MAIN OUTCOME MEASURE 

Stroke or TIA. 

MAIN RESULTS 

See Table 9-17. Patients with bruits were slightly older (mean
age, 73 vs 71 years; P < .001) and less likely to be white (79%
of patients with bruit were white vs 84% without bruit; P =
.03). The patients with bruits were also more likely to smoke
(18% vs 12%; P = .003) and had higher blood pressures
(mean systolic, 173 vs 170 mm Hg; P < .001), higher cholesterol
levels (mean, 244 vs 236 mg/dL; P = .006), and more
frequent electrocardiogram abnormalities (67% vs 60%; P =
.01).


Table 9-17 Predictive Value of a Carotid Bruit for Identifying Various
Levels of Carotid Stenosis

Test    Outcome                    LR+ (95% CI)      LR- (95% CI)

Bruit   Stroke during follow-up    1.5 (0.95-2.2)    0.96 (0.92-1.0)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

A proportional hazards model showed that the risk of a 
stroke or TIA did not change over time. 

Although patients with bruits had a higher stroke rate than 
those without bruits, the difference is not significant. To 
explain part of the effect, patients with bruits were slightly 
more likely to be using aspirin (22% vs 16%), but they also 
were more likely to have been randomized to placebo for 
hypertension treatment (58% vs 50%). Adjusting for the
hypertension treatment assignment vs placebo makes the
carotid bruit effect slightly less, whereas adjusting for other
risk factors extinguishes the effect of bruits even more (relative
risk [RR], 1.3). Even when creating 2 strata of patients
(ie, low risk vs high risk according to the number of risk factors
present), the RR of stroke for those with carotid bruits is
1.38 vs 1.36, respectively. The risk of bruits might be greater
in the subset of patients 60 to 69 years of age (RR, 2.0; 95%
confidence interval [CI], 0.92-4.7), but the increased risk is
less apparent for older patients (RR, 0.98; 95% CI, 0.55-1.8).

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Given the observed stroke rate in patients 
without bruits, the study had a post hoc power of 90% to 
find a difference of 5% strokes in patients without bruits vs 
10% for patients with bruits. This is a high power to rule out 
an important difference, although smaller differences could 
have gone undetected. The prevalence (6.4%) of bruits in 
these asymptomatic hypertensive patients is compatible with 
a general, community population. The study had clear case 
definitions. 

LIMITATIONS It is likely that the committee did have access
to the medical records with information about the presence
of a carotid bruit. However, given the rigorous case definitions,
requirement for hospitalization for all patients with
ischemic events, and adjudication by 3 neurologists, it seems
unlikely that carotid bruits would have had a large effect on
assessing the presence of a TIA or stroke.

Among patients with a single risk factor for stroke (hypertension),
followed as part of a clinical trial, it seems clear that
any importance of a carotid bruit in predicting stroke is
small. As in the Fremantle Diabetes Study, 2 the effect of the
bruit is likely related to its association with other risk factors
for atherosclerosis. Both studies showed that patients with
bruits were more likely to have important risk factors for
atherosclerosis.


REFERENCES FOR THE EVIDENCE 

1. Perry HM, Davis BR, Price TR, et al. Effect of treating isolated systolic 
hypertension on the risk of developing various types and subtypes of 
stroke: the Systolic Hypertension in the Elderly Program (SHEP). JAMA. 
2000;284(4):465-471. 

2. Gillett M, Davis WA, Jackson D, Bruce DG, Davis TME. Prospective
evaluation of carotid bruit as a predictor of first stroke in type 2 diabetes:
the Fremantle Diabetes Study. Stroke. 2003;34(9):2145-2151.


CHAPTER 10


Does This Patient Have
Carpal Tunnel Syndrome?

Christopher A. D’Arcy, MD 
Steven McGee, MD 


CLINICAL SCENARIO 


A 55-year-old woman has difficulty sleeping because of
numbness and tingling in her right hand for 6 months.
On a hand diagram, she uses a pencil to locate precisely
her numbness and tingling over the dorsal and palmar
aspects of all 5 fingers, sparing the palm. On inspection,
the patient has no evidence of thenar atrophy, but
thumb abduction is weak on the affected side. Sensory
examination result using monofilaments and a vibrating
tuning fork is normal. Tinel sign is positive and
Phalen sign is negative. Which of this patient’s symptoms
and signs are useful and which are useless for
accurately predicting the diagnosis of carpal tunnel
syndrome (CTS)?


WHY IS THE DIAGNOSIS IMPORTANT? 


Carpal tunnel syndrome is an important cause of hand pain
and functional impairment, attributable to compression of
the median nerve at the wrist (Figure 10-1). Patients are
usually between 30 and 50 years old, with women affected 3
times as often as men. 2,3 About 0.5% of the general population
reports being diagnosed with CTS. 2 It is likely, however,
that a minority of affected patients consult clinicians
because population-based studies reveal that about 3% of
adults have symptomatic electrodiagnostically confirmed
CTS. 4

In many patients, symptoms are self-limited or resolve
with conservative measures such as splinting the wrist, using
anti-inflammatory medication, and modifying their activities.
Corticosteroid injection into or near the carpal tunnel
results in improvement in 49%-81% of those affected,
although 50%-86% of those experience recurrence. 5-9 In
patients whose condition fails conservative treatment, surgical
division of the transverse carpal ligament promptly
improves or relieves sensory complaints (dysesthesias) 75%
to 99% of the time. 10-18 Permanent complications from surgery
occur in less than 1%, 19 but the subsequent recovery
often requires leave from work, lasting days to several
weeks. 18

Many conditions, including pregnancy, rheumatoid 
arthritis, diabetes mellitus, and previous wrist trauma, are 
associated with CTS, 19 although histologic sections from the 
carpal tunnel of most affected patients are normal. 20,21 Many 
patients have an abnormally high tissue pressure within the 
carpal tunnel, 22 which presumably causes intraneural 
ischemia that leads to dysesthesias and abnormal results of 
sensory testing. 23-25

This article systematically reviews the diagnostic accuracy
of bedside findings for CTS. Presentation of this information,
however, first requires understanding some of the issues
surrounding electrodiagnosis, the current CTS diagnostic
standard.


Figure 10-1 Normal Anatomy of the Carpal Tunnel

The carpal tunnel consists of the median nerve and 9 flexor tendons surrounded
by the rigid carpal bones and transverse carpal ligament (flexor retinaculum). The
distal wrist crease marks the proximal edge of the carpal tunnel. Within the tunnel,
the median nerve divides into a motor branch that innervates the thenar muscles
(opponens, abductor, and short flexor) and distal sensory branches that supply the
thumb, index, and middle fingers and the radial half of the ring finger. Because the
sensory branches to the radial palm do not usually pass through the carpal tunnel,
palm sensation is preserved in a classic case of carpal tunnel syndrome. 1


THE DIAGNOSTIC STANDARD FOR 
CARPAL TUNNEL SYNDROME 

In his original definition of CTS, Phalen 26 required patients to 
have 1 or more of 3 bedside findings: sensory changes 
restricted to the median nerve distribution of the hand (Table 
10-1), a positive Tinel sign, and a positive Phalen sign. Although 
electrodiagnosis was not part of Phalen’s definition, clinicians 
now use electrodiagnosis frequently to confirm the diagnosis, 
and some third-party payers require it before compensating 
claims. 34 Consensus committees from professional societies 
have endorsed electrodiagnosis as the diagnostic test of 
choice. 35,36 Diagnostic standards for nerve conduction studies 
in CTS have been developed, which report sensitivities of 49% 
to 84% and specificities of 95% to 99%. 37 

Table 10-1 Definition of Abnormal Physical Findings

Physical Finding               Definition of Abnormal Finding

Motor Examination

Weak thumb abduction           Weakness of resisted abduction, ie, movement of the
                               thumb at right angles to the palmᵃ

Thenar atrophy                 A concavity of the thenar muscles when observed from
                               the side

Sensory Examination

Hypalgesia                     Diminished ability to perceive painful stimuli applied along
                               the palmar aspect of the index finger when compared with
                               the ipsilateral little fingerᵇ

Diminished 2-point             Diminished ability to identify correctly the number of points
discrimination                 using calipers whose points are set 4-6 mm apart, comparing
                               the index with little fingerᶜ

Abnormal vibratory             Diminished ability to perceive vibratory sensations using a
sensation                      standard vibrating tuning fork (128 or 256 Hz), comparing
                               the distal interphalangeal joint of the index finger to the
                               ipsilateral fifth finger

Abnormal monofilament          Using a Semmes-Weinstein monofilament applied to the
testing                        pulp of the index finger, the patient’s threshold is greater
                               than the 2.83 monofilament

Other Tests

Square wrist sign 27           The anteroposterior dimension of the wrist divided by the
                               mediolateral dimension equals a ratio of greater than 0.70,
                               when measured with calipers at the distal wrist crease

Closed fist sign 28            Paresthesias in the distribution of the median nerve
                               when the patient actively flexes the fingers into a
                               closed fist for 60 s

Flick sign 29                  When asking the patient, “What do you actually do with
                               your hand(s) when the symptoms are at their worst?” the
                               patient demonstrates a flicking movement of the wrist and
                               hand, similar to that used in shaking down a thermometerᵈ

Tinel sign                     Paresthesias in the distribution of the median nerve when
                               the clinician taps on the distal wrist crease over the median
                               nerve

Phalen sign                    Paresthesias in the distribution of the median nerve when
                               the patient flexes both wrists 90° for 60 s

Pressure provocation           Paresthesias in the distribution of the median nerve when
test 30                        the examiner presses with his/her thumb on the palmar
                               aspect of the patient's wrist at the level of the carpal tunnel
                               for 60 s

Tourniquet test 31             Paresthesias in the distribution of the median nerve when a
                               blood pressure cuff around the patient's arm is inflated
                               above systolic pressure for 60 s

ᵃMost clinicians define weakness as muscle power less than that of the companion
muscle in the contralateral hand (which has the disadvantage of assuming that the
opposite hand has normal strength) or that of a standard of normal strength based
on the experience of examining many normal individuals (Figure 10-2).

ᵇMost clinicians use an open safety pin or broken applicator stick, which must be
discarded after use to prevent transmission of infection.

ᶜThe studies in this review separated the points of the calipers 4 mm, 32 5 mm, 33 and
6 mm. 31

ᵈAny other response is a negative result.


The sensitivity and specificity of electrodiagnosis in CTS
require explanation. For the sensitivity calculation, the criterion
standard was bedside findings alone (eg, compatible
symptoms plus a positive Tinel sign), 38-40 which then raises
the question of whether electrodiagnosis or bedside findings
are the more accurate standard. False-negative test results
probably occur because the condition is intermittent 41 or
because the patient’s symptoms emanate from small,
unmyelinated fibers that are invisible to surface electrodes
(electrodiagnosis detects only larger myelinated fibers). 42

The high specificity figures in these studies are also misleading,
being arbitrarily set at 2 SDs above the mean of
observations of normal hands. The values of 95% to 99% are
based on the assumption that nerve conduction recordings
follow a standard gaussian distribution, which has been
shown to be inaccurate. 43,44 False-positive test results are well
documented when these test thresholds are applied to other
populations. 10,45-47

It is well documented that many hand surgeons perform
carpal tunnel release successfully in patients with normal
electrodiagnostic findings. 15,34,48-50 Even in patients with positive
electrodiagnostic findings who undergo surgery, symptoms
usually resolve within days despite nerve conduction
abnormalities that persist for months or longer. 11,17,42,51,52

Nonetheless, most physicians rely on electrodiagnosis as the
best available diagnostic standard. Electrodiagnostic studies
may help identify other conditions that also cause hand
dysesthesias, such as cervical radiculopathy, polyneuropathy, or
other median nerve entrapment syndromes. 41,53-55 Furthermore,
the majority of patients in surgical studies have compatible
symptoms and electrodiagnostic studies are positive for
CTS. 10,12,17,56 Electrodiagnosis may not predict recovery after
carpal tunnel release, but neither does any other clinical variable
with any certainty. The potential use of computed tomography
(CT), magnetic resonance imaging (MRI), and ultrasonography
is still being determined, and they remain primarily
research tools. 57-61 For these reasons, our review addresses the
accuracy of the history and physical examination in diagnosing
CTS, as confirmed by electrodiagnostic studies.


METHODS 

Using the MEDLINE database for articles from January
1966 to February 2000, both authors independently used
the following search strategy, limited to the English language
and human subjects, to retrieve all relevant publications
on the diagnosis of CTS in adults: “exp carpal tunnel
syndrome” and “exp diagnosis.” In addition, text word
searches were completed for “Tinel” or “Tinels” or
“Hoffman-Tinels,” and “Phalen” or “Phalens.” Based on review of
titles and abstracts, relevant publications were retrieved. To
complete the search, the authors reviewed the bibliographies
of these articles and retrieved all relevant articles.

To be included in this review, a study had to satisfy the following
criteria: (1) the patients presented to a clinician for
symptoms suggestive of CTS, (2) the physical examination
maneuvers were clearly described, (3) there was an independent
comparison with 1 or more electrodiagnostic parameters
(which had to include at least some measurement of
motor or sensory nerve conduction), and (4) the authors
could extract from figures or tables in the articles the numbers
needed to construct 2 × 2 tables and calculate sensitivity,
specificity, and likelihood ratios (LRs).

Twelve articles met these criteria and are included. 27-33,62-66
Thirty articles were excluded: 14 because the control group
was asymptomatic, 67-80 8 because the data were incomplete,
15,49,57,81-85 4 because the participants were identified by
population surveys, 45,86-88 3 because the criterion standard was
unacceptable (ie, electromyography alone, 89 electrodiagnosis
and abnormal monofilament testing, 90 or criterion standard
missing 91 ), and 1 because the examination maneuvers were
not clearly defined. 92

Sensitivity, specificity, and LRs and their confidence intervals
(CIs) were calculated using conventional definitions. 93
When a cell of a 2 × 2 table was 0, 0.5 was added to all cells
before summarizing the data for a particular test. Our summary
measures pooled all the data using the DerSimonian
and Laird 94 random-effects model, which considers both
within-study variance and variability among studies. Our
test for homogeneity between studies was the effectiveness
score, a test of overall accuracy. 95
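The calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' actual code: the 0.5 correction is applied per table when any cell is 0, the variance of log(LR+) uses the standard large-sample formula, pooling is done on the log scale, and the three 2 × 2 tables are hypothetical.

```python
import math

def log_lr_positive(tp, fp, fn, tn):
    """Return (log LR+, variance) for one study's 2 x 2 table."""
    # Assumption: add 0.5 to every cell of a table that contains a 0.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    log_lr = math.log(sensitivity / false_positive_rate)
    # Standard large-sample variance of log(LR+).
    variance = 1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn)
    return log_lr, variance

def dersimonian_laird(effects, variances):
    """Pool effects with the DerSimonian and Laird random-effects model."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran Q measures variability among studies.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    df = len(effects) - 1
    # Between-study variance, truncated at 0.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2

# Hypothetical 2 x 2 tables (tp, fp, fn, tn); NOT data from this review.
studies = [(30, 10, 20, 40), (45, 5, 30, 20), (12, 0, 18, 25)]
results = [log_lr_positive(*s) for s in studies]
pooled, se, tau2 = dersimonian_laird([r[0] for r in results],
                                     [r[1] for r in results])
lr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
print(f"Pooled LR+ = {lr:.2f} (95% CI, {ci[0]:.2f}-{ci[1]:.2f})")
```

The same functions apply to LR- by swapping in the negative-finding cells. Pooling on the log scale keeps the confidence interval positive and asymmetric around the summary LR.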

LRs are the odds that a given finding would occur in a 
patient with CTS as opposed to a patient without CTS. When 
a positive LR (LR+) or negative LR (LR-) has a value close to 
1, the result is unhelpful in clinical diagnosis. 

PRECISION AND ACCURACY 

How to Elicit Symptoms and Signs of Carpal 
Tunnel Syndrome 

Table 10-1 summarizes how to elicit the physical examina¬ 
tion signs of CTS analyzed in this review. When examining 
thumb strength, the clinician should focus on abduction of 
the thumb (Figure 10-2), not flexion or opposition, which 
sometimes can be accomplished by muscles innervated by 
nerves other than the recurrent motor branch of the 
median nerve. 54,59 The Katz hand diagram is a self-adminis¬ 
tered diagram that depicts both the dorsal and palmar 
aspect of the patient’s hands and arms (Figure 10-3). 
Patients use this diagram to mark the specific location of 
their symptoms, characterizing them as pain, numbness or 
tingling, or other. Diagrams are then graded as classic, 
probable, possible, or unlikely to be CTS on the basis of cri¬ 
teria that appear in Figure 10-3. 32,63 



Figure 10-2 Testing Thumb Abduction 

The patient is instructed to raise his or her thumb perpendicular to the palm 
as the examiner applies downward pressure on the distal phalanx. This 
maneuver reliably isolates the strength of the abductor pollicis brevis, which 
is innervated only by the median nerve. 



CHAPTER 10 The Rational Clinical Examination 



[a] Classic Pattern 

Symptoms affect at least 2 of digits 1, 2, or 3. The classic 
pattern permits symptoms in the fourth and fifth digits, wrist pain, 
and radiation of pain proximal to the wrist, but it does not allow 
symptoms on the palm or dorsum of the hand. 

[b] Probable Pattern 

Same symptom pattern as classic, except palmar 
symptoms are allowed unless confined solely to the ulnar 
aspect. In the possible pattern, not shown, symptoms involve 
only 1 of digits 1, 2, or 3. 

[c] Unlikely Pattern 

No symptoms are present in digits 1, 2, or 3. 

Key: Numbness; Pain; Tingling; Decreased Sensation 


Figure 10-3 Katz Hand Diagram 

Adapted with permission from Golding et al. 64 


Precision of the History and Physical Examination 
for Carpal Tunnel Syndrome 

Few studies have addressed the precision of findings for 
CTS. In one study, simple agreement was 84% for 2 physi¬ 
cians rating 54 of the Katz hand diagrams. 63 In another 
small study, the interobserver agreement was substantial 
for Tinel sign (κ = 0.77) and Phalen sign (κ = 0.65), 
moderate for vibration (κ = 0.40), and fair for motor strength 
(κ = 0.25). 96 The Tinel test, however, is probably much less 
precise than these data suggest because the proportion of 
healthy, asymptomatic hands with a positive Tinel sign 
ranges from 0% 28 to 45%. 71 Some of this variability with 
Tinel sign may relate to technique; in one study, a greater 
percussion force increased sensitivity at the expense of 
specificity. 89 
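
The difference between simple agreement and the κ statistic quoted above can be made concrete with a short sketch. The function below computes Cohen's κ for 2 observers rating a finding as present or absent. The counts are hypothetical and are not taken from the cited studies; they are chosen so that simple agreement is 84% while chance-corrected agreement is much lower, which happens when nearly all ratings fall in one category.

```python
def cohens_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for 2 observers and a present/absent finding.

    both_pos: both observers rate the finding present
    a_only:   only observer A rates it present
    b_only:   only observer B rates it present
    both_neg: both observers rate it absent
    """
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n          # simple agreement
    a_pos = (both_pos + a_only) / n               # A's rate of "present"
    b_pos = (both_pos + b_only) / n               # B's rate of "present"
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)  # chance agreement
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of 50 patients
balanced = cohens_kappa(both_pos=20, a_only=5, b_only=5, both_neg=20)
skewed = cohens_kappa(both_pos=40, a_only=4, b_only=4, both_neg=2)
print(f"balanced ratings: kappa = {balanced:.2f}")   # simple agreement 80%
print(f"skewed ratings:   kappa = {skewed:.2f}")     # simple agreement 84%
```

This is why a simple agreement figure, such as the 84% reported for the hand diagrams, cannot be compared directly with κ values.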

Diagnostic Accuracy of Physical Findings 

Table 10-2 summarizes the studies addressing the diagnos¬ 
tic accuracy of the history and physical examination for 
CTS. Based on the CIs of LRs, the following findings favor 
the electrodiagnosis of CTS when they are present in 
patients who present with hand dysesthesias: decreased 
sensitivity to pain (hypalgesia) in the median nerve 
territory (LR, 3.1; 95% CI, 2.0-5.1), classic or probable Katz 
hand diagram results (LR, 2.4; 95% CI, 1.6-3.5), and weak 
thumb abduction strength (LR, 1.8; 95% CI, 1.4-2.3). 
Using a slightly different system for grading hand 
diagrams, another study also found that the definite or 
possible hand diagram argued for CTS (LR, 2.1; 95% CI, 
1.5-3.0). 92 In our analysis, 2 findings argued against the 
electrodiagnosis of CTS: a Katz hand diagram classified as 
unlikely (LR, 0.2; 95% CI, 0.0-0.7; not shown in Table 
10-2) and normal thumb abduction strength (LR, 0.5; 95% 
CI, 0.4-0.7). 

The following findings had limited or no value in distin¬ 
guishing patients with CTS from those without it: the 
patient’s age, presence of bilateral or nocturnal symptoms, 
thenar atrophy, other sensory abnormalities (2-point, vibra¬ 
tion, monofilament), Tinel sign, Phalen sign, pressure provo¬ 
cation test, and the tourniquet test. 

Several studies addressed the diagnostic accuracy of 
combined findings, 32,65,90 but no combination consistently 
proved significantly more helpful than the individual 
findings themselves. One study did find that a positive Tinel 
sign with a classic or probable hand diagram was slightly 
more discriminating (LR, 3.6; 95% CI, 1.6-8.1) than either 
finding alone (LR, 1.8 for positive Tinel sign and 2.4 for 
classic or probable hand diagram), 32 although this result 
requires validation, given the problems with Tinel sign in 
other studies. 

According to our analysis, several unconventional findings 
(flick sign, closed fist sign, and square wrist sign) show 
promise in diagnosing CTS. However, these maneuvers are not 
widely used and have been tested in only one or two studies. 
Two letters to editors suggest that the sensitivity of the 
flick sign is much lower (only 25%-36%) than that indicated 
in Table 10-2. 84,85 Therefore, before any of these 3 findings 
can be recommended, further supportive evidence is necessary. 

Table 10-2 Diagnostic Accuracy of History and Physical Examination for Carpal Tunnel Syndrome 

Findings by Reference and Year              No. of Hands  Sensitivity  Specificity  LR+ (95% CI)    LR- (95% CI)

Patient Interview
  Classic or Probable Hand Diagram
    Katz et al, 63 1990                     145           0.64         0.73         2.4 (1.6-3.5)   0.5 (0.3-0.7)
  Age > 40 y
    Katz et al, 32 1990                     110 a         0.80         0.41         1.3 (1.0-1.7)   0.5 (0.3-1.0)
  Nocturnal Paresthesia
    Buch-Jaeger and Foucher, 31 1994        112 a         0.51         0.68         1.6 (1.0-2.6)   0.7 (0.5-1.0)
    Gupta and Benstead, 62 1997             92            0.84         0.33         1.2 (1.0-1.6)   0.5 (0.2-1.1)
    Katz et al, 32 1990                     110           0.77         0.27         1.1 (0.9-1.3)   0.8 (0.4-1.6)
    Pooled results                          ... b         ...          ...          1.2 (1.0-1.4)   0.7 (0.5-0.9)
  Bilateral Symptoms
    Katz et al, 32 1990                     110 a         0.61         0.58         1.4 (1.0-2.1)   0.7 (0.4-1.0)

Motor Examination
  Weak Thumb Abduction
    Gerr et al, 33 1995                     115           0.63         0.62         1.7 (1.1-2.4)   0.6 (0.4-0.9)
    Kuhlman and Hennessey, 30 1997          228           0.66         0.66         2.0 (1.4-2.7)   0.5 (0.4-0.7)
    Pooled results                          ...           ...          ...          1.8 (1.4-2.3)   0.5 (0.4-0.7)
  Thenar Atrophy
    Gerr et al, 33 1995                     115           0.28         0.82         1.6 (0.8-3.2)   0.9 (0.7-1.1)
    Golding et al, 64 1986                  110           0.04         0.99         5.4 (0.2-130)   1.0 (0.9-1.0)
    Katz et al, 32 1990                     110 a         0.14         0.90         1.5 (0.5-4.1)   0.9 (0.8-1.1)
    Pooled results                          ...           ...          ...          1.6 (0.9-2.8)   1.0 (0.9-1.0)

Sensory Examination
  Hypalgesia
    Golding et al, 64 1986                  110           0.15         0.93         2.2 (0.7-6.7)   0.9 (0.8-1.1)
    Kuhlman and Hennessey, 30 1997          228           0.51         0.85         3.4 (2.0-5.8)   0.6 (0.5-0.7)
    Pooled results                          ...           ...          ...          3.1 (2.0-5.1)   0.7 (0.5-1.1)
  2-Point Discrimination
    Buch-Jaeger and Foucher, 31 1994, 6 mm  167           0.06         0.99         4.5 (0.6-37)    1.0 (0.9-1.0)
    Gerr et al, 33 1995, 5 mm               115           0.28         0.64         0.8 (0.5-1.3)   1.1 (0.9-1.5)
    Katz et al, 32 1990, 4 mm               110 a         0.32         0.80         1.6 (0.8-3.1)   0.8 (0.7-1.1)
    Pooled results                          ...           ...          ...          1.3 (0.6-2.7)   1.0 (0.9-1.1)
  Abnormal Vibration
    Buch-Jaeger and Foucher, 31 1994        172           0.20         0.81         1.1 (0.6-2.0)   1.0 (0.8-1.1)
    Gerr et al, 33 1995                     115           0.61         0.71         2.1 (1.3-3.3)   0.5 (0.4-0.8)
    Pooled results                          ...           ...          ...          1.6 (0.8-3.0)   0.8 (0.4-1.3)
  Abnormal Monofilament Findings
    Buch-Jaeger and Foucher, 31 1994        167           0.59         0.59         1.5 (1.1-2.0)   0.7 (0.5-0.9)

Other Tests
  Square Wrist Sign
    Kuhlman and Hennessey, 30 1997          228           0.69         0.73         2.6 (1.8-3.7)   0.4 (0.3-0.6)
    Radecki, 27 1994                        665           0.47         0.83         2.8 (2.1-3.8)   0.6 (0.6-0.7)
    Pooled results                          ...           ...          ...          2.7 (2.2-3.4)   0.5 (0.4-0.8)
  Closed Fist Sign
    De Smet et al, 28 1995                  35            0.61         0.92         7.3 (1.1-49)    0.4 (0.2-0.7)
  Flick Sign
    Pryse-Phillips, 29 1984                 396           0.93         0.96         21 (11-42)      0.1 (0-0.1)
  Tinel Sign
    Gerr et al, 33 1995                     115           0.25         0.67         0.7 (0.4-1.3)   1.1 (0.9-1.4)
    Golding et al, 64 1986                  110           0.26         0.80         1.3 (0.6-2.6)   0.9 (0.7-1.2)
    Heller et al, 65 1986                   80            0.60         0.77         2.7 (1.2-5.9)   0.5 (0.3-0.8)
    Katz et al, 32 1990                     110 a         0.59         0.67         1.8 (1.2-2.7)   0.6 (0.4-0.9)
    Kuhlman and Hennessey, 30 1997          228           0.23         0.87         1.8 (1.0-3.4)   0.9 (0.8-1.0)
    Buch-Jaeger and Foucher, 31 1994        172           0.42         0.64         1.1 (0.8-1.7)   0.9 (0.7-1.2)
    Pooled results                          ...           ...          ...          1.4 (1.0-1.9)   0.8 (0.7-1.0)
  Phalen Sign
    Buch-Jaeger and Foucher, 31 1994        166           0.58         0.54         1.3 (0.9-1.7)   0.8 (0.6-1.1)
    Gerr et al, 33 1995                     115           0.75         0.33         1.1 (0.9-1.4)   0.7 (0.4-1.3)
    Heller et al, 65 1986                   80            0.67         0.59         1.6 (1.0-2.8)   0.6 (0.3-0.9)
    Katz et al, 32 1990                     110 a         0.75         0.47         1.4 (1.1-1.9)   0.5 (0.3-0.9)
    Kuhlman and Hennessey, 30 1997          228           0.51         0.76         2.1 (1.4-3.2)   0.6 (0.5-0.8)
    Golding et al, 64 1986                  110           0.10         0.86         0.7 (0.2-2.2)   1.0 (0.9-1.2)
    Burke et al, 66 1999                    200           0.51         0.54         1.1 (0.7-1.8)   0.9 (0.6-1.3)
    De Smet et al, 28 1995                  66            0.91         0.33         1.4 (0.9-2.0)   0.3 (0.1-0.9)
    Pooled results                          ...           ...          ...          1.3 (1.1-1.6)   0.7 (0.6-0.9)
  Pressure Provocation Test
    Kuhlman and Hennessey, 30 1997          228           0.28         0.74         1.1 (0.7-1.7)   1.0 (0.8-1.1)
    Burke et al, 66 1999                    205           0.52         0.38         0.8 (0.6-1.2)   1.3 (0.7-2.2)
    Buch-Jaeger and Foucher, 31 1994        155           0.49         0.54         1.1 (0.8-1.5)   0.9 (0.7-1.3)
    De Smet et al, 28 1995                  66            0.63         0.33         0.9 (0.6-1.5)   1.1 (0.5-2.7)
    Pooled results                          ...           ...          ...          1.0 (0.8-1.3)   1.0 (0.9-1.1)
  Tourniquet Test
    Buch-Jaeger and Foucher, 31 1994        145           0.52         0.36         0.8 (0.6-1.1)   1.3 (0.9-2.0)
    Golding et al, 64 1986                  110           0.21         0.87         1.6 (0.7-3.9)   0.9 (0.8-1.1)
    Pooled results                          ...           ...          ...          1.0 (0.5-1.9)   1.0 (0.7-1.5)

Abbreviations: CI, confidence interval; LR, likelihood ratio. 

A positive LR (LR+) indicates a positive finding for carpal tunnel syndrome; a negative LR (LR-) indicates either a negative finding or an absent finding. 
a Refers to individual subjects instead of individual hands. 
b Ellipses indicate not applicable. 

There are several reasons why some findings are not as 
helpful diagnostically as traditionally thought. Thenar 
atrophy is probably not useful because it occurs only in 
long-standing or neglected cases of CTS and can also 
result from lower cervical radiculopathies or polyneurop¬ 
athies. Tinel described his sign for following the course of 
regenerating nerve in patients after blunt traumatic nerve 
injury. 30,76,87 The idea that patients with CTS would also 
have a stub of continually regenerating nerve at the distal 
wrist crease seems unlikely, limiting the diagnostic utility 
of this particular test. Our analysis shows that hypalgesia 
in the median nerve distribution is a more useful diag¬ 
nostic finding than are abnormalities of other sensory 
modalities, in part because hypalgesia is a more specific 


finding. It is not clear why this should be, although it may 
indicate that the threshold for abnormal results when 
testing sensation for vibration, 2-point discrimination, 
and monofilaments is set too low (eg, in one study, 20% 
of asymptomatic hands also displayed abnormal monofil¬ 
ament results 76 ). 

In our analysis, only results for the Tinel sign were het¬ 
erogeneous. The heterogeneity is not explained by differ¬ 
ences in the electrodiagnostic parameters used as criterion 
standards in the individual studies, variations in examina¬ 
tion technique (ie, whether the clinician tapped over the 
median nerve using the index finger or a reflex hammer), 
differences in prevalence of CTS in each of the studies 
(mean prevalence was 57%), differences in the age and sex 
composition (mean age was 50 years; 77% were women), 
or by an apparent workup bias. Excluding the 2 studies 
that account for the heterogeneity 62,64 does not change the 
summary measure in any meaningful way, and therefore, 
these studies are included in our analysis. 

THE BOTTOM LINE 

When evaluating patients with hand dysesthesias, the find¬ 
ings most helpful in predicting the electrodiagnosis of CTS 
are hand symptom diagrams, hypalgesia, and weak thumb 
abduction strength testing. The square wrist sign, flick sign, 
and closed fist sign also show promise but require validation 
by other investigators. Many traditional findings, including 
Phalen and Tinel signs, have limited ability to predict the 
electrodiagnosis of CTS. 

The main limitation of the existing literature is the lack of 
an ideal criterion standard, which complicates all clinical 
research in the field of CTS. It is also important to note that 
these data are derived from symptomatic patients presenting to 
a surgeon, physical therapist, or an electrodiagnostic 
laboratory. There are no data addressing the value of physical 
diagnosis in patients presenting to a primary care physician 
with symptoms suggestive of CTS. Our analysis, therefore, is 
most applicable to patients with severe enough symptoms to 
warrant such a referral. 

Returning to the case presented at the beginning of the 
article, the findings of a classic hand diagram and thumb 
abduction weakness support the diagnosis of CTS. The find¬ 
ings of a normal thenar eminence, a positive Tinel sign, and a 
negative Phalen sign do not contribute significant diagnostic 
information. The patient’s clinician believed that she proba¬ 
bly had CTS and chose to manage her symptoms by splinting 
her wrists and recommending anti-inflammatory medica¬ 
tions. If the patient’s symptoms fail to improve, nerve con¬ 
duction testing, additional empiric therapeutic modalities 
(eg, corticosteroid injections), or referral for surgical assess¬ 
ment should be considered. 
UPDATE: Carpal Tunnel 



Prepared by David L. Simel, MD, MHS 
Reviewed by Richard Bedlack, MD 


CLINICAL SCENARIO 


Your 50-year-old secretary complains to you that she can¬ 
not complete your clinic notes on the computer without her 
hands tingling, especially her thumb and second and third 
fingers. Her symptoms are there even when she is not typ¬ 
ing. In fact, she says that she has more problems at home 
because discomfort in her hands awakens her at night. She 
has had diabetes for 6 years. You purchase a variety of office 
supply products that might help her type and then wait to 
see whether her symptoms resolve. 

A week later, she still has problems, although a cushion 
she ordered for her wrists has not arrived. You check for 
Tinel sign (which she has), and when you flex her wrists, it 
reproduces her symptoms. You suggest that she consult 
her primary care physician, and she asks you what to 
expect. You suggest that her physician assess her diabetes 
to see whether she might have a neuropathy, check 
neck radiographs to ensure there is no evidence of cervical 
degenerative changes, and review thyroid function tests, 
nerve conduction tests, and a magnetic resonance image 
(MRI) of the wrists. Have you requested all the necessary 
tests, or did you suggest too many? 

UPDATED SUMMARY ON CARPAL TUNNEL SYNDROME 

Original Review 

D’Arcy C, McGee S. Does this patient have carpal tunnel syn¬ 
drome? JAMA. 2000;283(23):3110-3117. 

UPDATED LITERATURE SEARCH 

Our literature search used the parent search for The Ratio¬ 
nal Clinical Examination series, which combined the sub¬ 
ject heading carpal tunnel syndrome (CTS) with meta¬ 
analysis or receiver operating characteristic curve. The 
results were crossed with the text words “Phalen,” “Tinel,” 
“square wrist,” “thumb abduction,” “hypalgesia,” “closed 
fist,” “flick,” or “hand diagram” appearing in studies pub¬ 
lished in English from 1999 to 2004. The results yielded 141 
titles and abstracts for review. As in the original Rational 


Clinical Examination article, we were interested only in 
studies that assessed clinical findings in a population of 
patients with hand symptoms, that made an independent 
comparison with electrodiagnosis, and from which we 
could extract the data. The abstracts were reviewed to iden¬ 
tify studies that might allow us to assess the sensitivity and 
specificity either of the findings judged helpful in the origi¬ 
nal review (eg, hand symptom diagram, hypalgesia, and 
thumb abduction strength testing) or for less commonly 
used maneuvers that required additional data (eg, square- 
wrist sign, flick sign, and closed-fist sign). We found 12 
original articles for further review. A review of the reference 
lists identified 6 other articles that were obtained. For origi¬ 
nal articles, we retained those that studied at least 100 
hands. 

We excluded articles that used normal persons without 
symptoms as a control population or that were retrospec¬ 
tive studies, which is necessary because the usefulness of 
tests can be overstated when a population of patients for 
whom CTS would not be considered is included. 1 Including 
“normal” control patients tends to overstate the specificity 
and makes it appear that a finding helps identify those with 
the disorder. For example, a Phalen sign has a positive likeli¬ 
hood ratio (LR+) of 2.9 when normal, asymptomatic 
patients are included. However, when only symptomatic 
patients for whom CTS would be considered are studied, 
the finding appeared useless in the same study, with an LR+ 
of 0.91. 1 
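
The effect of the control group on the LR+ can be illustrated with a small calculation. The counts below are entirely hypothetical (they are not the data from the cited study) and were chosen only to show how the same sensitivity produces very different LR+ values depending on who serves as the comparison group.

```python
def lr_positive(tp, fn, fp, tn):
    """LR+ = sensitivity / (1 - specificity) from 2 x 2 cell counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity / (1 - specificity)


# Hypothetical: 100 patients with CTS, of whom 50 have a positive Phalen sign
tp, fn = 50, 50

# Comparison group 1 (hypothetical): 100 asymptomatic volunteers,
# few of whom have a positive sign, so specificity looks high
lr_asymptomatic = lr_positive(tp, fn, fp=17, tn=83)

# Comparison group 2 (hypothetical): 100 symptomatic patients without CTS,
# many of whom also have a positive sign, so specificity is low
lr_symptomatic = lr_positive(tp, fn, fp=55, tn=45)

print(f"LR+ vs asymptomatic controls: {lr_asymptomatic:.1f}")
print(f"LR+ vs symptomatic patients:  {lr_symptomatic:.2f}")
```

Only the second comparison answers the clinical question, because the diagnosis is considered only in patients who have symptoms.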

No systematic review of the clinical examination findings 
used the inclusion criteria we required. A systematic review 
of surgery for CTS evaluated the role of electrodiagnostic 
testing as a suitable reference standard for predicting a suc¬ 
cessful outcome. 


NEW FINDINGS 

• People flick their hands when they have hand symptoms, 
whether or not they have CTS. 2 

• Clinical maneuvers designed to induce or exacerbate the 
patient’s symptoms cause them discomfort, but do nothing 
to alter the likelihood for or against CTS. 3-5 

• Additional evidence confirms the uselessness of Tinel or 
Phalen signs. 2,6 
• Combining symptoms 2 and signs 7 does not appear to 
improve accuracy. 

• Clinicians should focus further diagnostic efforts on 
patients with symptoms in the median nerve distribution. 
These symptomatic patients are the only patients who will 
meet the reference standard criteria of combined hand dia¬ 
gram results and electrodiagnosis. 

IMPROVEMENTS IN THE DATA PRESENTED IN THE 
ORIGINAL PUBLICATION 

Additional data confirm the lack of utility for Tinel or Phalen 
signs and provocation tests. New summary estimates are pro¬ 
vided for these findings. No studies were found that were 
missed in the original publication. 

New data help us come up with prior probability estimates 
for CTS. When screened by a questionnaire, about 10% of 
patients in the community claim numbness or tingling in the 
radial fingers (median nerve distribution) in at least 1 of their 
hands. 8,9 About 70% of patients with numbness or tingling in a 
median nerve distribution will complete hand diagrams that 
suggest “classic or probable” CTS. 2 Thus, among all adults, the 
prior probability of hand symptoms compatible with CTS is 
7% (ie, 0.10 x 0.70). Because the diagnosis of CTS is consid¬ 
ered only when the patient has hand symptoms, we can use the 
value of 7% as a starting point for our prior probability of 
CTS. This makes sense because the classic/probable distribu¬ 
tion on the hand diagram is part of our pragmatic reference 
standard for CTS. These estimates from a population sample 
are supported by a large clinical sample of patients referred for 
electrodiagnosis; among 8223 electrodiagnostic studies in 
patients evaluated for CTS, 7 the distribution of positive elec¬ 
trodiagnostic studies is the following: 

• First, second, and third finger symptoms: 26% positive 

• All fingers (1-5): 17% 
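
The prior probability arithmetic above, and the way an LR updates a prior, can be sketched as follows. The function name is hypothetical. Pairing the 26% referral-population figure with LRs from the original review (hypalgesia, 3.1) and the updated summary (Tinel sign, 1.5) is an illustrative assumption rather than a calculation reported in the source.

```python
def posttest_probability(pretest, lr):
    """Convert a pretest probability to a posttest probability with an LR."""
    pretest_odds = pretest / (1 - pretest)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


# Community prior: 10% report median-distribution symptoms, and about 70%
# of those complete a "classic or probable" hand diagram
community_prior = 0.10 * 0.70

# Referral-population prior: 26% of patients with symptoms in the first,
# second, and third fingers had positive electrodiagnostic studies
referral_prior = 0.26

after_hypalgesia = posttest_probability(referral_prior, 3.1)
after_tinel = posttest_probability(referral_prior, 1.5)
unchanged = posttest_probability(referral_prior, 1.0)

print(f"community prior:           {community_prior:.0%}")
print(f"referral prior:            {referral_prior:.0%}")
print(f"after hypalgesia (LR 3.1): {after_hypalgesia:.0%}")
print(f"after Tinel sign (LR 1.5): {after_tinel:.0%}")
print(f"after a finding with LR 1: {unchanged:.0%}")
```

A finding with an LR of 1 returns the prior unchanged, which is the formal sense in which such a finding contributes no diagnostic information.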

CHANGES IN THE REFERENCE STANDARD 

The original publication in The Rational Clinical Examination 
series focused on patients with CTS symptoms who had 
their disease status confirmed by electrical studies. A letter to 
the editor highlighted the dilemma in making this diagnosis, 
with the author’s suggestion that we should have titled the 
article “Does This Patient Have Abnormal Median 
Conduction?” 10 Some researchers have advocated MRI to identify 
affected patients. A systematic review of MRI revealed that 
much-higher-quality evidence must be generated before 
MRI can be accepted as a screening test, but it seems unlikely 
that it will ever suffice as a reference standard. 11 The use of 
electrodiagnosis for CTS is not perfect. The explanations for 
the fallibility of electrodiagnosis as “the” reference standard 
are as follows: some patients have clinically significant nerve 
compression with normal electrodiagnostic study results, the 
use of population means and standard deviations to define 
normality ensures that 2.5% of the population will be 
classified as having CTS (ie, the area beyond 2 SDs of 1 tail in 
the normal distribution curve for median nerve conduction 
velocity), and studies use various cut points for normality on 
median nerve testing. 12 

Table 10-3 Carpal Tunnel Syndrome (CTS) Using the Paired Hand 
Diagram and Electrodiagnostic Results as the Reference Standard 

Symptom             Electrodiagnosis    Ordinal Rank in Terms of Likelihood of CTS

Classic/probable    Abnormal            1 (Most likely)
Possible            Abnormal            2
Classic/probable    Negative            3
Possible            Negative            4
Unlikely            Abnormal            5
Unlikely            Negative            6 (Least likely)


A group of experts in carpal tunnel epidemiology, clini¬ 
cal care, and outcome assessment used a nominal group 
process method to develop case definitions suitable for 
epidemiologic research. 13 Although the authors state that 
their criteria were not meant for actual clinical practice, 
we used these criteria in the original review in The Ratio¬ 
nal Clinical Examination article, and they reflect the com¬ 
bination of symptoms and electrodiagnosis that most 
clinicians use to establish the diagnosis (Table 10-3). The 
symptoms refer to the Katz hand diagram as shown in 
Figure 10-3 of the original Rational Clinical Examination 
article. 

A systematic review by a panel of neurology experts 
identified 497 articles published from 1990 to 2000 on CTS 
diagnosis. 14 According to formal criteria that included 
(among others) prospective study design and that all 
patients must have had a clinical diagnosis of CTS per¬ 
formed independently of electrodiagnosis, they retained 25 
articles for review. Their meta-analysis found a pooled 
sensitivity of 0.85 and a specificity of 0.98 for sensory or 
mixed median nerve conduction to confirm the clinical 
diagnosis. At face value, this seems reassuring. However, 
the group noted the problems with selection bias and 
observer bias in extant studies of CTS and electrodiagno¬ 
sis. They proposed clinical diagnostic criteria for future 
CTS research that give important insight into the symp¬ 
toms that primary care providers should evaluate. As in the 
Rempel et al 13 report, no particular physical examination 
findings are required to establish the clinical diagnosis 14 
(Table 10-4). The combination of clinical diagnosis and 
electrodiagnosis serves as both a suitable epidemiologic 
standard and pragmatic clinical reference standard for pri¬ 
mary care clinicians. However, it is clear that some patients 
with classic symptoms but normal electrodiagnosis can 
improve with treatment of CTS. 

Jordan et al 15 performed a systematic review of surgical 
therapy for CTS, specifically for assessing whether the 
results of electrodiagnostic testing predicted treatment 
response. They found the studies to be of generally poor 
quality and to show no differences in surgical outcomes 
for patients with symptoms and positive electrodiagnosis 
vs symptoms and normal electrodiagnostic study 
results. Of the 4 studies they included with relative risk 
data, the confidence interval included 1 for the relative risk, 
favoring good outcomes for those with a positive 
electrodiagnosis vs those with a normal electrodiagnosis. Three 
recent Cochrane reviews of CTS treatment found a few 
studies that did not require electrodiagnosis, but none 
included an analysis of whether patients diagnosed with 
symptoms alone had a different response compared with 
those with symptoms plus abnormal electrodiagnostic test 
results. 16-18 

RESULTS OF LITERATURE REVIEW 

See Table 10-5. 

EVIDENCE FROM GUIDELINES 

No US or Canadian guidelines exist for routine screening for 
CTS. 


CLINICAL SCENARIO—RESOLUTION 


The diagnosis of CTS seems reasonably certain, given that 
your secretary has the appropriate symptoms in the 
appropriate distribution (median nerve). You did not 
need to do the Tinel sign or make her fingers tingle with a 
provocation test. The suggestion that she be evaluated for 
diabetic neuropathy is important. Neck radiographs do 
not seem indicated unless there are some other symptoms 
to suggest a cervical problem. A systematic review of rou¬ 
tine testing for diabetes, thyroid disease, or rheumatoid 
arthritis in patients with CTS showed that this practice 
infrequently picks up new diagnoses and is not neces¬ 
sary. 20 An electrodiagnostic test result, if positive, would 
mean that she meets the research criteria for CTS. MRI 
does not have an established role in diagnostic assessment 
for CTS. 

The remaining question is, should you have suggested 
a nerve conduction study? A nerve conduction study 
might be indicated as part of an assessment for a sys¬ 
temic neuropathy. Her carpal tunnel symptoms, together 
with a positive electrodiagnostic test, would fulfill the 
accepted reference standard for research studies. How¬ 
ever, some patients with positive symptoms have normal 
nerve conduction study results. It might be appropriate 
to wait and see whether she responds to simple ergo¬ 
nomic measures, wrist splinting, and, perhaps, steroid 
injection for short-term relief before considering the 
nerve conduction test. 


Table 10-4 Carpal Tunnel Syndrome (CTS) Diagnosis Using the 
Paired-Hand Diagram, Additional Symptoms, and Electrodiagnostic 
Results as the Reference Standard 

Inclusion Criteria for CTS for Research Studies on Electrodiagnosis 

1. Symptom distribution as noted above (but the fourth finger is also allowed) 

2. Symptoms must be present for 1 month, and there must be periods 
when the symptoms are intermittent 

3. Symptoms must be aggravated by sleep, sustained hand or arm posi¬ 
tioning, or repetitive motion of the hand 

4. Symptoms must be relieved by change in hand position, shaking the 
hand, or use of a wrist splint 

5. When pain is present, the pain in the wrist, hand, or finger must be 
worse than any pain in the elbow, shoulder, or neck 

Exclusion Criteria for CTS for Research Studies on Electrodiagnosis 

1. Symptoms primarily in the fifth finger 

2. Neck or shoulder pain preceding digital paresthesias 

3. Numbness or paresthesias in the feet that preceded hand symptoms 

4. Another disorder that explains symptoms that is more likely than CTS 


Table 10-5 Likelihood Ratios for a Variety of Signs and Combinations of 
Findings for Carpal Tunnel Syndrome 

Finding (n = No. of Combined Studies)                    Sensitivity  Specificity  LR+ (95% CI)    LR- (95% CI)

Tinel (n = 8) a                                                                    1.5 (1.2-2.1)   0.82 (0.72-0.93)
Phalen (n = 10) a                                                                  1.3 (1.2-1.5)   0.74 (0.62-0.87)
Provocation tests (n = 8) a                                                        1.1 (0.96-1.3)  0.89 (0.79-1.0)
Multivariate model with 11 clinical variables (n = 1) b  0.79         0.54         1.7 (1.6-1.8)   0.39 (0.35-0.43)
Flick or Tinel (n = 1) 2                                 0.46         0.68         1.5 (0.94-2.4)  0.79 (0.60-1.0)
Phalen or Tinel (n = 1) 2                                0.41         0.72         1.5 (0.89-2.5)  0.81 (0.63-1.0)
Flick (n = 1) 2                                          0.37         0.74         1.4 (0.80-2.4)  0.85 (0.68-1.1)
Flick or Phalen (n = 1) 2                                0.49         0.62         1.3 (0.86-2.0)  0.82 (0.61-1.1)
Abnormal monofilament in digits 1, 2, or 3 (n = 1) c     0.98         0.15         1.2 (1.0-1.3)   0.11 (0.02-0.64)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 

a Updated summary adds data from Hansen et al 2 and O'Gradaigh and Merry 6 to data 
from the original Rational Clinical Examination article. 
b A multivariate model 7 using 4 symptoms (nocturnal symptoms, morning symptoms, 
worsens on driving, and relieved by “waking and shaking”), symptom distribution, side of 
worst symptoms, handedness, duration of symptoms, response to splinting, and patient 
age was studied with a large “training” set and “test” set. The model had an accuracy of 
only 66% (area under the receiver operating characteristic curve). 
c The sensitivity from this study 19 requires confirmation in additional studies. 


CHAPTER 10 Update 


CARPAL TUNNEL SYNDROME— MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

Among all adults, the prior probability of hand symptoms 
compatible with CTS is 7%. See Table 10-6 for the likelihood 
ratios for Tinel and Phalen signs. 


Table 10-6 Likelihood Ratios for Tinel and Phalen Signs 

Finding                                                              LR

The presence of Tinel or Phalen signs in a patient with symptoms     ~1
The absence of Tinel or Phalen signs in a patient with symptoms      ~1

Abbreviation: LR, likelihood ratio. 

POPULATION FOR WHOM CARPAL TUNNEL 
SYNDROME SHOULD BE CONSIDERED 

• Patients with tingling or numbness in the hands or 
arms—always assess for median nerve involvement. 

• Special populations include those with occupational 
exposure of repetitive motion or pregnancy in the third 
trimester. 

• The rates of CTS might be slightly higher in those with 
diabetes mellitus, rheumatoid arthritis, or hypothyroid¬ 
ism. However, the data are not convincing, and routine 
screening for these diseases will infrequently lead to new 
diagnoses. 


DETECTING THE LIKELIHOOD OF CARPAL 
TUNNEL SYNDROME 

The examination should focus on the distribution of symp¬ 
toms in a hand diagram, rather than provocative maneuvers 
to elicit symptoms. 

REFERENCE STANDARD TESTS 

The distribution of hand symptoms (from a hand diagram) 
plus abnormal nerve conduction studies is the reference 
standard for epidemiologic studies. 

For clinical care, patients can have CTS despite a normal 
nerve conduction result. Data are inconclusive about 
whether treatment outcomes differ according to the nerve 
conduction results. 
EVIDENCE TO SUPPORT THE UPDATE 


Carpal Tunnel 



TITLE The Value of the History in the Diagnosis of 
Carpal Tunnel Syndrome. 

AUTHOR Bland JDP. 

CITATION J Hand Surgery [Br]. 2000;25(5):445-450. 

QUESTION Do any patient symptoms predict abnor¬ 
mality on electrodiagnostic studies? 

DESIGN Data collected prospectively during an 8-year 
period. 

SETTING Single center in the United Kingdom that per¬ 
forms all the electrodiagnostic studies for the local area. 

PATIENTS Referred (n = 8223) for electrodiagnosis 
among a broad array of patients being considered for car¬ 
pal tunnel surgery or for diagnostic evaluation. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

A questionnaire was given to all patients before the electrodi¬ 
agnostic study. A single examiner performed all studies. It is 
likely that the examiner reviewed the questionnaire before 
the nerve conduction studies. See Table 10-7. 


Table 10-7 Likelihood Ratios of the Tinel, Flick, and Phalen Signs for
Carpal Tunnel Syndrome

Test              Sensitivity  Specificity  LR+ (95% CI)    LR- (95% CI)
Tinel             0.27         0.91         3.2 (1.2-8.6)   0.79 (0.68-0.92)
Flick             0.37         0.74         1.4 (0.80-2.4)  0.85 (0.68-1.1)
Phalen            0.34         0.74         1.3 (0.74-2.3)  0.89 (0.71-1.1)
Phalen or Tinel   0.41         0.72         1.5 (0.89-2.5)  0.81 (0.63-1.0)
Flick or Tinel    0.46         0.68         1.5 (0.94-2.4)  0.79 (0.60-1.0)
Flick or Phalen   0.49         0.62         1.3 (0.86-2.0)  0.82 (0.61-1.1)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.
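
The likelihood ratios in Table 10-7 follow from sensitivity and specificity by the standard definitions: LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The Python sketch below is illustrative only. The function name is hypothetical, and because it starts from the rounded values printed in the table, its output can differ slightly from the published ratios, which were presumably computed from the raw counts.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a dichotomous test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    if not 0.0 < specificity < 1.0:
        # specificity of 0 or 1 would make one of the ratios undefined
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded sensitivity and specificity of the flick sign, as printed in Table 10-7.
flick_lr_positive, flick_lr_negative = likelihood_ratios(0.37, 0.74)
print(f"Flick: LR+ = {flick_lr_positive:.1f}, LR- = {flick_lr_negative:.2f}")
```

For the Tinel sign, the same calculation from the rounded inputs (0.27 and 0.91) gives an LR+ of about 3.0 rather than the tabulated 3.2, which illustrates the rounding caveat.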


MAIN OUTCOME MEASURE 

A multivariate model using electrodiagnosis as the reference 
standard. 

MAIN RESULTS 

The data were split into a training set (n = 5000) and a test set
(n = 3223). A logistic model for patient symptoms was created
using the data for 5000 patients. The model contained 4
symptoms (nocturnal symptoms, morning symptoms, worse
on driving, and relieved by “waking and shaking”), symptom
distribution, side of worst symptoms, handedness, duration
of symptoms, response to splinting, and patient age as
continuous variables.

The only variables with an odds ratio (OR) greater than 2
were the presence of symptoms in the thumb and the second
and third fingers (OR, 2.5; 95% confidence interval [CI],
2.1-3.0) or symptoms in the third and fourth fingers (OR, 2.4;
95% CI, 1.9-3.1). The only variable that had an OR less than
0.5 was the presence of symptoms in the fourth and fifth
fingers (OR, 0.42; 95% CI, 0.29-0.62). As a continuous variable,
age also had an important impact on the probability of carpal
tunnel syndrome (CTS). For example, with a typical symptom
pattern, without regard to any other symptom, a right-handed
patient with right-handed symptoms has a predicted
probability of 29% at age 30 years vs 66% at age 50 years.

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Very large patient population that captured all 
patients referred for electrodiagnostic studies. It is likely that 
these patients reflect the array of patients who are referred in 
other community studies for the evaluation of CTS. 

LIMITATIONS The examiner would have known the results 
of the questionnaire (although the examiner would not have 
known the variables that would ultimately go in the logistic 
model). 

The results of the logistic model would be difficult to apply 
in general practice. However, understanding the role of the
distribution of symptoms in the digits is important and is integral
to the current accepted reference standard of hand diagrams 
plus electrodiagnosis. Unfortunately, despite including 11 
seemingly relevant clinical variables, the multivariate logistic 
model had a sensitivity of only 79% and a specificity of 54%. 
The area under the receiver operating characteristic (ROC) 
curve was only 0.66 (standard error of 0.01), reflecting an 
accuracy that seems too low for clinical use. 

Reviewed by David L. Simel, MD, MHS 


TITLE Clinical Utility of the Flick Maneuver in Diagnosing
Carpal Tunnel Syndrome.

AUTHORS Hansen PA, Mickelsen P, Robinson LR.

CITATION Am J Phys Med Rehabil. 2004;83(5):363-367.

QUESTION Is the flick sign better than the Phalen or 
Tinel sign in identifying patients with hand symptoms 
who will have abnormal electrodiagnostic tests? 

DESIGN Prospective, consecutive enrollment. 

SETTING Electrodiagnostic clinic. 

PATIENTS All patients (n = 142) had upper limb symptoms
and were referred by their physicians for electrodiagnostic
testing to establish the diagnosis. For all patients,
carpal tunnel syndrome (CTS) was part of the differential
diagnosis. When patients had bilateral symptoms, only the
more severely affected hand was evaluated for the study.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD

Standard assessment of the Phalen and Tinel signs. The flick sign
was obtained by asking the patients how they relieved the discomfort
in their hands and wrists when they were experiencing
severe symptoms. Patients who demonstrated that they flick their
hands (like shaking down a mercury thermometer) were considered
“positive.” The criterion standard was standard electrodiagnostic
testing, performed after the clinical evaluation. It is not
clear whether the same examiner did the clinical examination
and the electrodiagnostic testing. However, the electrodiagnostic
testing was based on the quantitative output nerve latency.

MAIN OUTCOME MEASURE 

Electrodiagnosis of CTS. 

MAIN RESULTS 

One hundred forty-two patients were studied, of whom 95 
had electrodiagnostic testing of CTS. 


CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Prospective, consecutive enrollment among a 
group of referred patients for whom CTS was part of their 
differential diagnosis. The examination was done before the 
electrodiagnostic test. 

LIMITATIONS Electrodiagnosis may not have been blinded 
to the clinical findings, but the reporting of nerve conduction 
studies based on quantitative time rather than subjective 
time may make this less of a problem. 

The authors sum up the results best: “people ... [with hand 
symptoms] flick their hands” whether or not they have CTS. 
These data confirm the uselessness of the Phalen sign. Unfortunately,
the combination of the flick or Tinel sign does not
improve the diagnostic efficiency. The positive likelihood ratio 
for the Tinel was the highest, but the confidence interval is broad 
(see Table 10-7). 

Reviewed by David L. Simel, MD, MHS 


TITLE The Lumbrical Provocation Test in Subjects With 
Median Inclusive Paresthesia. 

AUTHORS Kaul AI, Carney ML, Kaul MP. 

CITATION Arch Phys Med Rehabil. 2001;82(7):935-937. 

TITLE Carpal Compression Test and Pressure Provocative 
Test in Veterans With Median-distribution Paresthesias. 

AUTHORS Kaul MP, Pagel KJ, Wheatley MJ, Dryden JD. 

CITATION Muscle Nerve. 2001;24(1):107-111. 

TITLE Lack of Predictive Power of the “Tethered” 
Median Stress Test in Suspected Carpal Tunnel Syndrome. 

AUTHORS Kaul MP, Pagel KJ, Dryden JD. 

CITATION Arch Phys Med Rehabil. 2000;81(7):348-350. 

QUESTION Does a physical examination maneuver 
meant to provoke symptoms predict patients who will 
have abnormal electrodiagnostic testing? Each study in 
this summary reports a different maneuver. 

DESIGN Prospective, consecutive. 

SETTING Electrodiagnostic laboratory of a Veterans 
Affairs medical center, Portland, Oregon. 

PATIENTS In each study, patients had median nerve
symptoms, no previous surgery for carpal tunnel syndrome,
and no proximal neuropathy on the affected side.


Table 10-8 Likelihood Ratios of Provocation Tests for Carpal Tunnel Syndrome

                                       Abnormal
                                       Electrodiagnostic
Test (n)                               Study Result       Sensitivity  Specificity  LR+ (95% CI)    LR- (95% CI)
Pressure provocation (134)             77                 0.55         0.68         1.7 (1.1-2.7)   0.66 (0.49-0.90)
Carpal compression (135)               80                 0.52         0.56         1.4 (0.94-2.1)  0.77 (0.56-1.0)
Lumbrical (fist) provocation (96)      51                 0.37         0.71         1.3 (0.73-2.3)  0.88 (0.66-1.2)
“Tethered” median nerve stretch (112)  58                 0.50         0.59         1.2 (0.8-1.9)   0.85 (0.59-1.2)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.
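
To see why likelihood ratios close to 1 carry little diagnostic weight, the ratios in Table 10-8 can be applied to the study prevalence by converting probability to odds, multiplying by the likelihood ratio, and converting back. The Python sketch below is an illustration under stated assumptions, not part of the original analysis: the function name is hypothetical, and the pretest probability is taken to be the proportion of abnormal electrodiagnostic results in the pressure provocation study (77 of 134).

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability using the odds form of Bayes' theorem."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Assumed pretest probability: prevalence in the pressure provocation study (Table 10-8).
prevalence = 77 / 134
after_positive = post_test_probability(prevalence, 1.7)   # LR+ from Table 10-8
after_negative = post_test_probability(prevalence, 0.66)  # LR- from Table 10-8

print(f"Pretest probability:        {prevalence:.0%}")
print(f"After a positive maneuver:  {after_positive:.0%}")
print(f"After a negative maneuver:  {after_negative:.0%}")
```

Under these assumptions the probability of an abnormal electrodiagnostic study moves from about 57% to roughly 70% after a positive result and to roughly 47% after a negative one.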


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Positive test results induce or exacerbate the median nerve 
symptoms. 

The lumbrical provocation test is performed by having the 
patient hold a fist for 1 minute. (The lumbricales are the 4 
small muscles of the palm of the hand that flex the proximal 
phalanx and extend the 2 distal phalanges of each finger.) 

The “tethered” median nerve test creates a stretch of the 
median nerve by the examiner’s passively hyperextending the 
wrist and distal interphalangeal joint of the index finger. 

The carpal compression test is performed by applying
moderate pressure with both thumbs over the transverse carpal
ligament.

The pressure provocation test uses a 2.5-cm-wide pressure 
cuff applied to the patient’s wrist. The cuff is inflated to 50 
mm Hg, and then direct pressure is applied to bring the 
sphygmomanometer reading to 150 mm Hg. 

The electrodiagnostic studies were performed immediately 
after the provocation tests. When the provocation test result 
was positive, the patient was allowed to have the symptoms 
return to baseline before the electrodiagnostic studies. 


MAIN OUTCOME MEASURE

Electrodiagnostic studies.

MAIN RESULTS

See Table 10-8.

CONCLUSIONS

LEVEL OF EVIDENCE Level 2.

STRENGTHS All patients had median nerve symptoms. The
provocation tests were applied before the electrodiagnostic
tests. An additional strength is that patients with neck pain
were also included, as long as they also had median nerve
symptoms.

LIMITATIONS The electrodiagnostic testing was performed
blinded to the “tethered” median nerve test. It is not clear
whether the electrodiagnostic tests were performed independently
in the other 2 studies. However, the protocol for the
electrodiagnostic procedure is described well and the results
were based on a quantitative assessment. The results apply
only to patients with median nerve symptoms.

Even with the possibility that the provocation test affected
the electrodiagnostic studies, this maneuver did not work to
identify the patients with median nerve symptoms who
would have an abnormal electrodiagnosis. As in all clinical
diagnosis studies, it is important to recognize that the clinicians
included only patients with median nerve syndromes,
something that can be evaluated at the bedside and is part of
the recommended hand diagram. The provocation tests seem
relatively useless as both the summary positive and negative
likelihood ratios approach 1. Clinicians should stop trying to
reproduce a patient’s median nerve symptoms because the
response should not affect clinical decisions.

Reviewed by David L. Simel, MD, MHS


TITLE A Diagnostic Algorithm for Carpal Tunnel Syndrome
Based on Bayes’ Theorem.

AUTHORS O’Gradaigh D, Merry P.

CITATION Rheumatology. 2000;39(9):1040-1041.

QUESTION Can the results of a hand diagram, Phalen
test, and Tinel test be applied sequentially?

DESIGN Two-phase study. An initial study to determine
the sensitivity and specificity of the findings may have been a
convenience sample (n = 105 patients). The second phase
assessed the sensitivity and specificity prospectively, but it is
not stated whether these were consecutive patients (n = 42).

SETTING Rheumatology clinic in the United Kingdom.

PATIENTS Patients were referred because of a suspicion
of carpal tunnel syndrome (CTS).


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD

Patients completed a hand diagram. Patients with classic or
probable patterns were considered to have a positive test
result. Phalen and Tinel tests were done by a single examiner.

MAIN OUTCOME MEASURE

Electrodiagnosis.

MAIN RESULTS

In the first set of 105 patients, 75 had abnormal electrodiagnostic
testing results. See Table 10-9.

For patients with a positive hand diagram result, the probability
of an abnormal electrodiagnostic test increased from
79% to 92% when both the Tinel and Phalen test results were
positive. Only 6 patients with a negative hand diagram result
had an abnormal electrodiagnostic test result. Because the
prevalence of an abnormal electrodiagnosis test result was so
high, the posterior probability with a negative hand diagram
result was still 33%. The second prospective phase of the
study obtained posterior probabilities for an abnormal
electrodiagnostic test result similar to those obtained from the
initial phase of the study.


Table 10-9 Likelihood Ratios of Tinel and Phalen Signs and the Hand
Diagram for Carpal Tunnel Syndrome

Test           Sensitivity  Specificity  LR+ (95% CI)   LR- (95% CI)
Tinel test     0.55         0.72         2.1 (1.2-4.0)  0.60 (0.44-0.88)
Hand diagram   0.92         0.40         1.5 (1.2-2.2)  0.20 (0.08-0.50)
Phalen test    0.72         0.53         1.5 (1.1-2.4)  0.52 (0.32-0.88)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.
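
The posterior probabilities quoted in the main results (79%, 92%, and 33%) can be reproduced approximately from the prevalence (75 of 105) and the likelihood ratios in Table 10-9. The Python sketch below is a minimal illustration, not the authors' algorithm: the function names are hypothetical, and multiplying likelihood ratios in sequence assumes the tests are conditionally independent, which the study does not establish.

```python
from math import prod
from typing import Iterable


def probability_to_odds(probability: float) -> float:
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return probability / (1.0 - probability)


def odds_to_probability(odds: float) -> float:
    return odds / (1.0 + odds)


def sequential_post_test_probability(pre_test_probability: float,
                                     likelihood_ratios: Iterable[float]) -> float:
    """Apply several likelihood ratios in sequence (assumes conditional independence)."""
    post_test_odds = probability_to_odds(pre_test_probability) * prod(likelihood_ratios)
    return odds_to_probability(post_test_odds)


prevalence = 75 / 105  # abnormal electrodiagnostic results in the first phase

# Likelihood ratios taken from Table 10-9.
positive_diagram = sequential_post_test_probability(prevalence, [1.5])
all_three_positive = sequential_post_test_probability(prevalence, [1.5, 2.1, 1.5])
negative_diagram = sequential_post_test_probability(prevalence, [0.20])

print(f"Positive hand diagram:                 {positive_diagram:.0%}")
print(f"Positive diagram, Tinel, and Phalen:   {all_three_positive:.0%}")
print(f"Negative hand diagram:                 {negative_diagram:.0%}")
```

The small gain from adding the Tinel and Phalen results to a positive hand diagram reflects both the modest likelihood ratios and the high starting prevalence.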


CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Prospective assessment of sequentially conducting
the Tinel and Phalen tests for patients after a hand
diagram test.

LIMITATIONS We infer that these patients were referred to
the rheumatologist for therapeutic injections, accounting for
the high prevalence of disease. The enrollment was not
consecutive patients. It is not clear whether the electrodiagnosis
was done by the same person who performed the clinical
examination. The prevalence of disease was much higher in
this study than in many other studies.

In a high-prevalence setting, the Phalen and Tinel tests will
not demonstrate clinically important differences in the probability
of disease. We infer that these patients are not representative
of all patients with CTS symptoms. However, the
data support the concept that the Phalen or Tinel test will not
alter the information from a hand diagram in a clinically
important fashion. The authors suggest that patients with a
high probability of CTS could be offered treatment (injection
therapy) without nerve conduction tests.

Reviewed by David L. Simel, MD, MHS 


TITLE Lack of Utility of Semmes-Weinstein Monofilament
Testing in Suspected Carpal Tunnel Syndrome.

AUTHORS Pagel KJ, Kaul MP, Dryden JD.

CITATION Am J Phys Med Rehabil. 2002;81(8):597-600.

QUESTION Do 2 types of testing with a monofilament
among patients who have median nerve symptoms identify
those who will have abnormal electrodiagnostic test
results?

DESIGN Prospective, consecutive enrollment. 

SETTING Electrodiagnostic laboratory of a Veterans 
Affairs hospital, Portland, Oregon. 

PATIENTS All patients (n = 113) had paresthesias of the 
median nerve. Patients with a previous carpal tunnel 
release operation, stroke, paresthesias in the fourth and 
fifth fingers only, or neurologic disease were excluded. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Two types of monofilament testing were done on the pad of
each digit so that the filament bowed for 1.5 seconds. If the
monofilament was felt on at least 1 of 3 trials in each digital
pad, the test result was considered normal. In the first protocol,
the patient had an abnormal response if there was no
sensation or a sensation only with an increased stimulus
(>2.83 monofilament) in any of the radial 3 digits. In the second
protocol, patients were considered to have an abnormal
response only if abnormal findings in the third finger were
associated with normal findings in the fifth finger. The examiners
used a monofilament testing kit with various sizes of
filaments. The reference test was a standard electrodiagnostic
study, blinded to the monofilament results.

MAIN OUTCOME MEASURE 

Abnormal electrodiagnosis studies. 

MAIN RESULTS 

Of 113 patients, 60 (53%) had abnormal electrodiagnostic 
testing results. See Table 10-10. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Evidence that the test (monofilament) and reference
standard (electrodiagnosis) were applied independently.
Clear guidelines on how to do the monofilament testing.

LIMITATIONS There was some selection bias in that not 
only were the patients all referred to the electrodiagnostic 
laboratory, but they were also evaluated to confirm that they 
had symptoms in the median nerve distribution. However, 
this is the appropriate population for whom carpal tunnel 
syndrome [CTS] ought to be correctly considered. 

Table 10-10 Likelihood Ratio of Monofilament Testing for Carpal
Tunnel Syndrome

Test                              Sensitivity  Specificity  LR+ (95% CI)    LR- (95% CI)
Decreased threshold or absent     0.98         0.15         1.2 (1.0-1.3)   0.11 (0.02-0.64)
sensation in terminal digit
pads 1, 2, or 3
Decreased threshold in terminal   0.13         0.88         1.2 (0.45-3.1)  0.98 (0.84-1.1)
digit pad 3 with normal terminal
digit pad 5

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.


The authors conclude that the tests are worthless. Certainly,
this appears true for the second method of monofilament testing
(comparing the median nerve findings to the fifth finger).
However, a normal response to monofilament testing in each
of the first three digits decreases the likelihood of abnormal
electrodiagnostic testing results in this population
of patients. Why might the results be different (ie, better) than
what was reported in the original Rational Clinical Examination
article? The study we initially used assessed only the
response in the index finger rather than all 3 digits of the
median nerve and found a sensitivity of only 59%.¹ Thus,
requiring a normal response in all 3 digits would automatically
improve the sensitivity. If the utility of a normal monofilament
response can be validated, then this might be a useful test for
identifying patients much less likely to have abnormal
electrodiagnostic testing results. We would like to see this study
repeated in a large population of patients with upper arm
symptoms for whom CTS is considered.

REFERENCE FOR THE EVIDENCE 

1. Buch-Jaeger N, Foucher G. Correlation of clinical signs with nerve
conduction tests in the diagnosis of carpal tunnel syndrome. J Hand Surg
[Br]. 1994;19(6):720-724.

Reviewed by David L. Simel, MD, MHS 


TITLE The Relationship Among Five Common Carpal 
Tunnel Syndrome Tests and the Severity of Carpal Tunnel 
Syndrome. 

AUTHORS Priganc VW, Henry SM. 

CITATION J Hand Ther. 2003;16(3):225-236.

QUESTION Among patients with carpal tunnel syndrome,
do the diagnostic tests separate patients with mild,
moderate, or severe electrodiagnostic results? Are the test
results reliable during a 2- to 7-day period?

DESIGN Prospective. All tests were done before nerve 
conduction studies. The order of tests was randomized, 
except that the provocation tests were always done after 
the other maneuvers. The examiner waited 2 to 3 minutes 
between provocation tests for all the patients to return to 
baseline. Patients (n = 27) returned to the laboratory 2 to 
7 days after the first test to assess reliability. 

SETTING Patients referred from 3 neurology clinics in
one community (Burlington, Vermont) for nerve conduction
studies.

PATIENTS Patients scheduled for nerve conduction
studies (n = 206) were contacted and invited to participate.
Patients were excluded if they had systemic peripheral
neuropathy, previous carpal tunnel release, proximal
median nerve compression, or foot numbness not attributable
to an orthopedic problem. Sixty-six patients (95
hands) were ultimately qualified for the study because the
study reported only those with abnormal electrodiagnostic
results.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD

Phalen, Tinel, and carpal compression tests (examiners apply
both of their thumbs to the patient’s transverse carpal ligament),
and Katz hand diagram. All patients had a nerve conduction
test, along with a carpal tunnel outcomes assessment
test that had scales for symptom severity and functional status.
The tests were applied without knowledge of the
electrodiagnostic results.

MAIN OUTCOME MEASURES 

According to preestablished criteria, the nerve conduction 
quantitative results were classified into mild (55 hands), 
moderate (23 hands), or severe (17 hands) outcomes. 
Reliability was assessed during a 2- to 7-day follow-up 
period. 

MAIN RESULTS 

The Katz hand diagram was the most reliable finding
(Table 10-11). The authors reported that only the Phalen
test showed an association with the nerve conduction
severity (P < .05). Our reanalysis of the data shows minimal
significance (P = .05). In a logistic model, the odds
ratio is 2.6 (95% confidence interval, 0.98-6.9) and the
accuracy of the model, as measured by the area under the
receiver operating characteristic curve, is only 0.50.


Table 10-11 Reliability of Various Tests for Carpal Tunnel Syndrome

Test                 κ (95% CI)
Katz hand diagram    0.95 (0.84-1.0)
Carpal compression   0.63 (0.33-0.92)
Phalen               0.58 (0.22-0.94)

Abbreviation: CI, confidence interval.
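
The reliability values in Table 10-11 appear to be kappa statistics, which compare observed agreement between two assessments with the agreement expected by chance. The Python sketch below shows how Cohen's kappa is computed for a dichotomous test repeated on two occasions. The function name and the counts are hypothetical and chosen only for illustration; the study's raw agreement data are not reported here.

```python
def cohens_kappa(both_positive: int, first_only: int,
                 second_only: int, both_negative: int) -> float:
    """Cohen's kappa for a 2x2 agreement table (same test on two occasions)."""
    total = both_positive + first_only + second_only + both_negative
    if total == 0:
        raise ValueError("the agreement table is empty")
    observed = (both_positive + both_negative) / total
    first_positive_rate = (both_positive + first_only) / total
    second_positive_rate = (both_positive + second_only) / total
    # Agreement expected if the two assessments were statistically independent.
    expected = (first_positive_rate * second_positive_rate
                + (1.0 - first_positive_rate) * (1.0 - second_positive_rate))
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts for 27 retested patients (not taken from the study).
kappa = cohens_kappa(both_positive=12, first_only=2, second_only=1, both_negative=12)
print(f"kappa = {kappa:.2f}")
```

With these invented counts, 24 of 27 paired results agree, but about half of that agreement would be expected by chance, so kappa is lower than the raw agreement.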


CONCLUSIONS 

LEVEL OF EVIDENCE Level 2. 

STRENGTHS A different type of study design to see whether 
the tests correlate with the degree of abnormality, rather than 
just the presence of carpal tunnel syndrome. 

LIMITATIONS The results can be applied only to patients
with known carpal tunnel syndrome. Thus, they are of limited
value in the primary care clinic.

The goal of identifying patients who will have abnormal 
nerve conduction results differs from the goal of using the 
physical examination results to identify those who will have 
mild, moderate, or severely abnormal electrodiagnostic 
results. These results suggest that the physical examination 
findings did not help much with categorizing the severity. 
The intrarater reliability for these findings is reassuring in 
that the results are similar during a 2- to 7-day period. 

Reviewed by David L. Simel, MD, MHS


CHAPTER

Does This Patient Have Abnormal Central Venous Pressure?

Deborah J. Cook, MD, FRCPC, MSc (Epid)
David L. Simel, MD, MHS


CLINICAL SCENARIO


A 65-year-old woman has had dyspnea for 2 months. She
has had to give up her hobby of hiking and is now short of
breath after climbing even 1 flight of stairs. Her dyspnea is
sometimes worse at night. She has no chest pain, cough, or
sputum, and the result of systems review is otherwise negative.
On physical examination, her blood pressure is 135/90
mm Hg, and she has a regular cardiac rhythm at 72/min.
You turn your attention to the jugular veins and next
ask yourself, “Does this patient have abnormal central
venous pressure (CVP)?”


WHY IS THIS QUESTION IMPORTANT? 


Evaluation of the jugular venous pulse provides important
information about pressure and other hemodynamic events in
the right atrium.¹⁻³ The jugular venous pulse provides a useful
estimate of CVP and thus the patient’s intravascular volume
status. Inspection of the waveforms can assist the diagnosis of
several tricuspid and pulmonic valvular abnormalities. Moreover,
accurate assessment of CVP by physical examination may
obviate the necessity for invasive hemodynamic monitoring.

Accordingly, the clinical evaluation of jugular venous pressure
(JVP) and waveforms is useful whenever intravascular
volume status, ventricular function, valvular disease, or
pericardial constriction is in question. Proficiency in this
examination is especially important, given that it may be difficult,
if not impossible, to identify venous pulsation in patients
with low CVP,⁴ in patients receiving mechanical ventilation,⁴,⁵
in patients with short or fat necks, and in some patients who
have conditions causing wide swings in CVP during the
respiratory cycle (eg, during acute asthma).


ANATOMIC AND PHYSIOLOGIC ORIGINS 
OF THE JUGULAR VENOUS PRESSURE 

Because the jugular veins act as manometer tubes for the
right atrium, they display changes in blood flow and pressure
caused by right atrial filling, contraction, and emptying. In
general, the jugular vein with the most distinct, undamped
waveform is likely to most accurately reflect right atrial
pressure. Because the right internal jugular vein is directly in line
with the right atrium, thereby favoring an unimpeded
transmission of atrial pulsations and pressure, it is the preferred
site for examining the jugular venous pulse.

Direct measurements of CVP according to the left jugular
veins tend to be higher than those on the right, but the
correlation between the 2 is high.⁶ The discrepancy may reflect the
fact that both the innominate vein and the left internal jugular
vein can be compressed by a variety of normal or abnormal
structures.


Copyright © 2009 by the American Medical Association.

Although the internal jugular vein lies deep to the
sternocleidomastoid muscle and may not always be visible as a
discrete structure, its pulsation usually is transmitted to the
overlying skin. Normally, the CVP pulsation moves toward
the heart during inspiration because of a sudden increase in
venous return to the right side of the heart.

The external jugular veins, although sometimes easier to see,
may be constricted as they pass through the fascial planes of the
neck and thus may not accurately reflect right atrial pressures.
However, in one study, venous pressures measured in the external
jugular vein accurately reflected right atrial pressures during
anesthesia and with controlled or spontaneous ventilation.⁷
Positive-pressure ventilation caused regular, periodic changes to
occur in venous return, which resulted in similar phasic changes
in right atrial and external jugular pressures. The only significant
difference was the greater right atrial pressure variation
during mechanical ventilation, although the maximal venous
pressures at the 2 sites were nearly identical.⁷



Figure 11-1 Venous Pulsation in the Neck Corresponds With the
Electrocardiogram

Simultaneous recording of an electrocardiogram (top tracing) and jugular venous
pressure waves (lower tracing). The a wave reflects right atrial contraction just
before the first heart sound and carotid pulse; atrial relaxation is reflected by the x
descent; c wave reflects the bulging of the tricuspid valve into the right atrium
during ventricular isovolumetric contraction; x′ descent reflects subsequent atrial
relaxation; v wave reflects the closure of tricuspid valve and subsequent distention
of the right atrium; and y descent reflects the right atrium emptying after the
opening of the tricuspid valve.


Table 11-1 Abnormalities of the Venous Waveforms

Waveform             Cardiac Condition
Absent a wave        Atrial fibrillation, sinus tachycardia
Flutter waves        Atrial flutter
Prominent a waves    First-degree atrioventricular block
Large a waves        Tricuspid stenosis, right atrial myxoma, pulmonary
                     hypertension, pulmonic stenosis
Cannon a waves       Atrioventricular dissociation, ventricular tachycardia
Absent x descent     Tricuspid regurgitation
Prominent x descent  Conditions causing enlarged a waves
Large cv waves       Tricuspid regurgitation, constrictive pericarditis
Slow y descent       Tricuspid stenosis, right atrial myxoma
Rapid y descent      Constrictive pericarditis, severe right heart failure,
                     tricuspid regurgitation, atrial septal defect
Absent y descent     Cardiac tamponade


Among critically ill patients, one group of investigators found
jugular venous pulsations sufficiently obvious for examination
only 20% of the time,⁸ whereas another group was able to
estimate CVP in 84% of critically ill patients.⁴ In the former study,
although external jugular pulsations were visible in all patients,
clinicians’ estimates of venous pressure according to physical
examination were within 2 cm of CVP determined by central
venous catheter only 47% of the time.

The evaluation of individual components of the venous
pulse in health and disease lies outside the focus of this
overview but can be summarized as follows.


ANALYSIS OF THE VENOUS WAVEFORM 

The normal JVP reflects phasic pressure changes in the right
atrium and consists of 3 positive waves and 3 negative
troughs (Figure 11-1). Although these pressure changes can
be recorded with pressure monitors, they are not always
appreciable on clinical examination of the jugular pulse.
Auscultation of the heart or simultaneous palpation of the left
carotid artery may aid the examiner in relating the pattern of
venous pulsations to the cardiac cycle.

Taken in sequence, right atrial contraction is reflected by the
dominant positive a wave and occurs just before the first heart
sound and carotid pulse. Atrial relaxation is reflected by the
first negative trough, the x descent. The second positive wave is
produced by the bulging of the tricuspid valve into the right
atrium during ventricular isovolumetric contraction; this is
called the c wave. Subsequent atrial relaxation creates the most
dominant descent, the x′ descent. When the tricuspid valve
closes, subsequent distention of the right atrium creates the v
wave, which occurs just after the arterial pulse. Finally, after
the opening of the tricuspid valve, the right atrium empties,
resulting in the y descent.

Various cardiac conditions are associated with waveform
abnormalities. A few of the most common include the
absence of a waves in atrial fibrillation, large cv waves in
tricuspid regurgitation, the slow y descent of tricuspid stenosis,
and the brisk y descent seen in constrictive pericarditis. Table
11-1 shows a summary of abnormal venous waveforms and
the conditions in which they occur. Remember, it is not
always possible to see each of these waves and descents.

HOW TO EXAMINE THE NECK VEINS 

The right internal jugular vein should be used to assess CVP 
for several reasons. It is in direct line with the right atrium, 
thereby favoring unimpeded transmission of atrial pulsations 
and pressure. Clinical assessment of CVP on the left may be 
marginally higher than that on the right. Finally, constricted 
or tortuous external jugular veins may introduce inaccuracy. 

Positioning 

Proper positioning is crucial for examination of the neck veins.
The patient’s head is supported to relax the neck muscles, and the
trunk is inclined at an angle that brings the top of the column of
blood in the internal jugular vein to a level above the clavicle but
below the angle of the jaw; in normal subjects, this positioning is
accomplished at 30 to 45 degrees above the horizontal. In
patients with elevated venous pressure, it often is necessary to
elevate the trunk beyond 45 degrees, and patients with severe
venous congestion may have to stand up and inspire deeply to
bring the meniscus down into view. In some cases, the level of
venous pulsation will be seen behind the angle of the jaw or will
appear to move the earlobes. If the pressure in the internal
jugular vein is high, venous pulsations will be lost in the completely
full vein, and the high venous pressure may be overlooked.

Conversely, patients with low CVP may have to be positioned
at 0 to 30 degrees. When CVP is low, the neck veins will be
empty, and pulsations may not be visible even when the patient
is horizontal.

Tangential light often improves the detection of the venous
pulse. When ambient light is insufficient for this purpose, a
penlight, directed away from the examiner’s eyes, may be useful.

Distinguishing Arterial (Carotid) 

From Venous (Jugular) Pulsation 

Difficulty in distinguishing between the carotid arterial pulse and 
jugular venous pulse may be overcome by noting several differ¬ 
entiating features (Table 1 1-2). 9 First, the venous pulsation is dif¬ 
fuse, usually has 2 waves, and the upward deflection is slow. In 
contrast, the carotid pulse is a fast, well-localized, single, outward 
deflection. Second, venous pulsations (unless the venous pres¬ 
sure is extremely high) diminish toward the clavicle or disappear 
beneath it as the patient sits up or stands and advance toward the 
angle of the jaw as the patient reclines; carotid pulses generally do 
not vary with position. Third, in the absence of intrathoracic dis¬ 
ease, the top of the venous wave descends during inspiration 
(because of increasingly negative intrathoracic pressure). How¬ 
ever, the visible carotid pulse does not vary with the respiratory 
cycle, except during pulsus paradoxus. Fourth, the JVP is nonpalpable,
and gentle pressure applied by the examiner’s finger to
the root of the neck above the clavicle will obstruct the vein, fill 
its distal segment, and obliterate the venous pulse. However, the 
carotid pulse is almost always palpable, usually striking the exam¬ 
ining finger with considerable force. Finally, sustained pressure 
on the abdomen (the abdominojugular reflux test, to be 
described later) usually will cause even a normal venous pulse to 
increase briefly but will have no effect on the carotid pulse. 

Estimation of Central Venous Pressure 

The level of venous pressure is estimated by identifying the high¬ 
est point of oscillation of the internal jugular vein (which usually 
occurs during the expiratory phase of respiration). This level 
must then be related to the middle of the right atrium, where 
venous pressure is, by convention, zero. Because the latter site is 
inaccessible on clinical examination, an accessible, reliable land¬ 
mark is substituted: the sternal angle of Louis. This easily pal¬ 
pated landmark, found at the junction of the manubrium with 
the body of the sternum, lies 5 cm above the middle of the right 
atrium (for all practical purposes) in reclining patients of nor¬ 
mal size and shape, regardless of the angle at which they are 
reclining. 


Table 11-2 Distinguishing the Carotid Arterial From Jugular Venous Pulsation

Characteristic           Venous Pulse                     Carotid Pulse
Waveform                 Diffuse biphasic                 Single sharp
Positional change        Varies with position             No variation
Respiratory variation    Height falls on inspiration      No variation
Effect of palpation      Wave nonpalpable, pressure       Pulse palpable,
                         obliterates pulse, vein fills    not compressible
Abdominal pressure       Displaces pulse upward           Pulse unchanged


Using the sternal angle as the reference point, the vertical 
distance (in centimeters) to the top of the jugular venous 
wave can be determined (Figure 11-2) and reported as the 
JVP; thus, JVP is 5 cm less than CVP. 

When the patient is positioned at 45 degrees above the hori¬ 
zontal, the clavicle lies a vertical distance of about 2 cm above 
the sternal angle, and only CVPs of at least 7 cm will be 
observed. 10 Because the normal CVP in adults is 5 cm, the top 
of their venous pressure column lies at their sternal angle, 2 cm 
below their lowest visible point in a patient at 45 degrees, and 
will only appear as the patient reclines toward the horizontal. 
The upper limit of normal for CVP is 9 cm H2O, which produces
a JVP extending 4 cm above the sternal angle. 1 (Note:
The Update that follows this section revealed that physicians 
underestimate the value of the central venous pressure from 
the jugular vein meniscus. Part of the underestimate may 
result from variability in the depth measured from the sternal 
notch to the mid-right atrium. This can be partially corrected 
by accepting a JVP of 3 cm or more as elevated.) 

Estimating CVP may be done as follows: Identify the highest 
point of pulsation in the internal jugular vein; find the sternal 
angle of Louis; from the sternal angle, measure the vertical dis¬ 
tance to the top of the pulsation in centimeters; and report as 
“the JVP is xx cm.” 
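
The arithmetic behind these steps can be sketched in a few lines of Python. This is only an illustration, not a validated clinical tool: the function names are hypothetical, and the defaults (a 5-cm depth from the sternal angle to the mid-right atrium, and a 3-cm cutoff for an elevated JVP) are simply the figures quoted in this section.

```python
# Sketch: turn a bedside JVP reading into an estimated CVP.
# Assumption: the traditional 5 cm depth from sternal angle to mid-right atrium.

def estimate_cvp_cm(jvp_cm: float, atrial_depth_cm: float = 5.0) -> float:
    """Estimated CVP (cm H2O) from the vertical JVP height above the sternal angle."""
    return jvp_cm + atrial_depth_cm


def jvp_is_elevated(jvp_cm: float, threshold_cm: float = 3.0) -> bool:
    """Apply the 'JVP of 3 cm or more' cutoff mentioned in the note above."""
    return jvp_cm >= threshold_cm
```

For instance, a JVP of 4 cm corresponds to an estimated CVP of 9 cm under the 5-cm assumption. Because the atrial depth is a parameter, the same sketch can be rerun with a larger depth if the traditional value is thought to underestimate the true distance.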

Alternative methods of assessing CVP exist but have not 
been validated. For example, with a reclining patient, the clini¬ 
cian can inspect the veins of the back of the hand as the arm is 
slowly, passively raised; the level at which the veins collapse can 
then be related to the angle of Louis. This method may give 
false high readings with local obstruction and peripheral 
venous constriction, so it is not recommended. 

Abnormal Central Venous Pressure 

Elevated JVP reflects an increase in CVP. This increase can 
be due to increased right ventricular diastolic pressure (eg, 
right ventricular failure or infarction, pulmonary hyperten¬ 
sion, or pulmonic stenosis), obstruction to right ventricular 
inflow (eg, tricuspid stenosis, right atrial myxoma, or con¬ 
strictive pericarditis), hypervolemia, or superior vena cava 
obstruction. 

Decreased JVP reflects a decreased or a low CVP. Low CVP 
may be due to intravascular volume depletion from gas¬ 
trointestinal losses (vomiting or diarrhea), urinary losses 
(diuretics, uncontrolled diabetes mellitus, or diabetes insipi¬ 
dus), third-space fluid losses, and hypovolemic shock. 
Figure 11-2 Estimation of Central Venous Pressure From the Jugular Venous Pulse

[Figure: a reclined patient with a ruler based at the sternal angle (of Louis). Labels: sternum; height of right jugular vein; normal CVP meniscus level; depth to right atrium, 5 cm; “JVP ruler” calculation of CVP = JVP + 5 cm; elevated CVP: JVP > 3 cm. The examiner places the patient in a reclined position and puts the base of the ruler at the sternal angle. Positive indication: a jugular venous pulse > 3 cm above the sternal notch, or a sustained jugular venous pulse of > 4 cm with abdominal compression, suggests a 3- to 4-fold increase in the likelihood that the central venous pressure is elevated.]

At any patient position, the top of the jugular vein meniscus is identified. The jugular venous pulse measurement is sighted from the height read from a ruler
placed vertically over the sternal notch. The traditional assumption has been that the CVP is the JVP + 5 cm. However, the Update for this article showed that
physicians tend to underestimate the CVP and the assumption of a 5-cm depth from the sternal notch to the right atrium is probably not valid. Thus, this figure
has been updated to reflect current recommendations that a JVP > 3 cm suggests an elevated CVP. Abbreviations: CVP, central venous pressure; JVP, jugular
venous pressure.


Abdominojugular Reflux Test (Hepatojugular Reflux) 

The abdominojugular reflux test consists of observing JVP 
before, during, and after abdominal compression. The 
increase in jugular pressure that follows abdominal compres¬ 
sion is believed to be a consequence of blood shifting from 
abdominal veins into the right atrium. Pasteur first described 
the hepatojugular reflux in 1885. 11 Now, this bedside test is 
used to confirm the presence of right ventricular failure or 
reduced right ventricular compliance. Like all clinical tests, it 
is most reliable when performed in a standardized fashion. 

The patient is instructed to relax and breathe normally 
through an open mouth (to avoid the false-positive increase 
in jugular pressure that accompanies the Valsalva maneuver). 
Firm pressure is then applied with the palm of the hand to 
the midabdomen for 15 to 30 seconds (abdominal compres¬ 
sion for 1 minute, as has previously been described, is not 
required). 10,12,13 This pressure should approximate 20 to 35 
mm Hg when an unrolled bladder of a standard adult blood 
pressure cuff, partially inflated with 6 full bulb compressions, 
is placed between the examiner’s hand and the patient’s 
abdomen. 10,13 Pressure directly over the liver, as was originally 
described, 1,2,12,14 appears to be unnecessary. 13,15 Therefore,
designation of the test as abdominojugular reflux, rather than
hepatojugular reflux, is more appropriate. If pain is produced
by this maneuver, or if the patient strains (Valsalva), the test 
becomes falsely positive. Either instruct the patient to open 
his or her mouth and breathe slowly or try a trial run, which 
is sometimes useful to demonstrate to the patient the force 
that will be applied over the abdomen. 


Healthy individuals may exhibit one of 3 responses to 
abdominal compression: no change in JVP; a transient (few 
seconds) increase of more than 4 cm that returns to its 
former level or near the baseline before 10 seconds, with little 
or no decrease when abdominal pressure is released; or an 
increase of more than 3 cm sustained throughout compres¬ 
sion. 10,13 A positive abdominojugular test result occurs when 
abdominal compression causes a sustained increase in JVP of 
greater than or equal to 4 cm. 

Kussmaul Sign 

The JVP normally decreases during inspiration. The Kussmaul 
sign is the paradoxic increase in the height of JVP that occurs 
during inspiration. It can be explained by a heart that is unable 
to accommodate the increased venous return that accompa¬ 
nies the inspiratory decrease in intrathoracic pressure. 
Although classically described in constrictive pericarditis, the 
most common contemporary cause of the Kussmaul sign is 
severe right-sided heart failure, regardless of etiology. Other 
causes include myocardial restrictive disease such as amyloido¬ 
sis, tricuspid stenosis, and superior vena cava syndrome. 

PRECISION OF THE CLINICAL ASSESSMENT 
OF CENTRAL VENOUS PRESSURE 

When 2 clinicians examine the same patient once (interob¬ 
server variation), and even when 1 clinician examines the 
same patient twice (intraobserver variation), estimates of 
CVP commonly vary by up to 7 cm. 4 Final-year medical students,
first- and second-year medical residents, and attending
physicians examined the same 50 intensive care unit
patients (but were blinded to simultaneous CVP manome¬ 
try) and estimated these patients’ CVPs as low (<5 cm), nor¬ 
mal (5-10 cm), or high (>10 cm). 4 Agreement between 
students and residents was substantial (κ, a measure of
chance-corrected agreement, was 0.65), agreement between
students and attending physicians was moderate (κ = 0.56),
and agreement between residents and attending physicians
was modest (κ = 0.30).
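
To make "chance-corrected agreement" concrete, the sketch below computes the kappa statistic from a square table of paired ratings. The function name and the counts are hypothetical: the counts are invented purely for illustration and are not data from the study cited above.

```python
def cohen_kappa(table):
    """Chance-corrected agreement for a square table of counts.

    table[i][j] = number of patients placed in category i by observer A
    and in category j by observer B.
    """
    n = sum(sum(row) for row in table)
    size = len(table)
    # Observed agreement: proportion of patients on the diagonal.
    observed = sum(table[i][i] for i in range(size)) / n
    row_totals = [sum(table[i]) for i in range(size)]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance from the marginal totals alone.
    expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / (n * n)
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL counts (not study data): rows = observer A, columns = observer B,
# categories ordered low, normal, high CVP.
hypothetical_ratings = [
    [5, 2, 0],
    [3, 20, 4],
    [0, 3, 13],
]
kappa = cohen_kappa(hypothetical_ratings)
```

Kappa equals 1 when the two observers agree on every patient and 0 when they agree no more often than their marginal totals would predict by chance.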

Suggested causes for disagreement include variations in 
the positioning of patients, poor ambient lighting, difficulty 
in distinguishing carotid from venous pulsations, biological 
variation in CVP with the phases of respiration, and the 
effects of vasoactive medication and diuretics. 

The precision of the abdominojugular reflux test has 
not been reported, but its results will vary with the force 
of abdominal compression. Different investigators suggest 
different forces: Ducas et al 10 compressed a semi-inflated 
blood pressure cuff placed in the middle of the abdomen 
to 35 mm Hg (equivalent to a weight of approximately 8 
kg), whereas Ewy 13 applied a pressure of approximately 20 
mm Hg. 

Although no validated methods for improving precision 
in determining JVP have been reported, it seems prudent to 
standardize the procedure as described herein, encourage 
normal breathing, rehearse abdominal compression until 
the Valsalva maneuver is avoided, and gradually increase 
abdominal compression during a few seconds. 16 Even when 
the Valsalva maneuver is avoided, there is still a small varia¬ 
tion in JVP with the phases of breathing. 17 

ACCURACY OF THE CLINICAL ASSESSMENT 
OF CENTRAL VENOUS PRESSURE 

We describe 3 studies that have reported the relation 
between clinical assessments of CVP and the gold standard 
of simultaneous pressure measurements through an indwelling
central venous catheter. 4,5,18 When the clinical
assessment was reported as low, normal, or high, the pooled 
overall accuracy was 56%. In one study, 4 venous pressure 
was assessed in each of 50 intensive care unit patients by 
one of 3 intensive care unit attending physicians, one of 6 
medical residents, and one of 6 medical students. Although 
all groups tended to underestimate venous pressure, only 
the residents did so to a statistically significant degree. The 
correlation coefficient between clinical assessment and cen¬ 
tral line measured CVP was highest for medical students 
(0.74), slightly lower for residents (0.71), and lowest for 
staff physicians (0.65), and these correlations improved 
slightly when patients receiving mechanical ventilation 
were excluded. The students’ data from this study 4 (Table 
11-3) display the results for 2 clinical questions: “Is the 
patient’s true CVP low?” and “Is the patient’s true CVP 
high?” 2 Despite small numbers of participants, it is appar¬ 
ent that a clinically assessed low CVP increases the likeli¬ 
hood by about 3-fold that the measured CVP will be low; 


Table 11-3 Measured Central Venous Pressure a

Is the CVP Low?

Clinical       Low,          Normal or High,    LR That CVP
Assessment     CVP <5 cm     CVP ≥5 cm          Is Low (95% CI)
CVP low        3             5                  3.4 (1.0-11)
CVP normal     4             22                 1.0 (0.5-2.1)
CVP high       0             13                 0 (0-1.5)

Is the CVP High?

Clinical       High,         Normal or Low,     LR That CVP
Assessment     CVP >10 cm    CVP ≤10 cm         Is High (95% CI)
CVP high       10            3                  4.1 (1.3-13)
CVP normal     10            16                 0.8 (0.5-1.3)
CVP low        1             7                  0.2 (0.02-1.3)

Abbreviations: CVP, central venous pressure; CI, confidence interval; LR, likelihood ratio.
a Adapted from Cook. 4


no patient clinically assessed as having a high CVP had a 
low measured CVP. Similar results hold when the clinician 
considers whether the patient has increased CVP. Clinical 
assessments of a high CVP increase the likelihood by about 
4-fold that the measured CVP will be high; conversely, clin¬ 
ical assessments of a low CVP make the probability of find¬ 
ing a high measured CVP extremely unlikely (likelihood 
ratio [LR], 0.2). The data demonstrate that clinical assess¬ 
ments of a normal CVP are truly indeterminate, with LRs 
approaching 1; such estimates provide no information 
because they neither increase nor decrease the probability 
of an abnormal CVP. 19 Aside from less observer variation, 
the data suggest that CVP estimates achieve greater accu¬ 
racy among patients breathing spontaneously. However, the 
relatively small patient population creates an opportunity 
for further studies on how mechanical ventilatory assis¬ 
tance affects clinical assessment of CVP. 
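
The likelihood ratios in Table 11-3 follow directly from the counts in each column. As a minimal sketch (the function and variable names are hypothetical), the LR for each level of the clinical assessment is the proportion of patients with the condition who received that assessment, divided by the proportion of patients without the condition who received it:

```python
def level_likelihood_ratios(with_condition, without_condition):
    """LR per assessment level = P(level | condition) / P(level | no condition).

    Both arguments map an assessment level to a patient count.
    Assumes every level has at least one patient without the condition.
    """
    n_with = sum(with_condition.values())
    n_without = sum(without_condition.values())
    ratios = {}
    for level in with_condition:
        p_with = with_condition[level] / n_with
        p_without = without_condition[level] / n_without
        ratios[level] = p_with / p_without
    return ratios


# Counts from Table 11-3, "Is the CVP Low?"
measured_low = {"low": 3, "normal": 4, "high": 0}
measured_not_low = {"low": 5, "normal": 22, "high": 13}
lr_low = level_likelihood_ratios(measured_low, measured_not_low)

# Counts from Table 11-3, "Is the CVP High?"
measured_high = {"high": 10, "normal": 10, "low": 1}
measured_not_high = {"high": 3, "normal": 16, "low": 7}
lr_high = level_likelihood_ratios(measured_high, measured_not_high)
```

Rounded to 1 decimal place, these reproduce the point estimates in the table (3.4, 1.0, and 0 for a low CVP; 4.1, 0.8, and 0.2 for a high CVP). The confidence intervals are not computed here.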

In a study of 62 patients undergoing right-sided heart 
catheterization, 5 an attending physician, a critical care fellow, 
a medical resident, an intern, and a student each predicted 
whether 4 hemodynamic variables, including CVP, were low, 
normal, high, or very high. The sensitivity of the clinical 
examination for identifying low (<0 mm Hg), normal (0-7 
mm Hg), or high (>7 mm Hg) CVP was 0.33, 0.33, and 0.49, 
respectively (10 cm of H2O is equivalent to 7.5 mm Hg). The
specificity of the clinical examination for identifying low, 
normal, or high CVP was 0.73, 0.62, and 0.76, respectively. 
Predictions of right atrial pressure (CVP) were more accurate 
in patients with low cardiac indexes (<2.2 L/min) and high 
pulmonary artery wedge pressures (>18 mm Hg) and less 
accurate among patients in coma or receiving mechanical 
ventilation. Accuracy was not improved in cases in which 
precision (interobserver agreement) among the examiners 
was high. 

In a third study, Eisenberg et al 18 compared clinical 
assessments with pulmonary artery catheter readings in 
97 critically ill patients. The physicians caring for these 
patients were asked to predict whether CVP was less than 
2, 2 through 6, or greater than 6 mm Hg; whether cardiac
output was less than 4.5, 4.5 through 7.5, or greater than
7.5 L/min; whether systemic vascular resistance was less than 1100,
1100 through 1300, or greater than 1300 (dyn × s)/cm⁵;
and whether pulmonary artery wedge pressure was less 
than 10, 10 through 14, 15 through 19, or greater than or 
equal to 20 mm Hg. Physicians correctly predicted the 
patients’ CVP only 55% of the time and cardiac index, sys¬ 
temic vascular resistance, and pulmonary artery wedge 
pressure only 51%, 44%, and 30% of the time, respec¬ 
tively. CVP was more frequently underestimated (27%) 
than overestimated (17%). 

Although the abdominojugular reflux test is an insensi¬ 
tive way to diagnose congestive heart failure, the specificity 
of this test is high. 20,21 Moreover, the positive LRs (6.4 when
the strict criteria are used and 6.0 when emergency physi¬ 
cian judgment is used) indicate that this is a useful bedside 
test (Table 11-4). 

IMPROVING CLINICAL EXAMINATION 
OF THE JUGULAR VEINS 

Examining patients with indwelling central venous catheters 
provides the clinician with an opportunity for calibrating 
and periodically testing clinical skills for evaluating CVP. Of 
course, the examination should be performed blind to the 
catheter reading. If the examination is also conducted blind 
to other patient data, interpretation of waveforms can be 
compared to electrocardiograms and other data from cardiac 
investigations. Learning aids such as pocket cards displaying 
the normal jugular pulsations may also be helpful. Assessment
of JVP in patients with tachycardia, irregular cardiac
rhythms, and rapid and deep respirations and those requiring
mechanical ventilation provides a challenge for even seasoned
clinicians. 22


Table 11-4 Sensitivity and Specificity of the Abdominojugular Reflux
in Diagnosing Congestive Heart Failure a

Abdominojugular Reflux    CHF    No CHF    Total

By Explicit Criteria for the Abdominojugular Reflux Response

Present                   5      1         6
Absent                    16     26        42
Total                     21     27        48

By Emergency Physician’s Judgment of the
Abdominojugular Reflux Response

Present                   4      2         6
Absent                    8      34        42
Total                     12     36        48

Abbreviations: CHF, congestive heart failure; CI, confidence interval; LR+, positive
likelihood ratio; LR-, negative likelihood ratio.

a Adapted from Marantz et al. 20 For diagnosing by criteria, sensitivity = 0.24;
specificity = 0.96; LR+ = 6.4 (95% CI, 0.8-51); and LR- = 0.8 (95% CI, 0.6-1.0). For
diagnosing by emergency physicians, sensitivity = 0.33; specificity = 0.94; LR+ = 6.0
(95% CI, 1.3-29); and LR- = 0.7 (95% CI, 0.5-1.1).
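
The summary statistics in the footnote to Table 11-4 can be rederived from the 2 × 2 counts. The sketch below is illustrative only; the function name and argument names are hypothetical, and the confidence intervals are not reproduced.

```python
def two_by_two_summary(tp, fp, fn, tn):
    """Sensitivity, specificity, LR+ and LR- from a 2 x 2 table.

    tp: test present, disease present    fp: test present, disease absent
    fn: test absent, disease present     tn: test absent, disease absent
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Counts from Table 11-4 (abdominojugular reflux vs congestive heart failure).
by_explicit_criteria = two_by_two_summary(tp=5, fp=1, fn=16, tn=26)
by_physician_judgment = two_by_two_summary(tp=4, fp=2, fn=8, tn=34)
```

With either definition of a positive response, few patients with heart failure have the sign (low sensitivity), but the sign is rare in patients without heart failure (high specificity), which is why the positive LR is about 6 while the negative LR stays close to 1.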


THE BOTTOM LINE 

According to the results of this overview, the following rec¬ 
ommendations apply to the clinical assessment of JVP. First, 
in a well-lit room, position the patient at an angle such that 
the meniscus of blood in the right jugular vein is brought 
into vision (usually an angle of 30 to 45 degrees from the 
horizontal). To identify the top of the meniscus, it may be 
necessary to raise or lower this angle. Second, distinguish the 
jugular venous waveform from the carotid pulsation by 
remembering the following: The venous waveform is diffuse 
and biphasic, varies with position and respiration, is nonpalpable,
and may be displaced upward during abdominal pressure.
In contrast, the carotid pulsation is single, sharp, and
palpable; does not vary with position or respiration; and is 
unchanged with abdominal pressure. Third, measure the ver¬ 
tical distance in centimeters from the sternal angle of Louis 
to the top of the column of blood in the jugular vein. The 
upper limit of normal is approximately 4 cm (Note: The 
Update to this article recommends that the clinician consider 
a value of 3 cm or more as elevated). Armed with evidence 
about how to examine and interpret the clinical assessment 
of CVP, you can now answer the question of whether the 
patient presented at the beginning of this article, and subse¬ 
quent patients you care for, have abnormal CVP. 
UPDATE: Central Venous Pressure



Prepared by David L. Simel, MD, MHS 
Reviewed by Deborah Cook, MD, MSc (Epid) 


CLINICAL SCENARIO 


A 48-year-old man who has had 2 myocardial infarctions 
is having trouble sleeping. He claims shortness of breath 
while supine but does not notice any ankle edema. The 
lungs are clear, whereas the cardiac evaluation reveals an 
S4 but no S3 heart sound. There is a short systolic murmur
along the left sternal border. He has no peripheral edema. 
You look at his large, thick neck and have no confidence 
that you will be able to assess the neck veins. 

UPDATED SUMMARY ON ABNORMAL 
CENTRAL VENOUS PRESSURE 

Original Review 

Cook DJ, Simel DL. Does this patient have abnormal central 
venous pressure? JAMA. 1996;275(8):630-634. 

UPDATED LITERATURE SEARCH 

Our literature search used the parent search strategy for The 
Rational Clinical Examination series articles in MEDLINE, 
combined with the search terms “central venous pressure,” 
“exp jugular veins,” “exp venous pressure,” and “abdomino¬ 
jugular reflux,” limited to human and English-language 
articles published from 1995 to August 2004. We excluded 
case reports, leaving 189 titles for review. Of these citations, 
13 were applicable and were retrieved to determine whether 
they had sensitivity, specificity, or likelihood ratio (LR) data 
for the use of the clinical estimation of the central venous 
pressure (CVP) or jugular venous pressure (JVP) for identi¬ 
fying patients with high or low CVP measured by a refer¬ 
ence standard. Only 1 study provided new data. Through 
review of references in the 13 articles, we found 1 article 
published before 1995 that we had not included in our orig¬ 
inal review. 

NEW FINDINGS 

• A JVP 3 cm above the sternal angle, in any patient position, 
suggests an elevated CVP. 1 

• Clinicians systematically underestimate the CVP when
using the JVP. 2 The distance from the sternal notch to the
right atrium may be closer to 8 cm rather than the traditionally
assumed value of 5 cm. 1

Details of the Update 

Two analyses from the same randomized treatment trial for 
heart failure demonstrate the usefulness of assessing for an 
elevated JVP. 3,4 These studies analyzed prospectively col¬ 
lected data by study investigators (cardiologists) from a few 
thousand patients. The cardiologists answered a simple 
question: Is the JVP elevated? The assessment was not con¬ 
firmed with direct measurement of the CVP, but the associ¬ 
ation with important outcomes suggests that assessing the 
CVP as elevated or not elevated is useful. 

A nonsystematic review of the venous pressure provides 
additional information for those who believe that the assess¬ 
ment of the JVP is either too difficult or lacks correlation 
with the CVP. McGee 5 describes many features that explain 
the discrepancy between the clinical estimation of CVP and 
the actual CVP measurement. What is striking about 
McGee’s 5 findings and those from empirical studies (see
reviews of individual studies) is that the discrepancies are not 
random, but systematic. There is a distinct and reproducible 
bias that leads clinicians to underestimate the true CVP. 
The editorial accompanying McGee’s 5 review asserts that the
“... major limitation to current use [of JVP assessment] is 
lack of practice,” 6 a statement also emphasized by others. 7 
McGee 5 suggests, despite the factors leading to disagreements 
between the clinical assessment of JVP and the measurement 
of CVP, that finding the JVP more than 3 cm above the ster¬ 
nal angle indicates an abnormally high CVP. The empirical 
data support the suggestion. 

The information about the sensitivity and specificity of 
the JVP assessment to identify patients with low CVP is 
scant. We found no additional studies. The original data in 
The Rational Clinical Examination article suggest an LR of 
3.4 (95% confidence interval, 1-9.9) when the clinical ques¬ 
tion is whether the patient has a low CVP and the clinician 
finds that the JVP is not observed (CVP < 5 cm). The corre¬ 
lation between low clinically assessed CVP with the invasive 
assessment is better than it is for a population of patients 
with high CVP. Although this makes sense, our confidence 
around this is low, given the small numbers of patients 
studied. 
The abdominojugular reflux test might be an alternative or
complementary test to the JVP assessment in patients with
significantly impaired left ventricular function. A 2000 systematic
review 8 identified no additional studies evaluating
abdominojugular reflux. We found no other studies from 
1995 to 2004, although we did identify a study that we had 
not included in the original review. In patients with impaired 
left ventricular function, the reproducibility of the abdominojugular
reflux appears to be excellent (κ = 0.92). We caution
examiners that they must use the same techniques as those
used in these studies to achieve similar results. 9


IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

We used the data from the original manuscript and now provide 
summary estimates for high CVP; we can also provide summary 
estimates for the abdominojugular reflux (Table 11-5). We created
a new figure to demonstrate the assessment of JVP, indicating
the newer recommended threshold of > 3 cm for identifying
patients with an elevated central venous pressure. 

CHANGES IN THE REFERENCE STANDARD 

There have been no changes in the reference standard. 

RESULTS OF LITERATURE REVIEW 

Three studies allow us to combine measures for the assess¬ 
ment of a high CVP. Two of the 3 studies evaluated 


Table 11-5 Likelihood Ratios for the Abdominojugular Reflux Test and
Clinical Assessments of the Central Venous Pressure

Finding (No. of Combined Studies)             Question                            LR+ (95% CI)     LR- (95% CI)
Abdominojugular reflux (2) 9,10 a             Would the measured CVP be high?     4.4 (1.8-10)     0.48 (0.22-1.1)
Clinically assessed high CVP (3) 9,11,12 b    Would the measured CVP be high?     3.1 (1.6-6.0)    0.50 (0.37-0.68)
Clinically assessed low CVP (1) 11            Would the measured CVP be low?      3.4 (1-9.9)      0.65 (0.28-1.2)

Abbreviations: CI, confidence interval; CVP, central venous pressure; LR+, positive
likelihood ratio; LR-, negative likelihood ratio.

a Data are homogeneous, with P = .62 for LR+, but heterogeneous for LR- (P < .01).
The populations of patients were different. One study was of patients undergoing an
evaluation for cardiac transplantation for left ventricular systolic dysfunction. The other
study assessed patients with acute dyspnea.

b Data are homogeneous, with P = .22 for LR+ and .31 for LR-.


patients with advanced heart failure (ie, at least New York 
Heart Association class III), whereas 1 included critically 
ill patients in the intensive care unit. These data capture 
an important population for whom the finding would be 
of interest. We do not know how well the results apply to 
less severely ill patients treated in a primary care clinic. 
We agree with advocates who suggest that the clinical 
assessment of the JVP is useful, but we also agree with 
those who suggest that clinicians need more practice to 
become proficient. 

From these few studies, it is difficult to know whether cli¬ 
nicians, on balance, are better at identifying patients with a 
low vs a high CVP. However, our confidence in the accuracy 
of assessing a low CVP is only modest, given the broad confi¬ 
dence intervals. 


EVIDENCE FROM GUIDELINES 

The Scottish Intercollegiate Guidelines Network recommends
using the JVP to help diagnose left ventricular systolic
dysfunction and to identify patients who need diuretics. 13
The US Department of Veterans Affairs recommends
assessing for jugular venous distention in hypertensive
patients. 14


CLINICAL SCENARIO—RESOLUTION 


For a variety of reasons, such as lack of self-confidence in CVP
assessment or patient-specific anatomy such as large,
thick necks, primary care providers often assume they will
be unable to identify the JVP. The sense that you will not
be able to visualize the veins may be accurate, but it may also
be reinforced by poor examining technique. We suggest
that clinicians reassess their performance by making
sure that they are using the proper examining technique.
One study of heart failure patients suggests that the 
abdominojugular reflux evaluation has excellent repro¬ 
ducibility. It is possible that it is easier to see sustained 
inducible jugular venous distention than the normal 
venous pulse wave, especially in patients with large, thick 
necks. At the least, using abdominal pressure to help iden¬ 
tify the course of the internal jugular vein might improve 
technique and ability to identify normal venous pulse 
waves. 

A JVP more than 3 cm above the sternal notch, or a sus¬ 
tained JVP of 4 cm or more with abdominal compression, 
suggests a 3- to 4-fold increase in the likelihood that the 
CVP is elevated. 
CENTRAL VENOUS PRESSURE—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

Estimating the prior probability for an elevated CVP 
among patients with a low ejection fraction depends on 
the patient’s underlying condition and the effectiveness of 
treatment. Current treatment regimens that now include
β-blockers and angiotensin-converting enzyme inhibitors
may decrease the prevalence of volume overload in
patients with a reduced ejection fraction. In the Studies of
Left Ventricular Dysfunction, investigators determined
clinically that approximately 10% of patients with a left
ventricular ejection fraction of 35% or less at baseline had
an elevated CVP. 3,4 Although the CVP was not invasively
measured, we know that clinical assessments of a high 
CVP typically underestimated the true value. As a starting 
point, the range 10% to 20% seems like a reasonable esti¬ 
mate for elevated CVP among patients previously diag¬ 
nosed as having a low ejection fraction. 

The close relationship between low CVP and underly¬ 
ing disease makes it impossible to come up with a gener¬ 
ally useful starting point for a pretest probability about 
CVP, so clinicians must use their own judgment individu¬ 
alized for the patient. 

POPULATION FOR WHOM AN ABNORMAL CENTRAL 
VENOUS PRESSURE SHOULD BE CONSIDERED 

• Patients with a low left ventricular ejection fraction are 
at risk for a high CVP or a low CVP (eg, overdiuresis). 

• Patients with underlying acute clinical conditions that 
lead to volume loss may have a low CVP. 


DETECTING THE LIKELIHOOD OF AN 
ABNORMAL CENTRAL VENOUS PRESSURE 

See Table 11-6. 


Table 11-6 Likelihood of Abnormal Central Venous Pressure

Finding                                        LR+ (95% CI)     LR- (95% CI)

Would the Measured CVP Be High?
Abdominojugular reflux (n = 2) a               4.4 (1.8-10)     0.48 (0.22-1.1)

Would the Measured CVP Be High?
Clinically assessed high CVP from the JVP b    3.1 (1.6-6.0)    0.50 (0.37-0.68)

Would the Measured CVP Be Low?
Clinically assessed low CVP from the JVP c     3.4 (1-9.9)      0.65 (0.28-1.2)

Abbreviations: CI, confidence interval; CVP, central venous pressure; JVP, jugular venous
pulse; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

a A positive test result is a sustained increase in the JVP of 4 cm or more with 10 seconds
of abdominal compression, followed by an abrupt decrease with the release of
pressure.

b Use these values when the clinical question is, “Would my patient have a high CVP on
invasive measurement?” A positive result suggesting a high CVP is a JVP more than 3
cm above the sternal angle.

c Use these values when the clinical question is, “Would my patient have a low CVP on
invasive measurement?” When the meniscus of the JVP is not observed, the result is
positive and suggests a low CVP. However, the CI around these estimates is broad.
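
As an illustration of how the prior probability and these likelihood ratios combine, the sketch below converts a pretest probability to odds, multiplies by the LR, and converts back. The function name is hypothetical; the inputs are the 10% to 20% starting range suggested above and the LRs for a clinically assessed high CVP from the JVP (3.1 and 0.50). It uses point estimates only and ignores the confidence intervals.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Bayes' rule in odds form: post-test odds = pretest odds x LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Pretest range suggested in the text; LRs from Table 11-6 (high CVP from the JVP).
results = {}
for pretest in (0.10, 0.20):
    results[pretest] = {
        "jvp_elevated": post_test_probability(pretest, 3.1),
        "jvp_not_elevated": post_test_probability(pretest, 0.50),
    }
```

Under these assumptions, an elevated JVP moves a pretest probability of 10% to roughly 26% and one of 20% to roughly 44%, whereas a JVP that is not elevated lowers them to about 5% and 11%, respectively.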

REFERENCE STANDARD TESTS 

Invasive measurement of the CVP with an internal monitor. To 
minimize variation, zero the manometer to a line representing 
the intersection of a cross-sectional plane through the fourth 
intercostal space and a coronal plane between the back and 
xiphoid process. 5
EVIDENCE TO SUPPORT THE UPDATE:

Central Venous Pressure 



TITLE Bedside Cardiovascular Examination in Patients 
With Severe Chronic Heart Failure: Importance of Rest or 
Inducible Jugular Venous Distention. 

AUTHORS Butman SM, Ewy GA, Standen JR, Kern KB, 
Hahn E. 

CITATION J Am Coll Cardiol. 1993;22(4):968-974.

QUESTION Do a variety of clinical findings predict car¬ 
diac hemodynamics in a group of patients with advanced 
chronic congestive heart failure? 

DESIGN Prospective, convenience sample. Some of the 
patients (52%) had an examination by a second observer 
to assess precision. 

SETTING Cardiac catheterization laboratory, Tucson, 
Arizona. 

PATIENTS Fifty-two patients who were under evaluation for
possible heart transplantation and were undergoing
right-sided heart catheterization within 24 hours of the
physical examination.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD

Abdominojugular reflux: a positive test result was defined as
4 cm or more sustained elevation of the jugular venous pulse
(JVP) with 10 seconds of abdominal compression that disappeared
abruptly with the release of abdominal pressure.

JVP was considered abnormal and elevated if pulsations 
were seen while the patient was elevated at 45 degrees from 
horizontal, or if the estimated pressure was greater than 7 cm. 
The reference standard was the pulmonary capillary wedge 
pressure (>18 mm Hg was considered abnormal, indicating 
volume overload). 

MAIN OUTCOME MEASURES 

Sensitivity, specificity, and κ for the physical examination
findings. 


Table 11-7 Likelihood Ratio for Jugular Venous Pressure and
Abdominojugular Reflux for an Elevated Central Venous Pressure

Test                       Sensitivity   Specificity   LR+ (95% CI)    LR- (95% CI)
Jugular venous pressure    0.57          0.93          8.5 (1.8-49)    0.46 (0.60-0.69)
Abdominojugular reflux     0.81          0.80          4.0 (1.8-12)    0.24 (0.11-0.47)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.


MAIN RESULTS 

See Table 11-7. The agreement for the presence of JVP elevation
was good (κ = 0.69) but even better for abdominojugular
reflux (κ = 0.92). These patients were mostly men and had a
low ejection fraction (mean ejection fraction, 18%; range, 
6%-39%). 
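
The likelihood ratios in Table 11-7 follow from sensitivity and specificity. The sketch below is an illustration only: the function name is hypothetical, and because it starts from the rounded sensitivity and specificity rather than the raw 2 x 2 counts, its point estimates can differ slightly from the published values.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a dichotomous test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie in [0, 1]")
    if not 0.0 < specificity < 1.0:
        # specificity of 0 or 1 makes one of the ratios undefined
        raise ValueError("specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Rounded sensitivity and specificity as reported in Table 11-7.
for test, sens, spec in [
    ("Jugular venous pressure", 0.57, 0.93),
    ("Abdominojugular reflux", 0.81, 0.80),
]:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{test}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

The confidence intervals in the table cannot be recovered this way; they require the underlying counts.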

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Precision was determined. The clinicians 
judged the JVP as abnormal or not. With the patient at 45 
degrees from horizontal, the clinicians recorded an elevated 
central venous pressure (CVP) when they could visualize the 
jugular vein contours. 

LIMITATIONS The pulmonary capillary wedge pressure 
served as the reference standard rather than the CVP (the 
wedge pressure is a better indicator of volume status). Small 
sample size in a select group of patients. 

There must have been expectation bias in that most clinicians
would have expected these severely affected patients to
have volume overload and abnormal physical findings. This
should have led to an overestimate of sensitivity and an
underestimate of specificity (because more patients would
have been expected to be in the first row of the 2 x 2 table).
The results are consistent with those found for estimated
CVP greater than 10 cm by Stein et al 1 in a similar population
of patients with advanced heart failure. An intriguing finding
is that every patient judged to have an elevated JVP also had
an abnormal abdominojugular reflux. The gain in sensitivity
from the abdominojugular reflux assessment vs the JVP was
offset by the loss of specificity; these tests performed similarly
in this population.

REFERENCE FOR THE EVIDENCE 

1. Stein JH, Neumann A, Marcus RH. Comparison of estimates of right
atrial pressure by physical examination and echocardiography in 
patients with congestive heart failure and reasons for discrepancies. Am J 
Cardiol 1997;80(12):1615-1618. 
TITLE How Far Is the Sternal Angle From the Mid-Right
Atrium?

AUTHORS Seth R, Magner P, Matzinger F, van Walraven C.

CITATION J Gen Intern Med. 2002;17(11):861-865.

QUESTION Is the recommendation to add 5 cm to the
jugular venous pulse to estimate the central venous pressure
valid?

DESIGN Convenience sample of patients undergoing 
computed tomography of the chest. 

SETTING Imaging unit in a Canadian university hospital. 

PATIENTS One hundred sixty of 333 potentially eligible
patients. Patients with chest deformities, large habitus
prohibiting landmark identification, and refusals to participate
were excluded.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

High-speed chest computed tomography (CT) scans were 
performed on patients while they were in the lateral supine 
and 90-degree positions during an end-inspiratory breath 
hold. The authors assumed that the mid-right atrium was 2 
cm below the superior vena cava and right atrial junction. 


MAIN OUTCOME MEASURE

Measured sternal angle distance for supine and 90-degree
positions. The investigators used geometric calculations to
determine the distance between the sternal angle and the right
atrium at 30 degrees, 45 degrees, and 60 degrees.

MAIN RESULTS

With the patient supine, the median distance between the
sternal angle and right atrium was 5.4 cm (interquartile
range, 4.6-6.1 cm). However, when the patient was at 90
degrees, the median distance was 8.3 cm (interquartile range,
7-9.6 cm). Between 30- and 60-degree elevation (the elevation
typically used in clinical assessments), the median calculated
distance was approximately 8 cm.

CONCLUSIONS

LEVEL OF EVIDENCE Not a diagnostic test study.

STRENGTHS Large sample, asking an important question
about the assumptions necessary for the clinical examination.

LIMITATIONS The authors had to make their own assumption
about the position of the mid-right atrium. The CT
scans were done in a population of patients primarily with
lung or thoracic disease (eg, carcinoma).

This is a clever and basic study to test an assumption underlying
the clinical examination. The decision to add 5 cm to the
estimation of the jugular venous pressure (JVP) makes sense
when the patient is supine. However, clinicians almost never
assess the JVP in the supine patient. The authors found a
median distance of 8 cm in positions typically used during the
clinical examination. Thus, clinicians using the JVP would
underestimate the central venous pressure (CVP) by 3 cm. The
implication of this is that any patient with a JVP 3 cm above
the horizontal should be considered as having a high CVP
because the likely CVP will be more than 10 cm. This recommendation
is consistent with data from the Stein et al 1 study
that found the JVP leads to underestimates of around 5 cm for
patients with elevated CVP.

REFERENCE FOR THE EVIDENCE

1. Stein JH, Neumann A, Marcus RH. Comparison of estimates of right
atrial pressure by physical examination and echocardiography in
patients with congestive heart failure and reasons for discrepancies. Am J
Cardiol 1997;80(12):1615-1618.
TITLE Comparison of Estimates of Right Atrial Pressure
by Physical Examination and Echocardiography in
Patients With Congestive Heart Failure and Reasons for
Discrepancies.

AUTHORS Stein JH, Neumann A, Marcus RH.

CITATION Am J Cardiol. 1997;80(12):1615-1618.

QUESTION Among patients with severe congestive
heart failure, how closely do cardiologists predict the
central venous pressure?

DESIGN Consecutive, prospective. 

SETTING Cardiac catheterization laboratory, Chicago, 
Illinois. 

PATIENTS Twenty-two patients with an average left
ventricular ejection fraction of 19% (range, 12%-29%). 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The central venous pressure (CVP) was estimated from the
jugular venous pressure (JVP) by identifying the peak JVP. A
centimeter ruler placed vertically on the sternal angle was
used to measure the distance at an intersection with a horizontal
straight edge placed at the JVP. To estimate the CVP, 5
cm was added to the vertical distance. A right-sided heart
catheterization was performed immediately thereafter. 
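
The bedside arithmetic described above can be written out as a short sketch. It is illustrative only and not a clinical tool: the function name is hypothetical, the 5-cm default is the conventional offset used in this study, and the 8-cm alternative is the median sternal angle to right atrium distance that Seth et al reported for the positions typically used at the bedside.

```python
def estimate_cvp_cm(jvp_above_sternal_angle_cm: float, offset_cm: float = 5.0) -> float:
    """Estimate CVP (cm of water) as the vertical JVP height above the
    sternal angle plus an assumed sternal angle to right atrium distance."""
    if offset_cm < 0:
        raise ValueError("offset_cm must be non-negative")
    return jvp_above_sternal_angle_cm + offset_cm


jvp_height = 3.0  # hypothetical reading, in cm above the sternal angle
conventional = estimate_cvp_cm(jvp_height)               # adds 5 cm
semi_upright = estimate_cvp_cm(jvp_height, offset_cm=8)  # adds 8 cm
print(f"Conventional estimate: {conventional:.0f} cm")
print(f"Estimate with 8-cm offset: {semi_upright:.0f} cm")
print(f"Difference: {semi_upright - conventional:.0f} cm")
```

With the same reading, the two offsets differ by 3 cm, which is the size of the underestimate attributed to the 5-cm convention.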

MAIN OUTCOME MEASURE 

Correlation between clinical estimate of the CVP and the 
invasive measurement. The data are displayed in a scatterplot,
so that lines can be drawn to extract the raw results.


MAIN RESULTS

The correlation between the raw clinical estimate and the
invasive measure was 0.92. The clinical estimates systematically
underestimated the actual value. The bias was least for
those with clinical estimates of less than 8 cm (correlation was
near perfect), but the underestimation became more pronounced
as the clinician estimated a higher CVP from the
JVP. With estimates of 9 to 14 cm, the clinicians underestimated
the true CVP by 5.0 cm. Dichotomizing the data for a
clinical estimate of 8 cm or more and extracting the results
from the scatterplot reveals the likelihood ratios (LRs) in
Table 11-8.


Table 11-8 Likelihood Ratio of Clinically Estimated CVP

CVP                                 Invasive CVP    LR+ (95% CI)      LR- (95% CI)
Clinical estimate of CVP > 10 cm    CVP > 10 cm     11 (0.73-157)     0.25 (0.10-0.64)
Clinical estimate of CVP > 8 cm     CVP > 8 cm      1.6 (0.98-3.7)    0.18 (0.03-1.1)

Abbreviations: CI, confidence interval; CVP, central venous pressure; LR+, positive
likelihood ratio; LR-, negative likelihood ratio.


CONCLUSIONS

LEVEL OF EVIDENCE Level 3.

STRENGTHS Objective reference standard done immediately
after the CVP was assessed.

LIMITATIONS Small population of patients in a narrow
spectrum of disease. The examiners were specialists.

These data are most useful for validating the concept that
clinical assessment of the CVP, by measuring the vertical distance
to the JVP and then adding 5 cm, will systematically
underestimate the true pressure. Because the population of
patients was small, the confidence intervals around the estimates
using a cut point of 10 cm are huge for the positive LR.
Every patient with a clinical estimate of more than 10 cm
CVP had the result confirmed by the invasive test. However,
it seems likely that cardiologists estimating a high pressure
(> 10 cm) in a population of patients with low ejection fractions
are usually going to be correct. It becomes much more
difficult to identify patients with volume overload when
lower clinical and invasive thresholds are used. Reviews of
clinical assessments of CVP recommend that clinicians use a
clinical estimate of 8 cm as their threshold for assessing a
high pressure.

Reviewed by David L. Simel, MD, MHS
CHAPTER 12

Does This Patient Have Acute Cholecystitis?

Robert L. Trowbridge, MD
Nicole K. Rutkowski, MD
Kaveh G. Shojania, MD

CLINICAL SCENARIO


A 72-year-old woman with poorly controlled diabetes,
coronary artery disease, and hypertension presents to the
emergency department complaining of nausea and vomiting.
As an emergency department resident, you elicit the
history that the patient felt well until 24 hours ago, when
she developed anorexia, followed rapidly by bilious emesis.
She describes mild upper abdominal discomfort but is
unable to further localize the pain. There have been no
abnormal bowel movements, gastrointestinal bleeding, or
chest pain.

The patient is febrile (39°C) and appears uncomfortable.
Her lungs are clear, and cardiac examination reveals
only a fourth heart sound. There is moderate epigastric
tenderness and guarding throughout the abdomen, but
no rigidity. Pelvic and rectal examination results are unremarkable.
Electrocardiography shows no changes suggestive
of ischemia. Laboratory testing shows leukocytosis
(white blood cell count, 17 500/µL), serum transaminase levels twice
the upper limit of normal, and a total bilirubin level of 3.2
mg/dL. In considering the differential diagnosis for the
patient’s presenting complaint and laboratory results, you
wonder whether the suspicion of acute cholecystitis is
high enough to warrant further testing.


WHY IS THIS QUESTION IMPORTANT? 


Acute cholecystitis accounts for 3% to 9% of hospital
admissions for acute abdominal pain. 1-4 Most patients presenting
with upper abdominal complaints are subsequently
found to have a relatively benign cause of pain
(eg, dyspepsia or gastroenteritis), 2,5 but the possibility of
acute cholecystitis mandates the completion of a comprehensive
and at times laborious diagnostic evaluation. The
importance of this clinical dilemma is only magnified by
the frequency with which abdominal pain is encountered
in clinical practice. 6-8

Traditionally, the diagnosis of acute cholecystitis was followed
by a several-week “cooling off” period before proceeding
to surgery. Most clinicians now advocate early
cholecystectomy (ie, within several days of the onset of
symptoms), 9 because it leads to lower complication rates,
reduced costs, and shortened recovery periods. 10-14

DEFINITION OF CHOLECYSTITIS 

Defining cholecystitis as “inflammation of the gallbladder”
implies a pathologic state. What clinicians usually mean by
acute cholecystitis, however, is the presence of this pathologic
state (seen macroscopically at laparotomy or microscopically
by the pathologist) in the setting of a plausibly
related clinical presentation. Practically speaking, cholecystitis
is a syndrome encompassing a continuum of clinicopathologic
states. At one end of this continuum is
symptomatic cholelithiasis, with acute attacks of pain (biliary
colic) that resolve in 4 to 6 hours. At the other end,
that which is typically associated with the term acute cholecystitis,
is a clinical picture in which biliary colic is longer
lasting and accompanied by fever, laboratory markers of
inflammation, or cholestasis. 15,16 Gallbladder inflammation
without gallstones (ie, acalculous cholecystitis) typically
occurs in critically ill patients and is consequently
associated with a high mortality rate. 17,18


HOW TO ELICIT THE RELEVANT 
SIGNS AND SYMPTOMS 

Cope’s Early Diagnosis of the Acute Abdomen 15 points out that
“biliary colic” is a misnomer because biliary obstruction produces
pain of a steady, nonparoxysmal nature. A majority of
studies have explicitly defined biliary colic in similar terms
(eg, a steady right upper quadrant pain lasting for at least 30
minutes), but others have used the term without definition. 19
Cope’s 15 also stresses that biliary colic localizes to the midepigastrium
as often as to the right upper quadrant. A recent
systematic review 19 supports this observation because “upper
abdominal pain” exhibited test characteristics comparable to
right upper quadrant pain. Thus, the clinician should inquire
about both pain in the right upper quadrant and, more generally,
pain in the upper abdomen. The clinician should also ask the
patient about fat intolerance because abdominal discomfort
after fatty meals may have a predictive value similar to that of
biliary colic. 19

Physical findings most famously associated with the gallbladder
are the Courvoisier and Murphy signs. The Courvoisier
sign has evolved in meaning, 20 but standard definitions
describe the sign as referring to a palpable, nontender gallbladder
in a patient with jaundice. 21,22 Courvoisier observed
that dilation of the gallbladder occurred more commonly
when obstruction resulted from malignancy, rather than
from benign conditions such as gallstones. Although this
association is real, the sign should not be elevated to the status
of a “law,” 20-22 because recent reports confirm the occurrence
of the Courvoisier sign in biliary conditions other than
obstructive malignancies. 23

The Murphy sign refers to pain and arrested inspiration
occurring when the patient inspires deeply while the examiner’s
fingers are hooked underneath the right costal margin. 21,22,24
Data addressing the usefulness of the Murphy sign in evaluating
patients suspected of having acute cholecystitis are discussed
along with other findings from the systematic review presented
below. The only other physical sign we identified as specifically
associated with acute cholecystitis was the Boas sign. Originally,
this sign referred to point tenderness in the region to the right of
the 10th to 12th thoracic vertebrae, 25-27 but contemporary
sources describe hyperesthesia to light touch in the right upper
quadrant or infrascapular area. 22 One study 28 reported that 7%
of patients undergoing cholecystectomy exhibited hyperesthesia
in this region, but no patient exhibited the Boas sign in the original
sense. None of the other studies reviewed below assessed the
Boas sign in either form.

ACCURACY OF DIAGNOSTIC IMAGING 

Ultrasonography of the right upper quadrant has emerged as
the most commonly used imaging modality for suspected
cholecystitis. Meta-analysis of the diagnostic performance of
ultrasonography in detecting acute cholecystitis indicated an
unadjusted sensitivity and specificity of 94% and 78%,
respectively. 29 The investigators included in their analysis
adjustments for verification bias 30-32 (also called workup
bias 33), which refers to the distorted diagnostic test characteristics
observed when the decision to proceed with a gold
standard test (eg, cholecystectomy) is affected by the results
of preliminary tests such as right upper quadrant ultrasonography.
Patients with a negative ultrasonography result will
undergo cholecystectomy only in the setting of extremely
typical clinical findings. The consequent loss of patients with
atypical clinical presentations reduces the opportunity for
false-negative ultrasonography results, thus inflating the
apparent sensitivity of ultrasonography and its associated
“rule-out” power. Conversely, specificity and the associated
“rule in” ability of ultrasonography are underestimated. 
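
The direction of verification bias can be seen in a small worked example. The sketch below is hypothetical throughout: the cohort counts and verification fractions are invented for illustration and are not taken from the meta-analysis. It assumes a test with true sensitivity and specificity of 80% and shows what happens when every test-positive patient but only one quarter of test-negative patients proceed to the gold standard.

```python
from fractions import Fraction


def apparent_accuracy(tp, fn, fp, tn, verify_pos, verify_neg):
    """Sensitivity and specificity computed only among verified patients.

    verify_pos and verify_neg are the fractions of test-positive and
    test-negative patients who go on to the gold standard test.
    """
    v_tp = tp * verify_pos  # verified true positives
    v_fp = fp * verify_pos  # verified false positives
    v_fn = fn * verify_neg  # verified false negatives
    v_tn = tn * verify_neg  # verified true negatives
    sensitivity = v_tp / (v_tp + v_fn)
    specificity = v_tn / (v_tn + v_fp)
    return sensitivity, specificity


# Hypothetical cohort: 100 diseased and 100 nondiseased patients.
cohort = dict(tp=80, fn=20, fp=20, tn=80)

complete = apparent_accuracy(**cohort, verify_pos=Fraction(1), verify_neg=Fraction(1))
partial = apparent_accuracy(**cohort, verify_pos=Fraction(1), verify_neg=Fraction(1, 4))

for label, (sens, spec) in [("Complete verification", complete),
                            ("Partial verification", partial)]:
    print(f"{label}: sensitivity {float(sens):.2f}, specificity {float(spec):.2f}")
```

Under partial verification the apparent sensitivity rises and the apparent specificity falls, which is the pattern the adjustment in the meta-analysis is meant to correct.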

Adjustments for the effects of verification bias in the above-mentioned
meta-analysis 29 indicated that ultrasonography
detects acute cholecystitis with sensitivity of 88% (95% confidence
interval [CI], 74%-100%) and specificity of 80% (95%
CI, 62%-98%). Sensitivity for the detection of cholelithiasis
was comparable, but specificity was higher, at approximately
99%. Radionuclide scanning has slightly better test characteristics
for the diagnosis of acute cholecystitis but offers no evaluation
of alternative abdominal diagnoses and has the
disadvantages of greater inconvenience and patient exposure
to radionuclides. 29 Computed tomography of the abdomen,
although useful for the evaluation of suspected complications
and concurrent intra-abdominal conditions, is inferior to
ultrasonography in the assessment of acute biliary disease. 34,35

METHODS 

The initial electronic search queried the MEDLINE database
for January 1966 through November 2000 (limited to
English-language articles) using the Medical Subject Headings
(MeSH) “acute abdomen,” “abdominal pain,” “cholecystitis,”
“cholelithiasis,” “gallbladder,” and “gallbladder diseases.”
These terms were then combined with various combinations
of MeSH terms, title words, and text words: “physical examination,”
“medical history taking,” “professional competence,”
“sensitivity and specificity,” “reproducibility of results,”
“observer variation,” “diagnostic tests,” “decision support
techniques,” “Bayes theorem,” “predictive value of tests,”
“palpation,” “percussion,” “differential diagnosis,” and “diagnostic
errors.” The Science Citation Index and Cochrane
Library were also searched, and a hand search of Index Medicus
was conducted for 1950 through 1965, using the terms
“cholecystitis,” “acute abdomen,” and “gallbladder.” Bibliographies
of identified articles were searched for additional pertinent
articles, as were the bibliographies of prominent
textbooks of physical examination, surgery, and gastroenterology.
An electronic search of MEDLINE was repeated in July
2002 to look for any relevant articles appearing since completion
of the more comprehensive search.

Two authors (RT and NR) independently abstracted data
from the identified studies, and all 3 authors reviewed these
data for inclusion. Included studies evaluated the role of a clinical
test (including medical history, physical examination, and
basic laboratory tests) in adult patients with abdominal pain
or suspected acute cholecystitis. Included studies were also
required to report data from a control group of patients subsequently
found not to have acute cholecystitis, with sufficient
detail to allow construction of 2 x 2 tables. Finally, studies were
required to define cholecystitis according to an adequate gold
standard, including surgery, pathologic examination, radiographic
imaging (hepatic iminodiacetic acid [HIDA] scan or
right upper quadrant ultrasonography), or clinical follow-up
documenting a course consistent with acute cholecystitis and
without evidence for an alternate diagnosis.

Summary measures for the sensitivity of the evaluated components
of the clinical examination and basic laboratory tests
for cholecystitis were derived from published raw data from the
reported studies meeting our inclusion criteria. A random-effects
model was used to generate conservative summary measures
and CIs for the sensitivity and likelihood ratios (LRs). 36-38
For LRs, a summary measure is reported only when more than 2
studies were identified; otherwise, a range was reported. 
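
One common way to build a random-effects summary of likelihood ratios is to pool the log LRs with the DerSimonian-Laird estimator. The sketch below assumes that approach; the text does not specify which random-effects method the authors used, so this is not a reconstruction of their analysis. The function names and the 2 x 2 counts are hypothetical, and the variance formula requires that no cell be zero.

```python
import math


def log_lr_pos(tp, fp, fn, tn):
    """ln(LR+) and its approximate variance from one 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    variance = 1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn)
    return math.log(sensitivity / false_positive_rate), variance


def pool_random_effects(effects):
    """DerSimonian-Laird pooling of (estimate, variance) pairs."""
    k = len(effects)
    weights = [1 / v for _, v in effects]
    total = sum(weights)
    fixed = sum(w * y for w, (y, _) in zip(weights, effects)) / total
    # Cochran Q measures between-study heterogeneity.
    q = sum(w * (y - fixed) ** 2 for w, (y, _) in zip(weights, effects))
    c = total - sum(w * w for w in weights) / total
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    re_weights = [1 / (v + tau2) for _, v in effects]
    pooled = sum(w * y for w, (y, _) in zip(re_weights, effects)) / sum(re_weights)
    std_error = math.sqrt(1 / sum(re_weights))
    return pooled, std_error, tau2


def summary_lr(tables):
    """Summary LR+ with a 95% CI, back-transformed from the log scale."""
    pooled, std_error, _ = pool_random_effects([log_lr_pos(*t) for t in tables])
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * std_error),
            math.exp(pooled + 1.96 * std_error))


# Hypothetical (TP, FP, FN, TN) counts from three studies.
studies = [(30, 10, 20, 40), (45, 20, 15, 60), (12, 3, 8, 27)]
lr, lower, upper = summary_lr(studies)
print(f"Summary LR+ = {lr:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

When the studies disagree, the between-study variance widens the interval relative to a fixed-effects pool, which is why the resulting estimates are described as conservative.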


RESULTS 

Of 195 studies identified by our search, 17 evaluated the
role of the clinical examination or basic laboratory test in
patients with acute abdominal pain and possible acute
cholecystitis and also met our inclusion criteria (Table 12-1). 39-55


Table 12-1 Studies of the Diagnostic Performance of Clinical and Laboratory Findings in Detecting Acute Cholecystitis

Source                           Study       Selection Criteria                      Design         Sample  Consecutive  Basis for Diagnosis
                                 Period                                                             Size    Patients
Adedeji and McAdam, 39 1996      1985-1990   Acute abdominal pain and age > 70 y     Retrospective  431     Yes          Clinical follow-up
Bednarz et al, 40 1986           1983-1984   Suspected acute cholecystitis and       Prospective    70      Yes          Surgery (43%)
                                             referred for HIDA scan                                                      Clinical impression (57%)
Brewer et al, 41 1976            1971-1972   Abdominal pain                          Retrospective  570     Yes          Multiple
Dunlop et al, 42 1989            1982-1986   Acute abdominal pain and suspected      Prospective    270     Yes          Pathology (71%)
                                             acute cholecystitis                                                         Clinical impression (29%)
Eikman et al, 43 1975            Not stated  Suspected acute cholecystitis and       Prospective    38      Yes          Surgical (38%)
                                             referred for radiology testing                                              Clinical impression (62%)
Gruber et al, 44 1996            1990-1993   Positive HIDA scan results and          Retrospective  198     Yes          Pathology
                                             underwent surgery for suspected
                                             acute cholecystitis
Halasz, 45 1975                  1969-1974   Suspected acute cholecystitis           Retrospective  238     Yes          Surgery (65%)
                                                                                                                         Other (35%) a
Johnson and Cooper, 46 1995      Not stated  Positive HIDA scan results and          Retrospective  69      No           Pathology
                                             underwent surgery for suspected
                                             acute cholecystitis
Juvonen et al, 47 1992           1988-1989   Suspected acute cholecystitis           Prospective    129     Yes          Pathology (95%)
                                             referred for ultrasonography                                                Ultrasonography (5%)
Liddington and Thomson, 48 1991  Not stated  Abdominal pain                          Prospective    142     No           Clinical impression
Lindenauer and Child, 49 1966    1959-1964   Underwent cholecystectomy               Retrospective  200     No           Pathology
Potts and Vukov, 50 1999         1992-1995   Abdominal pain requiring operation      Retrospective  117     Yes          Pathology
                                             and age > 80 y
Prevot et al, 51 1999            1997-1999   ICU patients with suspected acute       Prospective    32      Yes          Pathology (50%)
                                             acalculous cholecystitis                                                    Clinical impression (50%)
Raine and Gunn, 52 1975          1965-1973   Suspected acute cholecystitis and       Prospective    156     Yes          Pathology
                                             underwent surgery
Schofield et al, 53 1986         Not stated  Abdominal pain and suspected acute      Prospective    100     Yes          Gallstones at laparotomy
                                             cholecystitis
Singer et al, 54 1996            1993        Suspected acute cholecystitis and       Retrospective  100     Yes          Pathology (44%)
                                             radiology testing completed                                                 HIDA scintigraphy (56%)
Staniland et al, 55 1972         Not stated  Admission for abdominal pain of < 1 wk  Retrospective  600     No           Surgery

Abbreviations: HIDA, hepatic iminodiacetic acid; ICU, intensive care unit.
a Radiology testing and clinical follow-up; exact proportions not specified.
Twelve of these studies 40,42-47,49,51-54 enrolled patients specifically
suspected of having acute cholecystitis, with inclusion
in many of these studies based on patient referral for
radiology testing (ie, HIDA scan or right upper quadrant
ultrasonography) for the confirmation of a clinical diagnosis.
The remaining 5 studies 39,41,48,50,55 enrolled patients
presenting with abdominal pain and did not require a specific
suspicion of acute cholecystitis for patient inclusion.
Each of the 17 studies evaluated a variable number of clinical
and laboratory findings included in the evaluation of
suspected cholecystitis, ranging from 1 to 9 characteristics
per study (Table 12-2).

Precision of Signs and Symptoms 

Measurements of laboratory characteristics and objective
clinical signs such as temperature are assumed to have
high precision, but the reproducibility of other aspects of
the clinical examination for cholecystitis remains largely
unknown. In fact, the only study identified as assessing
the precision of some aspect of the clinical examination
for biliary disease was an evaluation of the diagnostic
value of iridology 56 (iridologists believe that intricate
neural connections between major organs and the iris
permit diagnosis of general medical conditions through
inspection of iris pigmentation patterns 57,58). In this relatively
well-designed study, the accuracy and precision of
iridologic signs for the diagnosis of cholecystitis were
barely distinguishable from values expected by chance
alone (κ = -0.06 to 0.28 for the 10 possible observer
pairs). 
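
The κ statistic quoted above corrects observed agreement between 2 observers for the agreement expected by chance. A minimal sketch for a dichotomous finding follows; the function name and the counts are hypothetical and are not data from the iridology study.

```python
def cohen_kappa(both_pos: int, a_only: int, b_only: int, both_neg: int) -> float:
    """Cohen kappa for 2 observers rating the same patients as positive or negative.

    both_pos: both observers call the finding present
    a_only:   only observer A calls it present
    b_only:   only observer B calls it present
    both_neg: both observers call it absent
    """
    n = both_pos + a_only + b_only + both_neg
    if n == 0:
        raise ValueError("no observations")
    observed = (both_pos + both_neg) / n
    a_pos = (both_pos + a_only) / n
    b_pos = (both_pos + b_only) / n
    # Agreement expected if the 2 observers rated independently.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts for 50 patients examined by 2 observers.
print(f"kappa = {cohen_kappa(20, 5, 5, 20):.2f}")
# Agreement no better than chance gives a kappa of 0.
print(f"kappa = {cohen_kappa(25, 25, 25, 25):.2f}")
```

Values near 0, such as those reported for iridology, indicate agreement close to what chance alone would produce; negative values indicate agreement worse than chance.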

Unfortunately, analogous studies have not been carried
out with conventional clinical maneuvers related to the
diagnosis of cholecystitis. In fact, as observed in a previous
article in this series, 59 the precision of even the most basic
components of the abdominal examination (eg, guarding,
rigidity, and rebound tenderness) remains uncharacterized.
Poor reproducibility for abdominal examination would
erode the assessments of sensitivity and specificity provided
by different investigators. Presumably, then, one can infer a
certain degree of interrater reliability from the fact that
multiple studies demonstrate modest sensitivity for these
signs in diagnosing important abdominal conditions. 59
Nonetheless, further assessments of core components of the
abdominal examination would be a welcome addition to
the literature.


Table 12-2 Summary Test Characteristics for Clinical and Laboratory Findings in Included Studies

                                                          No. of                                                  Summary LR b
Finding (No. of Studies)                                  Patients a  Sensitivity (95% CI)  Specificity (95% CI)  LR+ (95% CI)     LR- (95% CI)
Clinical
  Anorexia (2) 41,55                                      1135        0.65 (0.57-0.73)      0.50 (0.49-0.51)      1.1-1.7          0.5-0.9
  Emesis (4) 41,46,53,55                                  1338        0.71 (0.65-0.76)      0.53 (0.52-0.55)      1.5 (1.1-2.1)    0.6 (0.3-0.9)
  Fever (>38°C) (8) 40,41,44,46,50-53                     1292        0.35 (0.31-0.38)      0.80 (0.78-0.82)      1.5 (1.0-2.3)    0.9 (0.8-1.0)
  Guarding (2) 41,55                                      1170        0.45 (0.37-0.54)      0.70 (0.69-0.71)      1.1-2.8          0.5-1.0
  Murphy sign (3) 39,46,54                                565         0.65 (0.58-0.71)      0.87 (0.85-0.89)      2.8 (0.8-8.6)    0.5 (0.2-1.0)
  Nausea (2) 46,54                                        669         0.77 (0.69-0.83)      0.36 (0.34-0.38)      1.0-1.2          0.6-1.0
  Rebound (4) 40,41,48,55                                 1381        0.30 (0.23-0.37)      0.68 (0.67-0.69)      1.0 (0.6-1.7)    1.0 (0.8-1.4)
  Rectal tenderness (2) 41,55                             1170        0.08 (0.04-0.14)      0.82 (0.81-0.83)      0.3-0.7          1.0-1.3
  Rigidity (2) 41,55                                      1140        0.11 (0.06-0.18)      0.87 (0.86-0.87)      0.50-2.32        1.0-1.2
  Right upper abdominal quadrant
    Mass (4) 40,45,53,54                                  408         0.21 (0.18-0.23)      0.80 (0.75-0.85)      0.8 (0.5-1.2)    1.0 (0.9-1.1)
    Pain (5) 40,45,46,54,55                               949         0.81 (0.78-0.85)      0.67 (0.65-0.69)      1.5 (0.9-2.5)    0.7 (0.3-1.6)
    Tenderness (4) 40,45,54,55                            1001        0.77 (0.73-0.81)      0.54 (0.52-0.56)      1.6 (1.0-2.5)    0.4 (0.2-1.1)
Laboratory
  Alkaline phosphatase > 120 U/L (4) 42,46,49,51          556         0.45 (0.41-0.49)      0.52 (0.47-0.57)      0.8 (0.4-1.6)    1.1 (0.6-2.0)
  Elevated ALT or AST level c (5) 42,46,49,51,53          592         0.38 (0.35-0.42)      0.62 (0.57-0.67)      1.0 (0.5-2.0)    1.0 (0.8-1.4)
  Total bilirubin > 2 mg/dL (6) 40,42,43,46,49,51         674         0.45 (0.41-0.49)      0.63 (0.59-0.66)      1.3 (0.7-2.3)    0.9 (0.7-1.2)
  Total bilirubin, AST, or alkaline phosphatase (1) 52    270
    All 3 elevated                                                    0.34 (0.30-0.36)      0.80 (0.69-0.88)      1.6 (1.0-2.8)    0.8 (0.8-0.9)
    Any 1 elevated                                                    0.70 (0.67-0.73)      0.42 (0.31-0.53)      1.2 (1.0-1.5)    0.7 (0.6-0.9)
  Leukocytosis d (7) 41,44,46,50-53                       1197        0.63 (0.60-0.67)      0.57 (0.54-0.59)      1.5 (1.2-1.9)    0.6 (0.5-1.8)
  Leukocytosis d and fever (2) 44,52                      351         0.24 (0.21-0.26)      0.85 (0.76-0.91)      1.6 (0.9-2.8)    0.9 (0.8-1.0)

Abbreviations: ALT, alanine aminotransferase; AST, aspartate aminotransferase; CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.
a May not equal sums of N in Table 12-1 because not all studies applied all tests to all patients.
b Summary measures provided only for findings discussed by more than 2 studies.
c Greater than upper limit of normal (ALT, 40 U/L; AST, 48 U/L).
d White blood cell count of more than 10 000/µL.

Accuracy of Signs and Symptoms 

No single clinical or laboratory finding had an LR- sufficiently
low to rule out the diagnosis of acute cholecystitis
(Table 12-2). Even the absence of right upper quadrant tenderness,
with its LR- of 0.4, does not rule out acute cholecystitis.
Elderly patients may be particularly prone to present
without signs or symptoms referable to the right upper
quadrant. 60

Similarly, individual symptoms, signs, and laboratory
results did not have LR+s sufficiently high to rule in the diagnosis
of acute cholecystitis. In fact, none of the LR+s were
more than 2.0, with the exception of the Murphy sign, which
was associated with a ratio of 2.8. The 95% CI for this summary
estimate included 1.0, but the use of the Murphy sign
was especially prone to verification bias. Thus, the true LR+
might exceed the estimated value. 
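
To see why likelihood ratios of this size neither rule in nor rule out the diagnosis, it helps to convert a pretest probability to a posttest probability through odds. The sketch below is illustrative: the function name is hypothetical, the pretest probabilities of 5% and 41% are chosen only as low and high examples, and the LRs are the summary values for the Murphy sign (LR+, 2.8) and for absent right upper quadrant tenderness (LR-, 0.4).

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability using odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


for pretest in (0.05, 0.41):  # illustrative low and high pretest probabilities
    with_murphy = post_test_probability(pretest, 2.8)
    without_tenderness = post_test_probability(pretest, 0.4)
    print(f"Pretest {pretest:.0%}: Murphy sign present -> {with_murphy:.0%}; "
          f"no right upper quadrant tenderness -> {without_tenderness:.0%}")
```

At either pretest probability the posttest probability stays well inside the range where further testing would still be needed.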

Limitations of the Literature 

The problem of verification (or workup) bias 30-33 was discussed
in the section on diagnostic imaging but likely
affected all of the clinical and laboratory findings assessed
in this review. Patients with upper abdominal tenderness,
fever, abnormal liver function results, or other “typical”
findings more commonly undergo further evaluation (eg,
diagnostic imaging) for acute cholecystitis than do patients
presenting without these findings. The lack of patients
with atypical presentations in studies leads to overestimates
of sensitivity and underestimates of specificity. Supplementing
the diagnosis of cholecystitis with clinical
follow-up would mitigate the effects of verification bias,
but only 1 study 39 incorporated clinical follow-up in the
diagnostic protocol.

Spectrum bias 61 (or, more recently, spectrum effect 62) distorts
test characteristics when there is inadequate representation
of the relevant disease and disease-free states in the
patient samples used to evaluate the test of interest. The
prevalence of cholecystitis in the study populations was as
high as 80% and averaged 41%, in contrast to the prevalence
of 3% to 5% among patients presenting with abdominal
pain of less than 1 week’s duration. 1,2,41

Subgroup analysis can generate values for sensitivity and
specificity in patient populations with substantially different
previous likelihoods of disease from the average value. 62
Because available data often do not permit such analysis,
one has to make qualitative inferences about the difference
between the prior probability of disease in a particular
patient and the prevalence in the population used to evaluate
the test. For instance, a high prevalence of cholecystitis
in clinical reports reduces the opportunity to detect both
false-positive and true-negative results compared to the
findings in patients with a lower prevalence of disease.
Thus, clinical findings and laboratory tests used to evaluate
cholecystitis may have different sensitivity and specificity
than suggested in the available literature.

Other limitations to the existing literature include the
retrospective design of most studies, modest sample sizes,
unblinded assessment of key outcomes and test results, and
the variability in criteria for establishing a diagnosis of
cholecystitis. The included studies varied between accepting
clinicians’ diagnostic impressions (usually incorporating
imaging results), findings at laparotomy, and pathologic
findings as the means of diagnosis. Unfortunately, the correlation
between clinical and pathologic diagnoses of cholecystitis
is poor. 63 Gallstones occur commonly enough that
their presence, even in the context of inflammatory cells,
may be “true but unrelated” with respect to the patient’s
acute presentation. Overdiagnosis from this and other
available gold standards likely resulted in an overestimation
of the prevalence of acute cholecystitis, with consequent
distortion of the usefulness of clinical and basic laboratory
findings. Finally, studies assessing both calculous and acalculous
cholecystitis were included in the review. Although
these entities share many clinical traits, the nonspecific presentation
of acalculous cholecystitis likely eroded the value
of several clinical findings.

Combinations of Findings and the Clinical “Gestalt” 

Even with the above limitations, it seems unlikely that
individual clinical or laboratory findings have LR+ or LR-
of sufficient magnitude to play a decisive role in the diagnosis
of acute cholecystitis. Thus, one might look to combinations
of clinical signs and symptoms to facilitate, confirm,
or exclude the diagnosis of cholecystitis. Unfortunately,
only 3 included studies 42,44,52 specifically evaluated the value
of such combinations. Two studies evaluated the combination
of fever and leukocytosis; the third reviewed various
combinations of liver function tests. Assessments of
the LRs of the above combinations demonstrated no benefit
over their individual components, suggesting that these
tests did not function independently of one another.
Indeed, fever and leukocytosis may be seen as different
manifestations of the same underlying process of nonspecific
inflammation, so it is not surprising that combining
them provided no synergistic diagnostic value. Similarly,
right upper quadrant pain and the Murphy sign likely
reflect the same underlying pathophysiologic process (ie,
local inflammation and peritoneal irritation), so that
these findings would not be expected to function independently
of one another.
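
The role of independence can be made concrete. If two findings were conditionally independent, their LRs could simply be multiplied; the short sketch below shows what that naive assumption would yield for the Murphy sign (LR+ 2.8) and right upper quadrant tenderness (LR+ 1.6), the summary values reported in this review. The function name is illustrative, and the calculation is a hypothetical upper bound rather than a measured result.

```python
from math import prod


def naive_combined_lr(lrs):
    """Multiply LRs, which is valid only if the findings are independent."""
    return prod(lrs)


# Summary LR+ values from this review: Murphy sign and RUQ tenderness.
combined = naive_combined_lr([2.8, 1.6])
print(f"Naive combined LR+: {combined:.2f}")  # 4.48 under assumed independence
```

Because both findings likely reflect the same local inflammation, the true LR of the pair is probably closer to that of either finding alone than to this product.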

Although the existing literature does not identify specific
clinically useful combinations of findings, the effect of such
combinations can be estimated with available data. In 2
randomized trials of early vs delayed cholecystectomy, 13,14
laparotomy failed to confirm the preoperative diagnosis of
acute cholecystitis in 5 of 99 patients (95% CI, 1.9-12) 14 and
in 0 of 104 patients (95% CI, 0-4.4). 13 Given a likely bias
toward confirming the preoperative diagnosis, let us
assume that the actual false-positive rate for the clinical
diagnosis of cholecystitis is higher (eg, 15%) than suggested
by these values.

A 15% false-positive rate would imply an 85% posttest
probability after all clinical, laboratory, and radiologic testing.
We know that ultrasonography of the right upper quadrant
has a sensitivity and specificity of 88% and 80%, respectively. 29
Working backward, we can infer that the composite
clinical evaluation generates a pretest probability of approximately
60% before the results of ultrasonography are
obtained. This probability of 60%, which is the posttest probability
for the clinical suspicion of cholecystitis, reflects the diagnostic
power of the clinical evaluation before ultrasonography, as well as
the pretest probability at the outset. At that earlier stage in the
diagnostic process, the pretest probability reflects the prevalence
of the diagnosis, which is approximately 5% among patients
presenting to the emergency department with abdominal pain. 1,2,41
Thus, the clinical diagnosis of acute cholecystitis formulated according
to medical history, physical examination, and basic laboratory
testing must increase the probability of disease from 5%
to 60%.
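
The backward step can be reproduced in a few lines. This sketch takes the figures stated above (85% final posttest probability, ultrasonography sensitivity 88% and specificity 80%) and divides the posttest odds by the LR+ of ultrasonography; the helper names are illustrative.

```python
def odds(p: float) -> float:
    return p / (1.0 - p)


def prob(o: float) -> float:
    return o / (1.0 + o)


SENS, SPEC = 0.88, 0.80           # ultrasonography, as quoted in the text
FINAL_POSTTEST = 0.85             # 1 minus the assumed 15% false-positive rate

lr_pos = SENS / (1.0 - SPEC)      # LR+ of ultrasonography, 4.4
pretest_before_us = prob(odds(FINAL_POSTTEST) / lr_pos)

print(f"LR+ of ultrasonography: {lr_pos:.1f}")
print(f"Implied probability before ultrasonography: {pretest_before_us:.0%}")
```

The result is about 56%, which the text rounds to approximately 60%.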

Achieving this increase in pretest probability requires that
the gestalt comprising certain clinical and laboratory findings
have an LR+ on the order of 25 to 30. To put this range
in perspective, “typical angina” has an LR+ of 115 for the
diagnosis of coronary artery stenosis greater than 75% in
adult men. Nonsloping depression of the ST segment of at
least 2.5 mm during exercise electrocardiography has an LR+
of 39 for the same diagnosis. 64 Thus, our estimate for the
diagnostic usefulness of the clinical gestalt in diagnosing
acute cholecystitis, approximate and speculative as it is, confirms
the impression of many clinicians that the overall clinical
assessment plays a crucial role in arriving at a diagnosis.
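
The LR implied for the clinical gestalt is the ratio of posttest odds to pretest odds. A minimal sketch, using the 5% starting prevalence and the 60% probability before ultrasonography given in the text (the function name is illustrative):

```python
def required_lr(pretest: float, posttest: float) -> float:
    """LR needed to move a pretest probability to a given posttest probability."""
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = posttest / (1.0 - posttest)
    return posttest_odds / pretest_odds


gestalt_lr = required_lr(0.05, 0.60)
print(f"Required LR+: {gestalt_lr:.1f}")  # 28.5
```

Starting instead from the unrounded figure of about 56% gives an LR+ near 24, so the quoted range of 25 to 30 is best read as an order-of-magnitude estimate.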

It is tempting to supplement the existing literature by asking
experts for their opinion on which specific findings drive the
clinical impression for or against acute cholecystitis. Unfortunately,
discerning the key elements of the clinical assessment
can prove deceptive, even for experienced clinicians. For instance,
a recent clinical model for the prediction of pulmonary
embolism omits hypoxemia and pleurisy from the algorithm
for determining pretest probability. 65,66 Similarly, many of the
classic descriptors of angina have surprisingly little influence
on the assessment of chest pain. 67 This dissociation between
commonly accepted harbingers of disease and evidence-based
determinants of disease probability undermines the role of expert
opinion in identifying key clinical findings even for common
conditions. Consequently, tempting as it is to open the
“black box” of the clinical gestalt for cholecystitis, doing so will
require further study of specific clinical findings or, more
likely, combinations of findings.


CLINICAL SCENARIO—RESOLUTION 


Your differential diagnosis for the patient’s presentation
includes viral hepatitis, cholecystitis, and gallstone pancreatitis.
To validate your impression and help establish
the relative likelihood of each, you ask the emergency
department attending physician to evaluate the patient.
She regards the likelihood of cholecystitis as high enough
to warrant diagnostic imaging. In fact, her clinical impression
is that cholecystitis is the leading diagnosis, so she
recommends urgent right upper quadrant ultrasonography.
The ultrasonography subsequently reveals the presence
of gallstones, gallbladder wall thickening, and a
sonographic Murphy sign. These findings, in the context
of the patient’s presentation, virtually confirm the diagnosis
of acute cholecystitis. 68


THE BOTTOM LINE 

The existing literature identifies no single finding with sufficient
diagnostic power to establish or exclude acute cholecystitis
without further testing (eg, right upper quadrant
ultrasonography). Combinations of certain symptoms, signs,
and laboratory results likely have more useful LRs and presumably
inform the diagnostic impressions of experienced
clinicians. Future research may allow the development of
prediction rules that combine basic demographics with clinical
findings to distinguish patients who require no further
testing from those who require continued diagnostic evaluation,
as is currently possible with the evaluation of suspected
pulmonary embolism. 66,69 Until then, the clinical evaluation
of patients with abdominal pain suggestive of cholecystitis
will continue to rely heavily on the clinical gestalt and diagnostic
imaging.

Prepared by Robert L. Trowbridge, MD, and Kaveh G. Shojania, MD

Reviewed by Amy Rosenthal, MD 


CLINICAL SCENARIO 


A 52-year-old man presents to the emergency department
with a 6-hour history of dull, epigastric discomfort. The
pain began several hours after lunch and did not intensify
with eating dinner, although his appetite was poor. He is
mildly nauseated and feels warm but denies emesis, diarrhea,
dyspnea, chest pain, fever, chills, or previous episodes
of a similar pain. The pain neither radiates nor
changes with his position.

He has hypertension and gout and drinks 2 oz of whiskey 
per day. There is no history of gastrointestinal illness or 
abdominal surgery. 

He appears moderately uncomfortable, but he is afebrile, 
with normal vital signs. The bowel sounds are decreased but 
present. Although he allows you to palpate his abdomen, you 
elicit tenderness in the epigastrium without rebound. The 
Murphy sign is negative. The liver and gallbladder are not 
palpable, a rectal examination causes no pain, and the stool is 
negative for occult blood. 

On laboratory examination, the white blood cell count is
12700/µL (the automated differential shows increased neutrophil
levels). He has normal electrolyte, transaminase,
bilirubin, amylase, and lipase levels and normal renal function.
The alkaline phosphatase level is slightly elevated, at 155
U/L. An electrocardiogram reveals no evidence of ischemia,
and the troponin I level is normal.

The emergency physician regards acute cholecystitis as the
leading diagnosis. Six months ago, the physician took a 2-day
course in ultrasonography and has since performed 40
bedside ultrasonographic tests on patients with abdominal
pain, the first 10 of which were proctored by a radiologist to
assess competency. The emergency physician performs a
focused right upper quadrant ultrasonography in the
present patient. He finds no evidence of gallstones, and the
point of maximal tenderness does not localize to the gallbladder
(ie, there is no sonographic Murphy sign). He recommends
that the patient be discharged home, with follow-up
in your clinic. However, the patient is still uncomfortable,
and you have not established a diagnosis. Have you effectively
ruled out acute cholecystitis?


UPDATED SUMMARY ON ACUTE CHOLECYSTITIS 

Original Review 

Trowbridge RL, Rutkowski NK, Shojania KG. Does this
patient have acute cholecystitis? JAMA. 2003;289(1):80-86.

UPDATED LITERATURE SEARCH 

We repeated the original search strategy that targeted any 
study involving diagnosis, physical examination, sensitivity 
and specificity, reproducibility of results, decision support 
techniques, and other relevant methodologic terms, with any 
of the following text or keywords: “gallbladder,” “gall stones,” 
or “cholecystitis.” The updated PubMed search included the 
years 1998 through September 2004, and we included a more 
robust search for systematic reviews according to a published 
strategy. 1 This search yielded 337 articles published since 
November 11, 2001. An independent search of the OVID
database with slight differences in the methodologic terms
identified an additional 34 English-language studies published
from 2002 to September 2004.

NEW FINDINGS 

• The clinician’s gestalt is the most important piece of evidence
from the clinical evaluation. The single findings with the highest
diagnostic value remain Murphy sign (positive likelihood
ratio, 2.8) and right upper quadrant tenderness (negative likelihood
ratio, 0.4), although the confidence intervals (CIs) for
both values cross 1, as documented in the original review.

• Bedside ultrasonography performed by physicians with
brief formal training courses may be useful when the result
is the combined absence of a sonographic Murphy sign and
any evidence of gallstones. Additional studies of bedside
ultrasonography by nonradiologists are required.

Details of the Update 

The focus of this review remains acute calculous cholecystitis. 
Studies focusing predominantly on acalculous cholecystitis 
were excluded. 





IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

No new data were found that modify the original results,
although we added data on bedside ultrasonography performed
by nonradiologists.

CHANGES IN THE REFERENCE STANDARD

Surgical findings combined with pathology or clinical follow-up
in patients who do not undergo surgery remain the reference
standard for acute cholecystitis.

RESULTS OF LITERATURE REVIEW 

Patients reproducibly report biliary symptoms when questioned
again 2 weeks after an initial assessment, using an extensive
questionnaire addressing the details of their symptoms across
various domains (pain, association with eating, changes in
bowel habits, and fever, among others). 2 Physicians concurred
with patients’ self-reported symptoms to a substantial extent (κ
scores > 0.6 and much higher in several cases). Two exceptions
were history of fever and radiation to the right shoulder. For
these findings, physicians concurred with only moderate agreement
(κ = 0.52 and κ = 0.46, respectively).

The bedside ultrasonography examination performed by a
nonradiologist is an emerging approach to cholecystitis diagnosis. 3,4
We had not previously included this test, so we conducted
a supplemental search for additional articles addressing
the utility of bedside ultrasonography. With no date restriction,
we found 6 studies, although only 1 study 3 was of sufficiently
high quality to warrant abstraction and inclusion in the
update (Table 12-3). All studies used nonconsecutive convenience
samples, but the 5 additional studies excluded from the
update also had bias because of nonindependence of reference
standard (ie, the decision to undergo confirmatory testing
explicitly depended on the results of bedside ultrasonography). 4-8
In addition, these studies did not attempt to diagnose
acute cholecystitis; they only evaluated agreement between
bedside ultrasonography and formal ultrasonography with
respect to specific radiologic findings.


Table 12-3 Bedside Ultrasonographic Findings for Acute Cholecystitis

Finding                                      LR+ (95% CI)     LR- (95% CI)
Bedside ultrasonography evidence of          2.7 (1.7-4.1)    0.13 (0.04-0.39)
gallstones and a positive sonographic
Murphy signᵃ

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

ᵃ Requires special training and validation of competence.


The single included study showed that physicians with
brief training and moderate experience in bedside ultrasonography
could adequately visualize the gallbladder in
most patients. 3 Even among patients with definitive positive bedside
ultrasonography results, defined as the presence of both
cholelithiasis and a sonographic Murphy sign, the positive
predictive value was only 70%. Thus, patients with positive
findings on bedside ultrasonography require confirmatory
radiologic investigations before proceeding to surgery. The
negative predictive value is 90%. For 30 patients in this
study, bedside ultrasonography detected no sonographic
Murphy sign and no cholelithiasis. Had these patients not
undergone formal ultrasonography, there would have been
a 26% reduction in ultrasonography use by the emergency
department, at a cost of missing 1 case of cholecystitis. It is
tempting to regard this miss rate of about 3% (1 of 30) as low
enough to rule out cholecystitis, but the 95% CI for 1 of 30 extends
from 0.6% to 17%. Among patients for whom the pretest
suspicion of cholecystitis is low, a definitely negative bedside
ultrasonography result probably would be adequate to
decide against formal ultrasonography, especially if adequate
clinical follow-up is in place.
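
The width of that interval follows directly from the small denominator. As an illustration, the Wilson score interval for 1 miss in 30 patients gives a range close to the one quoted; that the source used this particular method is an assumption, and the function name is illustrative.

```python
import math


def wilson_interval(events: int, n: int, z: float = 1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = events / n
    denom = 1.0 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# 1 missed case among 30 patients with a definitely negative bedside scan.
low, high = wilson_interval(1, 30)
print(f"Miss rate 95% CI: {low:.1%} to {high:.1%}")  # roughly 0.6% to 17%
```

An upper bound near 17% means that a study of this size cannot exclude a clinically important miss rate.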

Evidence From Guidelines 

There are no governmental agency guidelines that address 
the diagnosis of acute cholecystitis. 


CLINICAL SCENARIO—RESOLUTION 


You decide to observe the patient in the hospital for at
least 24 hours. The patient’s pain improves with intravenous
morphine, and he is admitted to the medical
service. Overnight, he requires increasing doses of
morphine for pain control, but the electrocardiogram
output and cardiac enzyme levels remain normal. The
following morning, the abdominal tenderness has worsened,
and the white blood cell count has increased to
19200/µL, but the serum amylase level remains normal.
Given the persistent concern for acute cholecystitis,
he undergoes formal abdominal ultrasonography in
the radiology department, which reveals substantial
pericholecystic fluid and a positive sonographic Murphy
sign. A laparoscopic cholecystectomy reveals an
acutely inflamed gallbladder with a small stone impacted
in the cystic duct. Pathology confirms the diagnosis of
acute cholecystitis.

ACUTE CHOLECYSTITIS—MAKE THE DIAGNOSIS


No single clinical finding, or known combination of clinical
history and physical examination findings, efficiently establishes
a diagnosis of acute cholecystitis. Thus, clinicians
must rely on their clinical gestalt. Bedside ultrasonography
requires additional study, and clinicians must receive proper
training, followed by demonstration of their proficiency.

PRIOR PROBABILITY 

Approximately 5% of emergency department patients with 
abdominal pain have cholecystitis. Women and Native 
Americans have a higher risk of cholecystitis. Patients with 
increased risk of cholecystitis include those with chronic 
hemolytic disease (eg, sickle cell disease) or recent rapid 
weight loss. 

POPULATION FOR WHOM ACUTE 
CHOLECYSTITIS SHOULD BE CONSIDERED 

Patients with abdominal pain. 


DETECTING THE LIKELIHOOD OF ACUTE CHOLECYSTITIS 

See Table 12-4. 


Table 12-4 Likelihood Ratios for Acute Cholecystitis

Finding (No. of Studies)                   LR+ (95% CI)        LR- (95% CI)
Clinical gestaltᵃ                          ≈25-30
Murphy sign (n = 3)                        2.8 (0.8 to 8.6)    0.5 (0.2 to 1.0)
Right upper quadrant tenderness (n = 4)    1.6 (1.0 to 2.5)    0.4 (0.2 to 1.1)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

ᵃ The LR is imputed from the baseline pretest probability (5%), the sensitivity and specificity
of ultrasonography (0.88 and 0.80, respectively), and the false-positive rate of diagnosis.

REFERENCE STANDARD TESTS 

Surgical findings combined with pathology or clinical follow-up
in patients who do not undergo surgery remain the
reference standard for acute cholecystitis.

EVIDENCE TO SUPPORT THE UPDATE


Cholecystitis 



TITLE A Questionnaire for the Assessment of Biliary 
Symptoms. 

AUTHORS Romero Y, Thistle JL, Longstreth GF, et al. 

CITATION Am J Gastroenterol. 2003;98(5):1042-1051. 

QUESTION What are the reproducibility, concurrent 
validity, and discriminative ability of a questionnaire 
designed to elicit patients’ self-reported biliary symptoms? 

DESIGN Prospective, independent, consecutive sample 
of blinded patients and investigators. 

SETTING Referral gastroenterology practice at a major 
teaching institution, Rochester, Minnesota. 

PATIENTS Two hundred forty-five adults (aged > 18 
years) referred to an outpatient clinic. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

A Biliary Symptoms Questionnaire (BSQ) was developed
according to a review of the literature and the experience of
the investigators using previously developed questionnaires
for irritable bowel syndrome (IBS) and gastroesophageal
reflux disease (GERD) as templates. The 114-question instrument
was administered to subjects on initial presentation
and then again after a 2-week interval. After the initial survey,
subjects also underwent a structured interview conducted
by investigators, who then completed their own BSQ
according to the interview findings. Finally, investigators
reviewed 10 BSQs of patients with known diagnoses (as
determined by clinical follow-up and gastroenterologist
opinion) and decided whether IBS, GERD, or biliary disease
was the most likely diagnosis. A shortened BSQ was tested
for reproducibility.

MAIN OUTCOME MEASURES 

Agreement was expressed as simple agreement (%) and
agreement beyond chance (κ). The domains assessed were as
follows:


1. Agreement between the serial surveys administered to the
patient (reproducibility)

2. Agreement between patient-reported symptoms and physician-reported
symptoms (concurrent validity)

3. Agreement between investigator diagnosis according to
the BSQ and gastroenterologist clinical diagnosis (discriminative
validity)
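
For readers unfamiliar with the κ statistic, the sketch below computes agreement beyond chance for a single yes/no symptom rated twice. The 2 × 2 counts are hypothetical and are not taken from the study; the function name is illustrative.

```python
def cohen_kappa(both_yes: int, first_only: int, second_only: int, both_no: int) -> float:
    """Cohen's kappa for two binary ratings of the same subjects."""
    n = both_yes + first_only + second_only + both_no
    observed = (both_yes + both_no) / n               # simple agreement
    p_first_yes = (both_yes + first_only) / n
    p_second_yes = (both_yes + second_only) / n
    # Agreement expected by chance from the marginal frequencies.
    expected = (p_first_yes * p_second_yes
                + (1 - p_first_yes) * (1 - p_second_yes))
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 100 patients asked about a symptom on two occasions.
kappa = cohen_kappa(both_yes=40, first_only=10, second_only=10, both_no=40)
print(f"Simple agreement 80%, kappa = {kappa:.2f}")  # kappa = 0.60
```

The example shows why simple agreement and κ are reported separately: 80% raw agreement corresponds here to a κ of only 0.60 once chance agreement is removed.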

MAIN RESULTS 

Patients exhibited reasonable consistency throughout the 2-week
test-retest period (see Table 12-5). In addition, physicians
concurred with patients’ self-reported symptoms with
moderate or better agreement. Patient reproducibility and
physician concurrence were almost perfect for complaints of
upper abdominal pain (κ = 0.94 for both) and for jaundice (κ
= 0.94 and κ = 0.84 for reproducibility and concurrence,
respectively). Moderate agreement was observed for radiation
of the pain (κ = 0.47 and 0.46 for reproducibility and
concurrence, respectively). For fever, patients reported the
symptom with substantial reproducibility (κ = 0.79), but
physicians concurred with only moderate agreement (κ =
0.52). Although the questionnaire performed reasonably well
in terms of discriminative ability (κ = 0.58), the limited sample
size (only 10 patients) and presentation of only 3 diagnostic
choices (biliary colic, GERD, and IBS) severely limit
this finding.


Table 12-5 Questionnaire Results for Reproducibility and
Concurrent Validity

                              Reproducibility,        Concurrent Validity,
Symptom                       κ (95% CI)              κ (95% CI)
Emesis                        0.95 (0.85 to 1)        0.73 (0.60 to 0.87)
Jaundice                      0.94 (0.83 to 1)        0.84 (0.71 to 0.97)
Pain in upper abdomen         0.94 (0.83 to 1)        0.94 (0.86 to 1)
Nausea                        0.81 (0.65 to 0.98)     0.75 (0.61 to 0.88)
Fever                         0.79 (0.62 to 0.97)     0.52 (0.36 to 0.68)
Biliary symptomsᵃ             0.72 (-0.03 to 0.95)    0.64 (0.15 to 0.95)
Radiation to right shoulder   0.47 (-0.15 to 1)       0.46 (0.21 to 0.72)

Abbreviation: CI, confidence interval.

ᵃ The results for biliary symptoms reflect the median agreement across all 18 questions
identified as biliary (as opposed to gastroesophageal reflux disease or irritable bowel
syndrome), including stabbing upper abdominal pain, cramping upper abdominal pain,
radiation to the back, radiation to the shoulder blade, periodicity of pain episodes, daytime
or nocturnal occurrence, and pain improved with movement, among others.

CONCLUSIONS

LEVEL OF EVIDENCE Level 1 for reproducibility and concurrent
validity. Level 4 for discriminative validity (nonindependent
sample with small numbers).

STRENGTHS The reproducibility and concurrent validity
sections were well designed.

LIMITATIONS The study was designed primarily to assess
the utility of a questionnaire as a research tool rather than to
assess the variability in patient and physician reporting of
abdominal symptoms. In testing the discriminative validity
of the questionnaire, a small sample (10) of patients was
used. In addition, investigators were given only 3 possible
diagnoses to choose from (biliary pain, GERD, and IBS),
which likely resulted in a significant overestimation of the
discriminative ability of the questionnaire. The shortened
BSQ was tested only for reproducibility, not concurrent
validity or discriminative ability.

This study evaluated the reproducibility and concurrent
validity of a questionnaire aimed at evaluating those with
possible biliary colic. Although a few conclusions may be
inferred regarding the variability in reporting of abdominal
symptoms, the main focus of the study was to validate the
instrument as a research tool. Patients appeared to be reasonably
consistent in reporting most abdominal symptoms over
time, and physicians generally concurred in their assessments
of patients’ symptoms.

Reviewed by Robert L. Trowbridge, MD


TITLE Ultrasonography by Emergency Physicians in
Patients With Suspected Cholecystitis.

AUTHORS Rosen CL, Brown DF, Chang Y, et al.

CITATION Am J Emerg Med. 2001;19(1):32-36.

QUESTION How well do the assessments of emergency
physicians using bedside ultrasonography (BUS) agree
with the results of formal ultrasonography and clinical follow-up
in the evaluation of suspected cholecystitis?

DESIGN Prospective, independent, convenience sample.

SETTING Emergency department at a major teaching
hospital, Boston, Massachusetts.

PATIENTS One hundred sixteen adults (aged > 18
years) who presented with abdominal pain and were suspected
of having cholecystitis.


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Fifteen full-time emergency physicians underwent a 5-hour
course, including didactic learning and hands-on training,
on the use of an ultrasonographic machine to identify the
gallbladder, detect gallstones, and elicit a sonographic Murphy
sign.

The bedside ultrasonographic findings were compared
not only with formal ultrasonography by radiologists but
also with clinical follow-up, including the results of other
noninvasive tests for cholecystitis, operative reports, pathology,
and telephone follow-up 1 month after the
emergency department visit to ascertain subsequent episodes
of abdominal pain requiring medical attention or emergency
visits.


MAIN OUTCOME MEASURES 

Agreement between BUS and formal ultrasonography in the
detection of gallstones or presence of sonographic Murphy
sign (ie, sensitivity and specificity of bedside ultrasonography,
using formal ultrasonography as reference standard).


MAIN RESULTS 

Among 116 patients, the physician performing BUS could
not visualize the gallbladder adequately in 6 (5.2%) cases.
Four of these 6 cases were diagnosed as cholecystitis on formal
ultrasonography. The authors explicitly state their interest
in focusing on cases in which bedside ultrasonography
appears to provide a definitive answer. Definitive BUS results
were defined as both findings present or both absent (ie, both
gallstones and sonographic Murphy sign present or both
absent). Of the 116 patients, 70 (60%) had definitive findings
(see Table 12-6).


Table 12-6 Likelihood Ratio of Bedside Ultrasonography

Test                                   Sensitivity   Specificity   LR+ (95% CI)    LR- (95% CI)
Definitive bedside ultrasonography     91%           66%           2.7 (1.7-4.1)   0.13 (0.04-0.39)
compared with clinical follow-up
for detection of cholecystitis

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.


Although we do not show it here, the authors presented data on
the sensitivity and specificity of formal ultrasonography among
patients with definitive results on BUS. The negative likelihood
ratio was similar to that in Table 12-6, but the positive likelihood
ratio was much higher, at 14.
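
The LRs for definitive bedside ultrasonography follow from the reported sensitivity and specificity. The sketch below recomputes them from the rounded percentages (91% and 66%), so small differences from the published values are expected; the function name is illustrative.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded values reported for definitive bedside ultrasonography.
bus_lr_pos, bus_lr_neg = likelihood_ratios(0.91, 0.66)
print(f"LR+ = {bus_lr_pos:.1f}, LR- = {bus_lr_neg:.2f}")
```

The results, about 2.7 and 0.14, agree with the reported 2.7 and 0.13 to within rounding of the inputs.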


CONCLUSIONS 

LEVEL OF EVIDENCE Level 3, since not all patients referred
for formal ultrasonography were selected to undergo BUS.

STRENGTHS Radiologists performed formal ultrasonography
without knowing the results of bedside ultrasonography.
Distinct comparisons with formal ultrasonography and clinical
follow-up provide useful information because cases not
detected by formal ultrasonography would not be expected
to be detected by BUS. Appropriately designed analysis,
including adjusting for clustering effects.

LIMITATIONS Convenience sample. It was not clear how
clinicians decided to choose which patients they referred
for right upper quadrant ultrasonography. Unconsciously
or not, physicians may have selected cases in which bedside
ultrasonography was likely to perform well. The 3 physicians
with the most training and previous experience were
investigators in the study, and they contributed almost half
of the patients. The remaining physicians each contributed
10 or fewer patients; 2 physicians contributed only 1 patient
each.

This study evaluated the potential effect of performing
BUS on requests for formal ultrasonography to evaluate suspected
acute cholecystitis. The limitations of the study
(above) are important, but other well-designed aspects of the
design and presentation of the results allow us to draw some
reasonable conclusions.

Physicians with brief training and moderate experience in
bedside ultrasonography can adequately visualize the gallbladder
in the majority of patients (95% in this study).
Approximately 60% of patients had definitive BUS results
(cholelithiasis and a sonographic Murphy sign both present
or both absent). Among patients with both findings present,
the positive predictive value of only 70% means positive results
require confirmation with formal ultrasonography. The negative
predictive value is 90%. The authors point out that, for 30
patients, bedside ultrasonography detected no sonographic
Murphy sign and no cholelithiasis. Had these patients not
been sent for formal ultrasonography, there would have been
a 26% reduction in ultrasonographic use by the emergency
department, at a cost of missing 1 case of cholecystitis.

Reviewed by Kaveh G. Shojania, MD 

CHAPTER


Does the Clinical Examination Predict Airflow Limitation?

Donald R. Holleman Jr, MD
David L. Simel, MD, MHS


CLINICAL SCENARIOS—DO THESE
PATIENTS HAVE AIRFLOW LIMITATION?


In each of the following cases, the clinician needs to
decide whether the patient has airflow limitation. In case
1, a 63-year-old man who has smoked 2 packs of cigarettes
per day for the past 47 years presents with decreased exercise
tolerance caused by shortness of breath. In case 2, a
35-year-old woman complains of coughing, wheezing,
and shortness of breath every autumn. In case 3, an 18-year-old
man is brought to an emergency department
with extreme difficulty breathing that began earlier that
evening.


WHY IS IT IMPORTANT TO DETECT AIRFLOW 
LIMITATION BY CLINICAL EXAMINATION? 


Airflow limitation is a disorder known by many names,
including airway obstruction and obstructive airways disease.
Recognizing airflow limitation can lead to appropriate treatment
and can yield important prognostic information.
Patients with symptomatic airflow limitation may benefit by
treatment with oral or inhaled bronchodilators, oral or inhaled
glucocorticoids, or antibiotics. Recognition of this disorder
also triggers environmental controls and preventive services,
such as vaccination against pneumococcus and influenza.

Screening is advocated for target disorders in which early
intervention favorably affects patient outcomes. Physicians do
not screen for airflow limitation because early intervention has
not been shown to alter the disease course. Therefore, clinicians
are likely to want to confirm or rule out disease in
patients presenting with pulmonary symptoms, such as cough
or dyspnea, rather than screen for unrecognized disease in
asymptomatic individuals.

The 3 clinical scenarios illustrate cases in which recognizing
airflow limitation by the clinical examination is important. In
the first case, recognizing airflow limitation might lead to the
diagnosis of pulmonary emphysema, more intensive counseling on
smoking cessation, vaccination against influenza and
pneumococcal infection, and bronchodilator therapy to improve
exercise tolerance. In the second case, recognizing airflow
limitation might lead to the identification of environmental
irritants or allergens responsible for symptoms. In the third
case, recognizing airflow limitation would lead to the diagnosis
of asthma and to acute, potentially lifesaving therapy with
bronchodilators and systemic glucocorticoids. Recognizing
airflow limitation clinically may have time, cost, and
convenience advantages compared with routine pulmonary function
testing.

Spirometry is the test of choice for confirming a diagnosis of
airflow limitation. Both the forced expiratory volume in 1
second (FEV1) and forced vital capacity (FVC) values are reduced
in patients with airflow limitation; because the FEV1 is
affected more than the FVC, the ratio of FEV1 to FVC (FEV1/FVC)
also decreases. The reduced FEV1/FVC is the hallmark of airflow
limitation. Although emphysema and chronic bronchitis represent
permanent reductions in airflow, asthma is a disorder
characterized by increased responsiveness of the bronchial tree
to a variety of stimuli, leading to intermittent airflow
limitation. 1 In patients with asthma, provocative testing, such
as methacholine challenge, may be necessary to bring about
airflow limitation between symptomatic episodes.

The reference standard for airflow limitation is the measurement
of the FEV1 and the FVC by spirometry. An FEV1/FVC lower than
the fifth percentile for age, height, and sex is considered
abnormal. 2 However, a normal FEV1/FVC during an asymptomatic
period does not rule out intermittent airflow limitation. For
most patients, the fifth percentile of FEV1/FVC is approximately
70%, but using this single value to diagnose airflow limitation
is discouraged. 2
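The spirometric definition above can be sketched in a few lines of Python. This is an illustration only: the function names and the example volumes are hypothetical. The lower limit of normal is supplied by the caller because the fifth percentile depends on age, height, and sex, and a single fixed cutoff is discouraged.

```python
def fev1_fvc_ratio(fev1_liters: float, fvc_liters: float) -> float:
    """Return FEV1/FVC as a fraction between 0 and 1."""
    if fev1_liters <= 0 or fvc_liters <= 0:
        raise ValueError("volumes must be positive")
    return fev1_liters / fvc_liters


def ratio_is_abnormal(fev1_liters: float, fvc_liters: float,
                      lower_limit_of_normal: float) -> bool:
    """True when FEV1/FVC is below the caller-supplied lower limit of normal."""
    return fev1_fvc_ratio(fev1_liters, fvc_liters) < lower_limit_of_normal


# Hypothetical patient: FEV1 1.8 L, FVC 3.0 L; 0.70 is used here only as an
# assumed stand-in for that patient's fifth percentile.
print(round(fev1_fvc_ratio(1.8, 3.0), 2))
print(ratio_is_abnormal(1.8, 3.0, lower_limit_of_normal=0.70))
```

In practice the lower limit would come from reference equations for the patient's age, height, and sex, which this sketch does not attempt to model.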

We performed an English-language MEDLINE search, using the
following Medical Subject Headings: (EXP Medical History Taking
OR EXP Physical Examination) AND (EXP Lung Diseases,
Obstructive). The titles and abstracts of the 1022 articles
retrieved from the above MEDLINE search were reviewed
independently by the 2 authors. If either reviewer chose an
article as possibly useful, the article was reviewed for
content. The authors had excellent agreement (κ = 0.85) on the
158 articles chosen for full review. If the article contained
results of the clinical examination predicting airflow
limitation, the article was reviewed for quality. References
from appropriate articles were reviewed for additional
references. Nineteen articles evaluating the clinical
examination for airflow limitation 3-21 used the accepted
definition or a similar spirometric definition of disease.
Others used a variety of definitions, including FEV1 only 22-27
or other, less-accepted or unclear definitions. 28-37 We chose
to include articles using reference standards that are not
currently accepted because they were otherwise methodologically
sound or they provided the only data available for some of the
clinical examination findings. The reference standards used in
studies evaluating operating characteristics for individual
clinical examination items are listed in Table 13-1. Because all
studies used reference standards of current airflow limitation,
the results in this review can be used only to predict airflow
limitation at the evaluation. Patients with asthma may be
overlooked if examined between attacks.


Table 13-1 Reference Standards Used in Studies Yielding Operating
Characteristics for Individual Clinical Examination Items

Reference Standard                                            References
FEV1 < fifth percentile and FEV1/FVC < fifth percentile a     14
FEV1/FVC < fifth percentile                                   11
FEV1/FVC < 0.70                                               5-8, 16, 18, 22
FEV1/FVC < 0.75 and FVC < 80% of predicted                    9
FEV1 < 75% of predicted and FEV1/FVC < 0.80                   20
FEV1 < 70% of predicted                                       23, 24
FEV1 < 2 L                                                    25, 26
FEV1 < fifth percentile                                       37
FEV1 < 60% of predicted or FEV1/FVC < 0.60                    17
Roentgenography, total lung capacity, and residual capacity   33
FEV1/FVC < 0.6 or history                                     31
Diagnosis of asthma                                           32
Normal spirometry                                             30

Abbreviations: FEV1, forced expiratory volume in 1 second; FVC, forced vital capacity.
a The definition recommended by the American Thoracic Society. 1

PATHOPHYSIOLOGIC CHARACTERISTICS 
OF AIRFLOW LIMITATION 

Understanding the physiologic characteristics of pulmonary
airflow helps to explain the clinical examination findings in
airflow limitation. The airways are a branching system of tubes
that link the outside atmosphere with the lung parenchyma.
During inspiration, the thoracic cavity actively expands. As the
chest volume increases, the intrathoracic pressure decreases.
Because the airways are open to the atmosphere, air flows into
the airways to equalize the intrathoracic pressure with the
atmospheric pressure. Therefore, during inspiration, the
pressure inside the airways is greater than the pressure in the
surrounding lung. This pressure exerts a force on the inner wall
of the airway, increasing the airway diameter during
inspiration.

At end inspiration, the chest no longer expands, and the
intrathoracic-to-atmospheric pressure difference disappears.
During expiration, the thoracic cavity passively contracts. As
the chest volume decreases, the intrathoracic pressure increases
and exceeds the atmospheric pressure. Because the airways
communicate with the atmosphere, the pressure inside the airways
is lower than the pressure in the surrounding lung. This
pressure difference exerts a force on the outer wall of the
airway, decreasing the airway diameter during expiration. The
resistance to airflow rises steeply as the airway narrows (for
laminar flow, it varies inversely with the fourth power of the
diameter), so small decreases in airway diameter lead to large
increases in resistance.
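To give a sense of scale for how narrowing affects resistance, the sketch below assumes idealized laminar flow through a rigid tube (Poiseuille's law), in which resistance varies with the inverse fourth power of the diameter. That assumption, and the function name, are illustrative; real airways branch, deform, and carry partly turbulent flow, so the numbers show the trend only.

```python
def relative_resistance(diameter_fraction: float) -> float:
    """Resistance relative to baseline when the airway diameter is scaled
    to `diameter_fraction` of its resting value (assumes Poiseuille flow)."""
    if diameter_fraction <= 0:
        raise ValueError("diameter fraction must be positive")
    return 1.0 / diameter_fraction ** 4


for fraction in (1.0, 0.9, 0.75, 0.5):
    print(f"{fraction:.2f} of resting diameter -> "
          f"{relative_resistance(fraction):.2f}x baseline resistance")
```

Under this assumption, a 10% reduction in diameter raises resistance by roughly half, and halving the diameter raises it 16-fold.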

During inspiration and expiration, the diameter of the airway
varies around its static, resting diameter. In airflow
limitation, the resting airway diameter is abnormally small. In
emphysema, the lung parenchyma is destroyed. This leads to a
decrease in the tethering forces that maintain airway diameter,
resulting in decreased resting airway diameter. In asthma, the
smooth muscle that surrounds the airway is hyperreactive to
various stimuli. When one of these stimuli is present, the
smooth muscle contracts. This leads to decreased resting
diameter of the airway. In chronic bronchitis, there is
increased mucus production in the airways. There may also be
decreased mucus clearance caused by ciliary dysfunction. The
resulting increased intra-airway mucus coats the inner wall of
the airway. This leads to decreased resting diameter of the
airway. Thus, in airflow limitation syndromes, the resistance to
airflow is increased throughout the respiratory phase. Because
of the further physiologic decrease in airway diameter during
expiration, it is significantly more difficult to empty the
lungs than to fill them. This leads to air trapping and to lung
hyperinflation that can be demonstrated by an abnormally large
residual volume on pulmonary function testing.

The touted physical examination findings for airflow limitation
arise either from the difficulty in emptying the lungs or from
the resulting hyperinflation. The prolonged expiratory phase,
wheezing, rhonchi, and match test are signs of abnormally high
resistance to airflow during expiration. Decreased breath
sounds, barrel chest, hyperresonance, decreased cardiac and
hepatic dullness, absent or subxiphoid cardiac apical impulse,
decreased chest expansion, and decreased diaphragmatic movement
are signs of hyperinflation. Use of accessory muscles results
from both the increased work of expiration and pulmonary
hyperinflation.


HOW TO ELICIT SYMPTOMS AND 
SIGNS OF AIRFLOW LIMITATION 

A concise evaluation for airflow limitation includes a focused 
medical history and physical examination. 

History 

The history should elicit background features and specific 
symptoms. 

Background Information 

The most important background features are exposure to cigarette
smoke and to occupational or environmental pollutants. The
duration of cigarette exposure can most easily be elicited by
asking at what age the patient started smoking and in what year
he or she quit. Although pack-years is the traditional measure
of cigarette exposure, quantifying years of exposure works at
least as well. 13 The patient’s personal and family history of
atopic diseases is also associated with increased likelihood of
asthma.
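The two exposure measures mentioned above reduce to simple arithmetic, sketched here in Python. The function names are hypothetical. The example uses the patient in case 1 (2 packs per day for 47 years, now aged 63), whose starting age of 16 is inferred from those figures rather than stated in the scenario.

```python
def years_smoked(age_started: int, age_stopped_or_current: int) -> int:
    """Years of cigarette exposure from the ages at starting and stopping."""
    if age_stopped_or_current < age_started:
        raise ValueError("stopping age cannot precede starting age")
    return age_stopped_or_current - age_started


def pack_years(packs_per_day: float, years: float) -> float:
    """Traditional pack-year exposure: packs per day times years smoked."""
    if packs_per_day < 0 or years < 0:
        raise ValueError("inputs must be non-negative")
    return packs_per_day * years


years = years_smoked(age_started=16, age_stopped_or_current=63)
print(years, pack_years(2, years))
```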

Symptoms 

The most important symptoms to elicit from patients with 
suspected airflow limitation are wheezing, coughing, and 
sputum production. In fact, chronic bronchitis is defined by 
sputum production for at least 3 consecutive months in at 
least 2 consecutive years. 1 

Physical Examination 

The physical examination for airflow limitation should include
inspection, measurement of vital signs, palpation, percussion,
auscultation, and measurement of expiratory airflow.

Inspection 

While assessing the patient’s overall appearance, the clinician
should observe for the presence of a barrel chest. If the
anteroposterior diameter appears greater than normal, the
patient has a barrel chest deformity. This finding may be more
an illusion than a true deformity because the anteroposterior
dimensions have not been shown to be increased in patients with
clinically defined barrel chests. 38

Vital Signs 

While measuring blood pressure, the clinician can determine
whether there is pulsus paradoxus. This maneuver may be most
helpful in patients with suspected acute airflow limitation.
During tidal breathing, the sphygmomanometer is inflated to
above the systolic blood pressure. The cuff pressure is slowly
released until the first Korotkoff sound is heard only during
expiration; this systolic blood pressure value is noted. The
cuff pressure is further reduced until the first Korotkoff sound
is heard throughout inspiration; the systolic blood pressure at
this point is also noted. The systolic blood pressure is
normally lower during inspiration than during expiration. The
normal difference is accentuated when the patient has airflow
limitation. If the difference between these 2 pressures is at
least 15 mm Hg, the patient has pulsus paradoxus.
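The pulsus paradoxus calculation described above is a subtraction followed by a threshold check. A minimal Python sketch follows; the function and parameter names are hypothetical, and the 15 mm Hg cutoff is the one given in the text.

```python
PULSUS_PARADOXUS_THRESHOLD_MM_HG = 15  # cutoff stated in the text


def pulsus_paradoxus(expiration_only_mm_hg: float,
                     throughout_cycle_mm_hg: float) -> tuple[float, bool]:
    """Return (difference, is_present).

    expiration_only_mm_hg: cuff pressure at which the first Korotkoff sound
        is heard only during expiration.
    throughout_cycle_mm_hg: lower cuff pressure at which the sound is heard
        during inspiration as well.
    """
    difference = expiration_only_mm_hg - throughout_cycle_mm_hg
    if difference < 0:
        raise ValueError("the expiration-only pressure should be the higher value")
    return difference, difference >= PULSUS_PARADOXUS_THRESHOLD_MM_HG


print(pulsus_paradoxus(120, 100))
print(pulsus_paradoxus(118, 110))
```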

Palpation 

Palpation should include locating the cardiac apical impulse. 
Chest palpation should be performed with the patient supine 
and disrobed from the waist up. A sheet or gown should be used 
to maintain patient comfort and privacy; however, palpation 
should be performed with the hand directly on the chest wall. 
When the chest volume is increased because of hyperinflation, 
the cardiac apex shifts to a more central location and either may 
not be palpable or may be palpable in the subxiphoid area. 

Percussion 

The chest should be percussed to determine the quality of the
sound that resonates. Percussion of the chest wall should be
performed by placing a digit (usually the second or third) of
the nondominant hand firmly against the chest wall parallel to
and between the ribs. The second and third digits of the
dominant hand are flexed slightly at the metacarpophalangeal and
proximal and distal interphalangeal joints to form a slight arch
with the 2 fingertips even. The fingertips of the dominant hand
tap the distal interphalangeal joint of the nondominant hand
with a firm pecking motion. If the sound is more hollow than
normal, the chest is hyperresonant.

Auscultation 

Clinicians should auscultate the chest for wheezes, rhonchi, and
breath sound intensity. Chest auscultation should be performed
in a quiet room with the patient disrobed from the waist up. The
warmed stethoscope diaphragm should be placed with moderate
pressure on the patient’s chest to ensure good sound
transmission. The chest should be auscultated bilaterally over
the lower, middle, and upper lung fields posteriorly,
anteriorly, and along the midaxillary line. Patients should be
breathing heavily, but not forcefully. Wheezing will be heard as
high-pitched musical tones, especially during expiration.
Rhonchi are lower-pitched wheezes. 39 The intensity of breath
sounds should be observed. Although elaborate scoring systems
for breath sound intensity 9,26 and for wheezing 16 have been
developed, they are not clearly better than the customary normal
vs abnormal dichotomization.

Measures of Airflow 

Measures of expiratory airflow include the forced expiratory
time 13,17 and the match test. 16,24,25 To perform a forced
expiratory time test, the patient must take a deep breath and
forcefully exhale until no more air can be expelled. During this
maneuver, the patient must keep the mouth and glottis fully open
as if yawning. While the patient is performing the forced
expiration, the clinician listens over the larynx or lower
trachea with a stethoscope and times the duration of audible
airflow. To obtain the best results, the forced expiratory time
should be measured with a stopwatch and recorded to the nearest
0.1 second. An alternative maneuver is the match test. During
this test, the patient performs a forced expiration exactly as
in the forced expiratory time maneuver. However, the clinician
holds a burning match 10 cm from the patient’s widely open
mouth. If the match is still burning after the forced
expiration, the test result is positive. Others have used a
candle for this test. However, one needs a match to light a
candle, and we can find no benefit in carrying around both
except for those who frequently practice in the dark. Also, to
avoid malpractice claims and personal injury, we do not
recommend this test in patients receiving supplemental oxygen!

PRECISION OF HISTORY AND SYMPTOMS 
FOR AIRFLOW LIMITATION 

The observer agreement for smoking history, dyspnea, coughing,
wheezing, chronic bronchitis, and orthopnea has been described
with the κ statistic. 13 Two physicians almost always agree on
the smoking history (κ = 0.95). Physicians agree frequently on
the presence or absence of wheezing (κ = 0.61), chronic
bronchitis (κ = 0.55), dyspnea (κ = 0.44-0.48), and coughing
(κ = 0.46).
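For readers unfamiliar with the κ statistic, the sketch below computes Cohen's κ for 2 observers rating the same patients. It is a generic textbook calculation, not a reconstruction of the cited study's analysis, and the ratings are invented for illustration.

```python
from collections import Counter


def cohen_kappa(ratings_a: list, ratings_b: list) -> float:
    """Chance-corrected agreement between two raters on the same patients."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance if the raters' judgments were independent
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)


# Invented example: two physicians recording wheezing in 10 patients
physician_1 = ["yes"] * 4 + ["no"] * 6
physician_2 = ["yes", "yes", "yes", "no", "no", "no", "no", "no", "no", "yes"]
print(round(cohen_kappa(physician_1, physician_2), 2))
```

Here the physicians agree on 8 of 10 patients, but because about half of that agreement would be expected by chance, κ is well below 0.8.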

ACCURACY OF MEDICAL HISTORY AND 
SYMPTOMS FOR AIRFLOW LIMITATION 

Table 13-2 summarizes the operating characteristic estimates 
for airflow limitation, obtained for each historical item and 
symptom, after pooling data from referenced studies. 
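The likelihood ratios reported in this chapter follow from sensitivity and specificity by the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The Python sketch below applies them to the pooled smoking-history values from Table 13-2; the function name is hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    if not (0 <= sensitivity <= 1 and 0 < specificity < 1):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# "Ever vs never" smoking: sensitivity 92%, specificity 49% (Table 13-2)
lr_pos, lr_neg = likelihood_ratios(0.92, 0.49)
print(round(lr_pos, 1), round(lr_neg, 2))
```

Pooled estimates in the tables do not always reproduce exactly from the pooled sensitivity and specificity, so small discrepancies for other rows are to be expected.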

Background Information 

The best background information for diagnosing airflow
limitation is exposure to cigarette smoke. Although patients who
have smoked are only slightly more likely to have airflow
limitation, 5,6,13 never having smoked cigarettes is moderately
well associated with decreased likelihood of disease. 5,6,13
Perhaps more useful is the fact that the number of years the
patient has smoked correlates well with the likelihood of
disease (Figure 13-1). 13 Patients with at least a 70-pack-year
history of smoking are much more likely to have airflow
limitation. 16

Age is related to airflow limitation. Asthma is more common in
the young, whereas chronic bronchitis and emphysema are more
common in older patients. The prevalence of airflow limitation
appears to be lowest between ages 10 and 30 years. 40 The higher
prevalence at younger ages is due to asthma, which frequently
remits after childhood. The higher prevalence in the older age
group is probably due to 2 factors. First, age is a proxy for
exposure to toxins, especially cigarette smoke. When smokers and
nonsmokers are analyzed separately, the prevalence of airflow
limitation does not appear to increase significantly with age in
nonsmokers. 41 Second, in adults, most airflow limitation is a
chronic disease, so new incident cases are added faster than
attrition from mortality, except in the very old. Therefore,
advancing age is associated with increased likelihood of airflow
limitation in adult smokers, but airflow limitation should not
be considered a normal process of aging.

Symptoms 

Symptoms of chronic bronchitis, 13,19 sputum production of at
least one-fourth of a cup when present, 16 or wheezing 13,36 are
associated with a moderate increase in the likelihood of airflow
limitation. However, symptoms of cough 5,13 or exertional
dyspnea 13,36 are associated with only a slight increase in the
likelihood of airflow limitation. Orthopnea is not useful in
diagnosing airflow limitation, because its positive likelihood
ratio (LR+) and negative likelihood ratio (LR-) are not
significantly different from 1. 13 No single symptom effectively
rules out airflow limitation. The absence of dyspnea 5,13,36 or
of exertional dyspnea 13,36 is only moderately useful in ruling
out disease.


Table 13-2 Composite Operating Characteristics of History Items Predicting Airflow Limitation

                                  Grade of
Item                              Recommendation a  References  Sensitivity, %  Specificity, %  LR+   LR-
Smoking history
  >70 vs <70 pack-years           B                 17          40              95              8.0   0.63
  Ever vs never                   A                 6, 7, 14    92              49              1.8   0.16
Sputum production > 1/4 cup       B                 17          20              95              4     0.84
Symptoms of chronic bronchitis    A                 14, 20      30              90              3.0   0.78
Wheezing                          B                 14          51              84              3.8   0.66
Exertional dyspnea
  Grade 4 vs 3 or less            A                 20          3               99              3.0   0.98
  Any vs none                     A                 20          27              88              2.2   0.83
Coughing                          B                 14          51              71              1.8   0.69
Any dyspnea                       B                 14          82              33              1.2   0.55

Abbreviations: LR+, positive likelihood ratio; LR-, negative likelihood ratio.

a The recommendation grading scheme was provided by David L. Sackett, MD, and Charles H. Goldsmith, PhD. Grade A: independent, blind comparison of sign or symptom with a gold
standard of diagnosis among a large number of consecutive patients suspected of having the target condition. Grade B: independent, blind comparison of sign or symptom with a gold
standard of diagnosis among a small number of consecutive patients suspected of having the target condition. Grade C: independent, blind comparison of sign or symptom with a
gold standard of diagnosis among nonconsecutive patients suspected of having the target condition; or nonindependent comparison of sign or symptom with a gold standard of
diagnosis among samples of patients who obviously have the target condition plus, perhaps, individuals with normal results; or nonindependent comparison of sign or symptom with a
standard of uncertain validity (see Table 1-7).


PRECISION OF THE SIGNS OF AIRFLOW LIMITATION 

κ statistics or correlation coefficients have generally been
used to describe the precision of physical examination items for
airflow limitation. 13,16,17,28,34,35,37

Inspection 

Precision has not been studied for most inspection items, and
physicians agree only part of the time that a patient has a
cough (κ = 0.29), 13 which can probably be explained largely by
patients having paroxysms of coughing. They may cough during
one, but not the other, examination.

Vital Signs 

The precision of pulsus paradoxus has not been well studied. 

Palpation 

Physicians agree only part of the time on the results of
palpating for an absent apical impulse (κ = 0.39). 34 Physician
agreement on whether a patient has a subxiphoid apical impulse
may be no greater than chance (κ = 0-0.3). 13,16 However, the
low prevalence of this finding may lead to underestimating the
chance-corrected agreement.
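The effect of low prevalence on chance-corrected agreement can be shown with invented numbers. In the Python sketch below, 2 observers agree on 96 of 100 patients in both scenarios, yet κ is far lower when the finding is rare. The counts are hypothetical and are not taken from the cited studies.

```python
def kappa_2x2(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen's kappa from the four cells of a 2-observer, yes/no table."""
    n = both_yes + a_only + b_only + both_no
    if n == 0:
        raise ValueError("table is empty")
    observed = (both_yes + both_no) / n
    a_yes, b_yes = both_yes + a_only, both_yes + b_only
    expected = (a_yes * b_yes + (n - a_yes) * (n - b_yes)) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)


# Same raw agreement (96 of 100), different prevalence of the finding
rare_finding = kappa_2x2(both_yes=1, a_only=2, b_only=2, both_no=95)
common_finding = kappa_2x2(both_yes=48, a_only=2, b_only=2, both_no=48)
print(round(rare_finding, 2), round(common_finding, 2))
```

When nearly every patient lacks the finding, agreement on "absent" is mostly expected by chance, so the few disagreements dominate κ.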

Percussion 

Physicians appear to agree infrequently on the results of chest
percussion. However, only hyperresonance (κ = 0-0.42) 16,37 and
diaphragmatic excursion (κ = -0.04; r = 0.24) have been
studied. 16,35

Auscultation 

Physicians agree frequently on the results of auscultation for
wheezing (κ = 0.43-0.93), 13,16,37 whereas they agree less
frequently on breath sound intensity (κ = 0.23-0.47) 13,16,28,37
and crackles (κ = 0.30-0.63). 37

Measures of Airflow 

Physicians frequently obtain the same results when measuring
forced expiratory time (intraclass correlation, 0.81; κ = 0.7) 13,17
or interpreting the match test (κ = 0.39). 16 Agreement on the
forced expiratory time is better if a stopwatch is used instead
of a second hand.


ACCURACY OF THE SIGNS OF AIRFLOW LIMITATION 

Table 13-3 summarizes the operating characteristic estimates 
for airflow limitation, obtained for each sign, after pooling 
data from referenced studies. 
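A likelihood ratio is applied at the bedside by converting the pretest probability to odds, multiplying by the LR, and converting back. The Python sketch below shows that arithmetic using the pooled LRs for wheezing on examination from Table 13-3 (LR+ 36, LR- 0.85). The 10% pretest probability is an arbitrary illustrative value, and the function name is hypothetical.

```python
def post_test_probability(pretest_probability: float,
                          likelihood_ratio: float) -> float:
    """Update a pretest probability with a likelihood ratio via odds."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio cannot be negative")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


pretest = 0.10  # assumed for illustration only
print(round(post_test_probability(pretest, 36), 2))    # wheezing present
print(round(post_test_probability(pretest, 0.85), 2))  # wheezing absent
```

The asymmetry is the point: a finding with a high LR+ can move the probability a long way when present, while an LR- close to 1 leaves it almost unchanged when absent.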


[Figure 13-1: nomogram with 3 vertical scales. Left: Smoking History, y (scales A and B, each marked 0 to 75). Center: Probability of Airflow Obstruction, % (marked 1 to 99). Right: Wheezing on Examination (No/Yes) or PEF, L/min (marked 720 down to 60).]

Figure 13-1 Predicting Probability of Airflow Obstruction at the Bedside

Choose the number of years the patient smoked cigarettes under the
“Smoking History” heading; use scale A if the patient reports no
symptoms of wheezing or scale B if the patient reports symptoms of
wheezing. Under “Wheezing on Examination,” select “No” if wheezing
was absent or “Yes” if wheezing was present (alternatively, the best
of 3 peak expiratory flow [PEF] rates could be chosen under the “PEF”
heading). With a straightedge, connect the points chosen on the
“Smoking History” and “Wheezing on Examination” lines. Read the
probability of airflow limitation where the straightedge intersects
the line under the “Probability of Airflow Obstruction” heading.

Reprinted from Holleman et al, 13 with the permission of the Journal of
General Internal Medicine.


Inspection 

A barrel chest 31,32 predicts airflow limitation. However, the
evidence for this association comes largely from one study in
asthmatic children. Recent studies using currently accepted
reference standards have failed to include this finding.
Therefore, the value of the barrel chest sign in adults is not
well supported. Other inspection items (accessory muscle use,
excavated supraclavicular fossae, and coughing) have not been
studied in a large enough sample of patients to determine the
extent of their usefulness in diagnosing airflow
limitation. 13,32,36 In other words, their likelihood ratio
confidence intervals are wide and include 1. 42 Decreased chest
expansion and kyphosis have been studied only in patients with
known disease, 32 so their usefulness has not yet been
determined. Patients who do not use accessory muscles 32,36 or
who do not have excavated supraclavicular fossae are only
slightly less likely to have airflow limitation. 36 Patients
without a barrel chest 31,32 or who do not cough 13 are
significantly less likely to have airflow limitation, but the
clinical importance of the absence of these findings is
negligible. Therefore, the only inspection item we can recommend
is looking for a barrel chest. The presence of this finding,
especially in children, virtually rules in airflow limitation.


Table 13-3 Composite Operating Characteristics of Physical Examination Items Predicting Airflow Limitation

                                    Grade of
Item a                              Recommendation b  References  Sensitivity, %  Specificity, %  LR+   LR-
Wheezing                            A                 14, 17, 34  15              99.6            36    0.85
Barrel chest                        B c               32          10              99              10    0.90
Decreased cardiac dullness          B                 17          13              99              10    0.88
Match test                          B                 17, 25, 26  61              91              7.1   0.43
Rhonchi                             B                 31, 32      8               99              5.9   0.95
Hyperresonance                      B                 17          32              94              4.8   0.73
Forced expiratory time, s d         A                 14, 18
  >9                                                                                              4.8
  6-9                                                                                             2.7
  <6                                                                                              0.45
Subxiphoid cardiac apical impulse   B                 14, 17      8               98              4.6   0.94
Pulsus paradoxus (>15 mm Hg)        C                 8, 23, 24   45              88              3.7   0.62
Decreased breath sounds             B                 14, 17      37              90              3.7   0.70
Accessory muscle use                C                 33, 37      24              100             e     0.70
Excavated supraclavicular fossae    C                 37          31              100             e     0.69

Abbreviations: LR+, positive likelihood ratio; LR-, negative likelihood ratio.

a Listed in order of decreasing LR+.

b See Table 1-1 for a summary of Evidence Grades and Levels.

c This recommendation includes only children. Sensitivity in adults was 4% (grade C). 1Z

d Because the forced expiratory time test has 3 levels, a single likelihood ratio is listed for each level; LR-, sensitivity, and specificity cannot be calculated.

e This item was studied in too few subjects to yield meaningful results.


Vital Signs 

The presence of pulsus paradoxus of at least 15 mm Hg is
associated with only a moderate increase in the likelihood of
airflow limitation, and the absence of this sign is associated
with only a slight reduction in the likelihood of
disease. 7,22,23 Other vital signs have not been studied and
cannot be recommended for use in determining the likelihood of
airflow limitation.

Palpation 

Palpating a subxiphoid cardiac apical impulse is associated with
a moderate increase in the likelihood of airflow limitation.
However, the absence of this finding is not useful. 13,16 Absent
apical impulse has been studied only in patients with known
disease, 32 so its usefulness has not yet been determined.
Therefore, according to current evidence, we recommend palpating
the subxiphoid region for the cardiac apical impulse. We
recommend this despite the reportedly low observer agreement
because the low prevalence of this finding may lead to
underestimates of the chance-corrected agreement.

Percussion 

Chest hyperresonance on percussion is associated with a moderate
increase in the likelihood of disease. 16,32 Neither decreased
cardiac dullness nor decreased diaphragmatic movement has been
studied in enough patients to determine definitively the extent
of usefulness. 16 However, patients with decreased cardiac
dullness are more likely to have airflow limitation. Decreased
liver dullness has been studied only in patients with known
disease, 32 so its usefulness has not yet been determined.
Patients without chest hyperresonance are only slightly less
likely to have airflow limitation. 16,32 Normal cardiac dullness
and normal diaphragmatic movement are likely not useful for
decreasing the likelihood of airflow limitation. 16 We recommend
percussing the chest for the resonance sound. Hyperresonance
over the precordium may be particularly useful for increasing
the likelihood of airflow limitation.

Auscultation 

Objective wheezing, or wheezing observed on physical
examination, is the most potent predictor of airflow limitation.
Patients with wheezing almost certainly have airflow
limitation. 13,15,16,37 However, this is true only of wheezing
on unforced expiration. Forced expiration is associated with
increased sensitivity of wheezing, and with decreased
specificity. The current literature suggests that the presence
or absence of wheezing on forced expiration is of no value in
diagnosing or ruling out airflow limitation. 15,20 Additionally,
the sensitivity of wheezing increases with the severity of
airflow limitation. 13 Studies that recruited patients referred
for spirometry 15,36 yielded sensitivities greater than those
found in unreferred populations. 13,16 Although the sensitivity
of wheezing varies greatly (10%-50%) by study population, the
LR+ and LR- change little. Rhonchi were associated with a
moderate increase in the likelihood of airflow limitation in 2
studies 30,31; however, because neither study explicitly defined
rhonchi and because there is significant variability in how
physicians define rhonchi, 43 this result must be interpreted
cautiously. Decreased breath sounds are associated with only a
moderate increase in the likelihood of disease. 13,16,32 Absent
wheezing, 13,15,16,36 normal breath sound intensity, 13,16,32 or
absent rhonchi 30,31 are associated with only a moderate
decrease in the likelihood of disease. We recommend auscultating
the chest for wheezes and for breath sound intensity. Patients
with wheezing should be considered to have airflow limitation,
and patients with decreased breath sound intensity should be
considered somewhat more likely to have airflow limitation.
Patients without wheezing or with normal breath sound intensity
should be considered somewhat less likely to have this disorder.
Neither the presence nor absence of crackles (rales) helps with
the diagnosis of airflow limitation. 8,13,29

Measures of Airflow 

Patients who are unable to extinguish a lighted match held 10 cm
from the open mouth are significantly more likely to have
airflow limitation than patients who are able to extinguish a
match. The ability to extinguish a match is associated with a
moderate decrease in the likelihood of disease. 16,24,25 The
forced expiratory time 4,5,10,11,13,16-18 is a continuous
variable that can range from a few tenths of a second to more
than 20 seconds. Unfortunately, each of the 4 best studies of
forced expiratory time 10,13,16,17 used different methods. Two
studies 10,16 used average expiratory time, which makes bedside
use cumbersome. Of the other 2 studies, one used the shortest
expiratory time of 3 trials; 13 the other, the longest
expiratory time of 2 trials. 17 Because the ability to
discriminate between patients with and without airflow
limitation is the same regardless of whether the shortest or
longest time is used, 13 there is no clear advantage to one
method over the other. To allow pooling of results, one of the
studies 13 was reanalyzed with the longest rather than the
shortest time. When the longest expiratory time is chosen, a
result less than 6 seconds was associated with a modest decrease
in the likelihood of airflow limitation; a result between 6 and
9 seconds was associated with a modest increase in the
likelihood of airflow limitation; and a result greater than 9
seconds was associated with a great increase in the likelihood
of airflow limitation. A forced expiratory time of approximately
9 seconds predicts an FEV1/FVC of 70%, 8 a level suggesting the
diagnosis of airflow limitation.
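The 3-level interpretation of forced expiratory time can be written as a simple lookup. In the Python sketch below, the likelihood ratios are those listed for each level in Table 13-3. The function name is hypothetical, and assigning times of exactly 6 and exactly 9 seconds to the middle level is an assumption, since the text does not say how boundary values were handled.

```python
def forced_expiratory_time_lr(seconds: float) -> float:
    """Likelihood ratio for airflow limitation by forced expiratory time.

    Uses the longest measured expiratory time, as in the pooled analysis.
    Boundary handling (6 s and 9 s fall in the middle level) is assumed.
    """
    if seconds <= 0:
        raise ValueError("forced expiratory time must be positive")
    if seconds > 9:
        return 4.8
    if seconds >= 6:
        return 2.7
    return 0.45


for measured in (4.0, 7.5, 11.2):
    print(measured, forced_expiratory_time_lr(measured))
```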

Peak expiratory flow rates predict airflow limitation (Figure
13-1). 13 However, 2 studies have shown that peak expiratory
flow adds little to the clinical examination for airflow
limitation. 13,16 In one study, 16 peak expiratory results
improved the accuracy of the clinical examination for only 1 of
the 4 physicians studied. In the other study, 13 peak expiratory
flow was equivalent to auscultating for wheeze, but more
difficult to assess. Therefore, we cannot recommend routine peak
flow measurements in the diagnosis of airflow limitation. Peak
flow measurements may be useful in assessing benefit from
therapy, especially for asthma.

CAN THE CLINICAL EXAMINATION PREDICT 
SEVERITY OF AIRFLOW LIMITATION? 

Stubbing et al 3 found that the number of positive findings
(tracheal descent during inspiration, sternomastoid contraction,
scalene contraction, supraclavicular fossae excavation,
supraclavicular fossae recession, intercostal recession, or
costal margin movement) predicted the severity of airflow
limitation in patients with known disease. These findings tended
to be present only if the FEV1 was less than 50% of the
predicted value. The American Thoracic Society 1,2 found that
the number of positive findings (barrel chest, low diaphragm,
decreased diaphragmatic excursion, decreased breath sounds,
prolonged expiratory phase, wheezing, noisy inspiration, or
crackles) predicted the severity of airflow limitation
(r = 0.6). The literature suggests that, as airflow becomes more
limited, more physical examination findings become apparent.

ACCURACY OF THE OVERALL CLINICAL IMPRESSION 
FOR PREDICTING AIRFLOW LIMITATION 

Three studies 14,17,33 evaluated the accuracy of the overall clinical
impression or a clinician’s ability to integrate all aspects of the
clinical examination in forming an impression about the likelihood
of airflow limitation. Clinicians’ overall impressions 13
(graded as moderate to severe limitation [LR+ = 4.2], mild
[LR+ = 0.82], or none [LR+ = 0.42]) predicted any airflow
limitation only moderately well. However, Badgett et al 16
found that clinicians’ impressions (blinded to medical history
but not physical examination) predicted moderate to severe
airflow limitation somewhat better (LR+ = 7.3; LR- = 0.53)
and about as well as some of the individual findings in Table
13-3. On the other hand, Fletcher 32 evaluated the clinical
impressions of 6 physicians and found sensitivities ranging
from 15% to 95% for airflow limitation. Therefore, clinicians’
ability to diagnose airflow limitation clinically is variable, but
accuracy seems to improve as the severity of airflow limitation
increases.


COMBINATIONS OF INDIVIDUAL FINDINGS 

Six studies (Table 13-4) assessed the usefulness of combining
clinical examination items to predict airflow limitation.
Unfortunately, as with individual findings, combinations of
findings do not effectively rule out airflow limitation. The
best combination is never having smoked, no reported
wheezing, and no wheezing on examination (Figure 13-1;
LR-, 0.18). 13 Other combinations have LR- values ranging
from 0.33 to 0.77. Even the best combination is no better
than smoking history alone (LR-, 0.16). Therefore, combinations
of findings are more helpful for ruling in than for ruling
out this disorder. In fact, a patient with any combination
of 2 findings (>70-pack-year history of smoking, history of
chronic obstructive pulmonary disorder, or decreased breath
sounds) can be considered to have airflow limitation. 16

Table 13-4 Combinations of Clinical Examination Items Predicting Airflow Limitation

Clinical Examination Item | Interpretation | Relation to Airflow Limitation | Reference Standard
Years of cigarette exposure, patient-reported wheezing, objective wheezing 13 | See Figure 13-1 | LR+ varies (see Figure 13-1); LR- = 0.18 | FEV1/FVC and FEV1 < fifth percentile
Patient-reported chronic obstructive pulmonary disease, > 70 pack-years of cigarette smoking, decreased breath sounds 16 | ≥2 Findings present | LR+ = 34 | FEV1 < 60% of predicted or FEV1/FVC < 0.60
 | <2 Findings present | LR- = 0.34 |
Dyspnea, subjective wheezing, objective wheezing, accessory muscle use, excavation of supraclavicular fossae, and distention of external jugular veins 36 | No. of findings present | r = -0.64 | Ratio of FEV1 to predicted FEV1
Breath sound intensity, use of scalene muscle, objective wheezing, and rales during cough 27 | No. of findings present | Negatively correlated with FEV1 | FEV1
Decreased breath sounds, objective wheezing, rales, and prolonged expiratory time 33 | All 4 findings present | LR+ = 3.3 | Abnormal FVC, FEV1, or maximal midexpiratory flow
 | <4 Findings present | LR- = 0.44 |
History by questionnaire, standardized physical examination 21 | Any abnormal finding | LR+ = 1.4 | FEV1/FVC < 0.70
 | No abnormal findings | LR- = 0.77 |

Abbreviations: FEV1, forced expiratory volume in 1 second; FVC, forced vital capacity; LR+, positive likelihood ratio; LR-, negative likelihood ratio.


CLINICAL SCENARIOS—RESOLUTIONS 


CASE 1 In case 1, the patient reported a 47-year smoking
history but no other environmental or occupational exposures.
He complained of episodes of wheezing but had no
wheezing on examination. According to Figure 13-1, he
has a 65% chance of having airflow limitation.

CASE 2 In case 2, the patient was asymptomatic during the
office visit. She had never smoked cigarettes and had no
exposure to environmental or occupational pollutants. She
did not have a barrel chest, her apical impulse was normally
located, and her chest was not hyperresonant. Her breath
sounds were normal in intensity, without wheezing or rhonchi.
Her forced expiratory time was 2 seconds. Because of
clinical examination findings, you conclude that she does not
have airflow limitation at the office visit. However, because of
her medical history, you suspect that she has intermittent airflow
limitation secondary to environmental allergens.

CASE 3 In case 3, the patient had no smoking history or
previous episodes of dyspnea. His chest was hyperresonant,
and he had diffuse expiratory wheezes. His forced expiratory
time was 12 seconds, and he had pulsus paradoxus of
32 mm Hg. You diagnose acute bronchospasm and begin
appropriate bronchodilator and glucocorticoid therapy.

THE BOTTOM LINE 

Guidelines for using the clinical examination to diagnose airflow
limitation are as follows:

• No single item or combination of items from the clinical
examination rules out airflow limitation. However, the
best finding associated with decreased likelihood of airflow
limitation is a history of never having smoked cigarettes
(especially in patients without a history of wheezing and
without wheezing on examination).

• The best findings associated with increased likelihood of
airflow limitation are objective wheezing, barrel chest, positive
match test result, rhonchi, hyperresonance, forced
expiratory time greater than 9 seconds, and subxiphoid
apical impulse.

• A finding of a barrel chest (in children) or wheezing virtually
rules in airflow limitation.

• Any 2 of the following virtually rule in airflow limitation:
70 pack-years or more of smoking, decreased breath
sounds, or history of chronic obstructive pulmonary disorder.

• Three findings predict the likelihood of airflow limitation
in men (Figure 13-1): years of cigarette smoking, subjective
wheezing, and either objective wheezing or peak expiratory
flow rate.
CLINICAL SCENARIO


A 51-year-old business executive comes to the clinic for a
health checkup. He has no specific complaints, other than
those he attributes to the vagaries of reaching middle age.
You know the patient well and thus are aware that he
smokes cigarettes. He has been smoking one-half to 1 pack
a day since college. His neck and chest configuration are
normal. There is no dyspnea, and he has never complained
of either shortness of breath or cough other than
during flulike illnesses. As part of his examination, you listen
to his chest and hear no wheezes. You do not think he
has airflow limitation, but is it time for spirometry testing,
given his smoking history?

UPDATED SUMMARY ON 
OBSTRUCTIVE AIRWAYS DISEASE 

Original Review 

Holleman DR Jr, Simel DL. Does the clinical examination 
predict airflow limitation? JAMA. 1995;273(4):313-319. 

UPDATED LITERATURE SEARCH 

Our literature search used the parent search strategy for the
Rational Clinical Examination series, combined with the subject
headings “lung diseases, obstructive/di,” “pulmonary
disease, chronic obstructive/di,” or “airway obstruction/di”
published in English from 1994 to August 2004. The results
yielded 131 titles for which we reviewed the abstracts. As in
the original Rational Clinical Examination article, we focused
on studies that assessed clinical findings in a population of
nonemergency primary care patients with irreversible airflow
limitation, rather than the acutely dyspneic patient. The
abstracts were reviewed to identify studies that might allow
us to assess the sensitivity and specificity of patient symptoms
or signs. We found 18 original articles for further
review. We retained articles (n = 5) that included a population
of patients without a previous diagnosis of obstructive
airways disease who had their disease status verified by
spirometry after a clinical evaluation.


We also crossed the clinical subject headings with “meta-analysis,”
“ROC curve,” and the text word “systematic review”
in both MEDLINE and the Cochrane databases. We retrieved
articles referenced in Table 13-2 of The Rational Clinical
Examination article for assessment of quality, along with
examining files that we retained from the original Rational
Clinical Examination article. These additional searches led us
to 4 articles, 3 of which gave us insight into estimates for the
prior probability of disease.


NEW FINDINGS 

• The single best finding for identifying adults with obstructive
airways disease is a history of > 40 pack-years of smoking

• Findings for obstructive airways disease in combination are
much better than individual symptoms or signs

Details of the Update 

On reviewing the initial Rational Clinical Examination article,
we realized that we did not consider the potential effect of
studies that included patients with a known diagnosis of
obstructive airways disease. When obstructive airways disease
is known and treated, some signs might improve (eg, wheezing,
bedside measures of airflow such as forced expiratory time
or peak flow), whereas other findings might not be affected by
treatment (eg, maximum laryngeal height). It is also possible
that including patients with a known diagnosis of obstructive
airways disease minimizes the independent importance of the
risk factors that led to the diagnosis (eg, smoking). For
generalizability, the most promising studies should either analyze
patients separately for those with a known diagnosis of
obstructive airways disease or enroll a population that is independent
of whether a previous diagnosis was made so that the
examining clinicians would have no information about previous
diagnosis and would be examining a variety of patients
with and without obstructive airways disease. Therefore, we
reassessed the studies that reported on combinations of findings
from the original Rational Clinical Examination article.

One promising study did an excellent job of assessing interobserver
variability. 1 The study reported good precision for the
presence of wheezing (κ = 0.69) and for reduced breath sounds
(κ = 0.47). However, we had included the univariate and multivariate
data for our accuracy estimates in the original Rational
Clinical Examination article, even though the study had only 15
patients with obstructive airways disease (of a total of 92
patients). The data were pooled from each physician, resulting
in a reported sample size of approximately 340 observations.
With only 15 affected patients, our confidence in the sensitivity
from this study should have been much less, and the multivariate
model may be overfit to the small number of affected
patients.

Table 13-5 Univariate Findings for Obstructive Airways Disease in
Patients With No Prior Obstructive Airways Disease Diagnosis

Finding (No. of Combined Studies) | LR+ (95% CI) | LR- (95% CI)
Wheezing (n = 5) a,b | 4.4 (1.6-12) | 0.88 (0.84-0.92)
Maximum laryngeal height ≤ 4 cm 4 | 4.2 (2.3-7.9) | 0.7 (0.5-0.9)
Decreased breath sounds (n = 2) 3,6 | 2.6 (1.9-3.6) | 0.66 (0.49-0.69)

Finding | LR (95% CI)

Forced Expiratory Time, s 5
>9 | 6.7 (2.1-21)
6-9 | 1.8 (0.77-4.0)
<6 | 0.6 (0.5-0.8)

Forced Expiratory Time Adjusted for Age, s 9
>6 And patient >60 y | 3.4 (2.2-5.2)
>6 And patient <60 y | 2.1 (1.3-3.5)
<6 And patient >60 y | 0.33 (0.23-0.47)
<6 And patient <60 y | 0.57 (0.34-0.95)

Smoking Status, Pack-Years 3,4
>40 | 12 (2.7-50)
20-40 | 0.8 (0.4-1.6)
<20 | 0.5 (0.3-0.9)

Overall Clinical Prediction of Disease (n = 2) 1,10 c
Moderate-severe disease | 5.6 (3.1-10)
Mild disease | 2.3 (0.55-9.7)
No disease | 0.59 (0.51-0.68)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

a For data from Straus et al, 4 we used only the data for patients without obstructive airways disease.

b Auscultated.

c Overall clinical prediction of disease (or gestalt) is listed with univariate measures here
because it is a single assessment by the clinician, without an explicit list of criteria, as
opposed to the multivariate methods in Table 13-6 that use the specified variables.

IMPROVEMENTS IN THE DATA 
PRESENTED IN THE ORIGINAL PUBLICATION 

Because of our uncertainty in combining the results from a
study with only 15 affected patients, 1 we updated Table 13-3
from the original article. We added new information, updating
the meta-analysis for auscultated wheezing 2-6 and
decreased breath sounds. 2,4 In addition, we updated the
results for forced expiratory time because our initial report
combined data from a study with a univariate likelihood
ratio (LR) for forced expiratory time with the results of
forced expiratory time adjusted for age.


CHANGES IN THE REFERENCE STANDARD 

The diagnosis of obstructive airways disease depends on a
spirometric reference standard. Spirometry has 2 functions:
it confirms airflow limitation and it confirms the lack of
reversibility with bronchodilators. Patients with reversible
airflow limitation have an asthmatic component to their airways
disease. It is important that the spirometry be performed
as soon as possible after the physical examination.

The spirometric criteria for obstructive airways require a
forced expiratory volume in 1 second (FEV1)/forced vital
capacity (FVC) ratio of 0.70 or less, combined with a
postbronchodilator FEV1 less than 80% of the patient's predicted
value. 7,8 Using different criteria results in a different prevalence
of disease. 7 Although specialty groups are coming to
consensus and understanding of how these spirometric definitions
differ, some authors of original articles have evaluated
the effect of differing reference standards on the findings
from the medical history and physical examination. Fortunately,
there is no clinically meaningful difference on the LRs
when different spirometric reference standards are used. 4 As
long as your patients are similar to those in the studies
reviewed and your pulmonary laboratory uses one of the
above definitions, the results of this literature review for the
clinical examination would apply to your patients.
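
The two-part spirometric definition above can be expressed as a simple check. This Python sketch is only an illustration under stated assumptions: the function name, the argument names, and the example volumes are hypothetical, and all inputs are taken to be postbronchodilator values in liters.

```python
def meets_spirometric_criteria(fev1_l: float, fvc_l: float,
                               predicted_fev1_l: float) -> bool:
    """Return True when both criteria from the text are met:
    FEV1/FVC ratio of 0.70 or less, and FEV1 below 80% of predicted."""
    if fvc_l <= 0 or predicted_fev1_l <= 0:
        raise ValueError("FVC and predicted FEV1 must be positive")
    ratio = fev1_l / fvc_l
    percent_predicted = 100.0 * fev1_l / predicted_fev1_l
    return ratio <= 0.70 and percent_predicted < 80.0


# Hypothetical measurements, chosen only to exercise both outcomes.
print(meets_spirometric_criteria(1.8, 3.0, 3.0))  # ratio 0.60, 60% predicted
print(meets_spirometric_criteria(3.2, 4.0, 3.4))  # ratio 0.80, criteria not met
```

A laboratory that uses a different definition would change only the two thresholds in the final comparison.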


RESULTS OF LITERATURE REVIEW 

The most diagnostically useful single finding was not an item
from the physical examination but from the patient medical
history: a finding that the patient smoked more than 40
pack-years (see Table 13-5). In the absence of appropriate analyses to
assess the independence of findings, the best diagnostic strategy
for diagnosing chronic airways obstruction is using the single
most diagnostically useful positive LR or 1/negative LR. 11
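
The single-best-finding strategy can be sketched in a few lines of Python. This is a hedged illustration, not the method of any cited study: the helper names are our own, the patient is hypothetical, the LRs are taken from Table 13-5, and the 11% prior probability for men is borrowed from Table 13-7.

```python
def probability_to_odds(probability: float) -> float:
    return probability / (1.0 - probability)


def odds_to_probability(odds: float) -> float:
    return odds / (1.0 + odds)


def usefulness(likelihood_ratio: float) -> float:
    """Put LR+ and 1/LR- on one scale so they can be compared."""
    return likelihood_ratio if likelihood_ratio >= 1.0 else 1.0 / likelihood_ratio


def post_test_probability(prior_probability: float, likelihood_ratio: float) -> float:
    """Convert the prior to odds, apply one LR, and convert back."""
    return odds_to_probability(probability_to_odds(prior_probability) * likelihood_ratio)


# Findings in a hypothetical man, with LRs from Table 13-5.
observed = {
    "smoked > 40 pack-years": 12.0,
    "forced expiratory time 6-9 s": 1.8,
    "no wheezing": 0.88,
}
best = max(observed, key=lambda name: usefulness(observed[name]))
posterior = post_test_probability(0.11, observed[best])
print(best, round(posterior, 2))
```

Only one LR is applied, which avoids multiplying findings whose independence has not been assessed.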

Two studies allow us to assess the overall clinical impression
according to whether the clinicians thought that the patient
had moderate to severe, mild, or no disease (see Table 13-5).
These results are the clinician’s gestalt, assessing how he or she
integrates all the available information. Unfortunately, clinicians
do not accurately identify patients with mild disease (LR,
2.3; 95% confidence interval [CI], 0.55-9.7). The results for the
clinical gestalt are not much better than would be obtained
from using only the smoking history or using the information
from any single physical examination finding. Given the diagnostic
difficulty, it is appropriate to determine whether a more
formal weighting of the data in a statistical (rather than intuitive)
model can improve performance.

These data show the important influence of analyzing combinations
of findings (see Table 13-6). Although the univariate
data are mostly unimpressive, combinations of just a few
findings greatly improve the diagnostic efficiency. Most clinicians
will want the most parsimonious model. By parsimonious,
we mean the model that has the smallest number of
variables while yielding the best accuracy. The first and fourth
models in Table 13-6 have the highest diagnostic odds ratios.
What becomes readily apparent is that whereas the univariate
models have almost no ability to decrease the odds of obstructive
airways disease from baseline, these 2 models appear
quite efficient and may be able to rule out the disease.

EVIDENCE FROM GUIDELINES 

Whereas all guidelines advocate for counseling patients to
stop smoking, neither the US Preventive Services Task
Force nor the Canadian Task Force on Preventive Health Care
evaluated the evidence for screening strategies for obstructive
airways disease. The Global Initiative for Chronic Obstructive
Lung Disease, sponsored by the National Heart, Lung,
and Blood Institute, together with the World Health Organization,
concluded that the benefits were unknown for a strategy
of screening either the general population or the smaller
population of smokers. 12


CLINICAL SCENARIO—RESOLUTION 


The clinical findings do not help this patient. Smoking as
a single risk factor is not particularly helpful, although the
quantity smoked in either pack-years or years of tobacco
use is valuable information. This patient does not yet
exceed the threshold that produces the highest likelihood
ratios (>40 pack-years of smoking or having smoked for
more than 55 years). The absence of wheezing and the
finding of a normal neck do not appreciably lower the
odds of obstructive airways disease. None of these findings
moves us much from the baseline risk of 10% for an
adult man.

What about combinations of findings? The results of
the multivariate models give us the posterior odds of disease
after applying the adjusted likelihood ratios. You can
use the model with the variables self-reported obstructive
airways disease, smoked more than 40 pack-years, aged 45
years or older, and maximum laryngeal height of 4 cm or
less. His data yield posterior odds of 0.5 × 0.8 × 1.3 × 0.16,
or 0.08. You have decreased the probability from the 10%
baseline risk to about 7.7%. Because your clinical intuition
is that the probability is higher than his baseline risk,
you decide to check a different clinical model.

The other recommended model uses smoking status in
terms of the number of years of use, symptoms of wheezing,
and auscultated wheezing. His data yield posterior
odds for the combined variables of 3.5 × 0.26 × 0.25, or
0.23, which increases the probability from 10% to approximately
18%. With continued smoking, his likelihood of
obstructive airways disease will increase more, whether or
not he develops symptoms or signs.
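
The arithmetic for both models can be checked with a short Python sketch. It is offered as an illustration only: the function and variable names are our own, and the factors are the odds quoted in the scenario (drawn from Table 13-6), multiplied together and then converted from odds to a probability.

```python
from math import prod


def posterior_odds(model_odds):
    """Multiply the model-derived odds for each finding."""
    return prod(model_odds)


def odds_to_probability(odds: float) -> float:
    return odds / (1.0 + odds)


# Model 1: no self-reported OAD, not > 40 pack-years,
# aged 45 y or older, laryngeal height above 4 cm.
model_1 = [0.5, 0.8, 1.3, 0.16]
# Model 2: smoked 30-55 y, no self-reported wheezing, no auscultated wheezing.
model_2 = [3.5, 0.26, 0.25]

for label, factors in (("model 1", model_1), ("model 2", model_2)):
    odds = posterior_odds(factors)
    print(label, f"odds={odds:.4f}", f"probability={odds_to_probability(odds):.1%}")
```

Unrounded, the products are 0.0832 and 0.2275, which correspond to probabilities of about 7.7% and 18.5%.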


The prediction models suggest that his risk of obstructive
airways disease is about at the baseline risk to a little higher.
On the other hand, you have a sense that he is going to
develop the disease, and with a few more pack-years of
smoking, the results of the 2 models increase precipitously
and converge at approximately 40% to 45%, even in the
absence of signs or symptoms. You might choose to get
spirometry to prognosticate or to use the results as a motivational
strategy to get him to stop smoking.


Table 13-6 Multivariate Findings for Obstructive Airways Disease

Model | Odds Used in Combination, Derived From a Model (Factor Present) | Odds Used in Combination, Derived From a Model (Factor Absent)

Combination of Patients With Known and Unknown OAD 4
Smoked > 40 pack-years | 8.3 | 0.8
Self-reported history of OAD | 7.3 | 0.5
Maximum laryngeal height ≤ 4 cm | 2.8 | 0.16
Age ≥ 45 y | 1.3 | 0.4
Posterior odds, all 4 findings present | 220 |
Posterior odds, all 4 findings absent | | 0.03

Combination of Patients With Known and Unknown OAD 5
Forced expiratory time > 9 s | 4.6 | 0.8
Self-reported history of OAD | 4.4 | 0.5
Wheezing | 2.9 | 0.8
Posterior odds, all 3 findings present | 59 |
Posterior odds, all 3 findings absent | | 0.32

Patients Without Known OAD 4
Smoked > 40 pack-years | 12 | 0.9
Maximum laryngeal height ≤ 4 cm | 3.6 | 0.7
Age ≥ 45 y | 1.4 | 0.5
Posterior odds, all 3 findings present | 58 |
Posterior odds, all 3 findings absent | | 0.32

Patients Selected Without Consideration of OAD 10
Smoked > 55 y | 10 |
Smoked 30-55 y | 3.5 |
Smoked < 30 y | 0.23 |
Auscultated wheezing | 4.1 | 0.25
Self-reported wheezing | 3.8 | 0.26
Posterior odds, all 3 findings present (smoked > 55 y) | 156 |
Posterior odds, all 3 findings absent (smoked < 30 y) | | 0.02

Abbreviation: OAD, obstructive airways disease.
OBSTRUCTIVE AIRWAYS DISEASE—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

A systematic review identified 32 sources of information
from studies done worldwide on the prevalence of
obstructive airways disease. 13 Nine of the 32 studies used
a spirometric reference standard, similar to what is advocated
for clinical practice; 8 of these had data that
allowed us to compare the overall prevalence and sex-specific
prevalence. The summary overall prevalence was
7.1% (95% CI, 5.2%-9.3%). Men (11%; 95% CI, 8.5%-14%)
had about twice the rate as women (6%; 95% CI,
3%-10%) (see Table 13-7).


Table 13-7 Prior Probability of Obstructive Airways Disease Differs by Sex

Sex | Prior Probability, %
Men | 11
Women | 6


POPULATION FOR WHOM OBSTRUCTIVE AIRWAYS 
DISEASE SHOULD BE CONSIDERED 

All adults, especially those who smoke and are aged 45 
years or older. 


DETECTING THE LIKELIHOOD OF 
OBSTRUCTIVE AIRWAYS DISEASE 

See Table 13-8. 


Table 13-8 Likelihood Ratios for Best Single Findings and for Multivariate Models

Single Best Findings That Are the Easiest to Measure | Likelihood Ratio
Smoking status, > 40 pack-years | 12
Auscultated wheezing or laryngeal height ≤ 4 cm | ≈4

To “Rule In” Obstructive Disease, Must Use a Multivariate Model a | Posterior Odds of Disease, Probability (%)
Smoking > 55 y and wheezing symptoms and auscultated wheezing | 156 (99)
History of OAD and smoking > 40 pack-years and age ≥ 45 y and laryngeal height ≤ 4 cm | 220 (99)

To “Rule Out” Obstructive Disease, Must Use a Multivariate Model a | Posterior Odds of Disease, Probability (%)
Smoking < 30 y and no wheezing symptoms and no auscultated wheezing | 0.02 (1.5)
No history of OAD and smoking < 40 pack-years and age < 45 y and laryngeal height > 4 cm | 0.03 (3)

Abbreviation: OAD, obstructive airways disease.

a See multivariate table (Table 13-6) for other combinations of findings.


REFERENCE STANDARD TESTS 

Spirometry using the pulmonary laboratory's definition for the
presence of obstructive airways disease.
EVIDENCE TO SUPPORT THE UPDATE:
Chronic Obstructive Airways Disease 



TITLE Office Spirometry Significantly Improves Early 
Detection of Chronic Obstructive Pulmonary Disease in 
General Practice: The DIDASCO Study. 

AUTHORS Buffels J, Degryse J, Heyrman J, Decramer M.

CITATION Chest. 2004;125(4):1394-1399.

QUESTION Does a simple patient questionnaire identify
patients with obstructive airways disease?

DESIGN Consecutive patients, prospective. 

SETTING Twenty general practitioners in Belgium. 

PATIENTS The patients were aged 35 to 70 years and 
visiting their general practitioner during a 12-week 
period. Subjects using bronchodilators or inhaled steroids 
were excluded. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

See Table 13-9. All patients with at least 1 positive answer
on the questionnaire were considered to have a positive
result and underwent spirometry within 1 week. A random
sample of individuals with no complaints (10%)
underwent spirometry. The data were appropriately
adjusted for this planned verification bias. It is not clear
whether the spirometry and its interpretation were blinded
to the questionnaire results.


Table 13-9 Simple Questions for Obstructive Airways Disease 

Do you have any of the following complaints? 

1. Cough lasting for at least 2 weeks 

2. Breathing difficulties during mild exercise or at night 

3. Wheezing 

4. Any kind of nasal allergy or hay fever 

5. Have you had 1 or more of these complaints during the past year 

6. Have you ever had to visit your physician for a wheezing or long-lasting 
cough 


MAIN OUTCOME MEASURES 

Obstructive airways disease was identified by spirometry.
Patients with newly found airways obstruction then underwent
assessment for reversibility with a bronchodilator.
Those with reversible findings were considered to have
asthma rather than obstructive airways disease. All general
practitioners used the same model of spirometer, underwent
training for its use, and had their precision assessed. The difference
between the pulmonary function laboratory- and
office-based tests was only 2.2%.

MAIN RESULTS 

See Table 13-10. One hundred thirty-five patients had newly
diagnosed obstructive airways disease. After adjusting for verification
bias, the number of new diagnoses by spirometry was
extrapolated to 216 of 2923. Of the patients with disease, less than
10% had moderate to severe or worse obstructive airways disease.

Table 13-10 Likelihood Ratios for at Least 1 Positive Answer on the
Questionnaire

Test | Sensitivity | Specificity | LR+ (95% CI) | LR- (95% CI)
Positive questionnaire (at least 1 positive answer) | 0.58 | 0.79 | 2.7 (2.4-3.1) | 0.53 (0.45-0.61)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

CONCLUSIONS

LEVEL OF EVIDENCE Level 3.

STRENGTHS This is one of the few studies for any clinical
examination finding that recognized verification bias,
planned for it appropriately, and adjusted for it appropriately
in the analysis.

LIMITATIONS The general practitioners knew these patients
and may have known their airways disease and smoking
status before conducting the spirometry tests. Nonetheless,
the results of the office-based data showed excellent precision
compared with those of a pulmonary laboratory. Furthermore,
the practitioners had to show competence in use of the
instrument.

The investigators intentionally did not include a question 
about pack-years of smoking, because they believed that the 
clinical evidence indicated that all smokers older than 45 
years should have spirometry testing. In the discussion, the 
authors observed that the diagnoses were all new, even 
though these were patients followed in their practice. 

A simple symptom-based clinical model, without eliciting
patient risk factors, was not particularly useful for patients
with predominantly unrecognized, mild obstructive airways
disease. That finding is not surprising but is useful. The accuracy
of the multivariate model that used the answers to all the
questions is not provided.
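
The likelihood ratios in Table 13-10 follow from the reported sensitivity and specificity. The Python sketch below shows the standard formulas; the function name is our own. Because the published sensitivity and specificity are rounded to 2 digits, the computed LR+ (about 2.8) differs slightly from the tabulated 2.7, and attributing that gap to rounding is an assumption.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) from sensitivity and specificity.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity and specificity as reported in Table 13-10.
lr_positive, lr_negative = likelihood_ratios(0.58, 0.79)
print(f"LR+ = {lr_positive:.2f}, LR- = {lr_negative:.2f}")
```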

Reviewed by David L. Simel, MD, MHS


TITLE Quantitative Assessments From the Clinical 
Examination: How Should Clinicians Integrate the 
Numerous Results? 

AUTHORS Holleman DR, Simel DL.

CITATION J Gen Intern Med. 1997;12(3):165-171.

QUESTION Do the individual medical history and
physical examination findings provide independently useful
information?

DESIGN Analysis of data originally published and 
included in the original Rational Clinical Examination 
article on obstructive airways disease. 1 


MAIN OUTCOME MEASURES

The main outcome measure was the area under the curve (a
measure of accuracy) for 4 approaches to using a list of
symptoms and signs for obstructive airways disease. The
approaches were (a) use all the likelihood ratios for the individual
findings and multiply them serially without any
regard to independence; (b) pick the single best likelihood
ratio applicable to the patient; (c) identify the most important
variables from a logistic model and then use the unadjusted,
raw likelihood ratios from method (a); or (d) use the
adjusted likelihood ratios for the variables found significant
in a multivariate model.

MAIN RESULTS

See Table 13-11. Using all the information but ignoring
independence was the worst approach and required the
most clinician time to collect the data. The other
approaches were statistically similar. The third and fourth
approaches, which use a multivariate model to reduce the
number of variables, were the optimal approaches because
they reduced the amount of data that the physician had to
collect while still having a higher accuracy from the receiver
operating characteristic curve.

Table 13-11 Likelihood Ratios That Are Independently Useful for
Diagnosis of Obstructive Airways Disease

Model Variable | LR When Finding Is Present | LR When Finding Is Absent
Smoke > 55 y | 10 |
Smoke 30-55 y | 3.5 |
Smoke < 30 y | 0.23 |
Auscultated wheezing | 4.1 | 0.25
Self-reported wheezing | 3.8 | 0.26

Abbreviation: LR, likelihood ratio.

CONCLUSIONS

In this reanalysis of the data originally included in The Rational
Clinical Examination article, we get a better understanding
of the concept of statistical independence as it applies to
likelihood ratios. The patient report of previous wheezing
was as useful as, and independent of, the information
obtained by auscultated wheezing.

Reviewed by David L. Simel, MD, MHS

REFERENCE FOR THE EVIDENCE

1. Holleman DR, Simel DL, Goldberg JS. Diagnosis of obstructive airways
disease from the clinical examination. J Gen Intern Med. 1993;8(2):63-68.
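
Two of the 4 approaches compared in this study, (b) the single best likelihood ratio and (d) the product of adjusted likelihood ratios, can be contrasted in a brief Python sketch. It is illustrative only: the function names are our own, the patient is hypothetical, and the inputs are the adjusted LRs listed in Table 13-11.

```python
from math import log, prod


def single_best_lr(likelihood_ratios):
    """Approach (b): the LR farthest from 1 on a log scale."""
    return max(likelihood_ratios, key=lambda lr: abs(log(lr)))


def combined_adjusted_lr(likelihood_ratios):
    """Approach (d): multiply the adjusted LRs of the model variables."""
    return prod(likelihood_ratios)


# Hypothetical patient with every finding present:
# smoked > 55 y, auscultated wheezing, self-reported wheezing.
all_present = [10.0, 4.1, 3.8]
# Hypothetical patient with every finding absent (smoked < 30 y).
all_absent = [0.23, 0.25, 0.26]

print(single_best_lr(all_present), round(combined_adjusted_lr(all_present), 1))
print(single_best_lr(all_absent), round(combined_adjusted_lr(all_absent), 3))
```

Multiplication is defensible here only because the LRs were adjusted in a multivariate model; applying the same product to unadjusted LRs is approach (a), which the study found to perform worst.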
TITLE Improving Pulmonary Auscultation as a Tool in
the Diagnosis of Bronchial Obstruction—Results of an
Educational Intervention.

AUTHORS Melbye H, Aaraas I, Hana J, Hensrud A.

CITATION Scand J Prim Health Care. 1998;16(3):160-164.

QUESTION Does an educational intervention consisting
of audiovisual review of lung sounds and a didactic review
emphasizing diminished breath sounds, crackles, and
wheezes over the usefulness of rhonchi improve the prediction
of the forced expiratory volume in 1 second (FEV1)
percentage?

DESIGN Before-after study of an educational intervention. 

SETTING Five primary care practices in Norway. 

PATIENTS Convenience sample of general practice
patients with a 1:3 ratio of patients with vs without pulmonary
symptoms. There were 354 patients enrolled in
the phase before the intervention and 343 patients in the
second phase after the intervention.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The physicians recorded physical findings on a prespecified list
and then estimated the FEV1 (<60%, 60%-79%, or >80% predicted).
The diagnostic standard spirometry test was performed
after the clinical examination, blinded to the clinical results.

MAIN OUTCOME MEASURE

Accuracy of FEV1 percentage prediction.

MAIN RESULTS

See Table 13-12. Before the educational intervention, the
accuracy of predicting the correct FEV1 percentage range was
0.68 (SE, 0.04). After the educational intervention, the accuracy
was 0.71 (SE, 0.04). These accuracy outcomes were
derived from the area under the receiver operating characteristic
curve, using data in the article. There was no statistical
difference in the accuracy before vs after the intervention (P
= .21). Therefore, we combined the data before the educational
intervention with the data after the intervention, displaying
the ability to predict the presence of disease using a
cut point of FEV1 less than 80% predicted.


Table 13-12 Serial Likelihood Ratios for the FEV1 to the Diagnosis of 
Obstructive Airways Disease 

Clinical Prediction of FEV1 Percentage, %    LR for OAD 
<60 predicted                                11 (3.9-32) 
60-79 predicted                              6.5 (4.2-9.9) 
≥80                                          0.61 (0.53-0.7) 

Abbreviations: FEV1, forced expiratory volume at 1 second; LR, likelihood ratio; OAD, 
obstructive airways disease. 
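
To make the serial likelihood ratios in Table 13-12 concrete, the sketch below converts a pretest probability into a post-test probability through the odds form of Bayes theorem. The pretest probability of 25% is a hypothetical value chosen only for illustration and is not taken from the study, and the function name is likewise illustrative.

```python
def post_test_probability(pretest: float, lr: float) -> float:
    """Apply a likelihood ratio to a pretest probability via odds."""
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    pretest_odds = pretest / (1.0 - pretest)
    post_odds = pretest_odds * lr
    return post_odds / (1.0 + post_odds)


# Point estimates transcribed from Table 13-12.
SERIAL_LRS = {
    "<60% predicted": 11.0,
    "60%-79% predicted": 6.5,
    ">=80% predicted": 0.61,
}

ASSUMED_PRETEST = 0.25  # hypothetical pretest probability, for illustration only

for category, lr in SERIAL_LRS.items():
    probability = post_test_probability(ASSUMED_PRETEST, lr)
    print(f"{category}: LR {lr} -> post-test probability {probability:.2f}")
```

Under that assumed pretest probability, a clinical prediction of normal FEV1 lowers the probability of disease only modestly, which is the point made in the commentary about ruling out disease.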


LIMITATIONS Practitioners likely knew their affected 
patients. The cut point for the FEV1 in this study was that of 
the British Thoracic Society, which is slightly higher than 
other recommendations. 

These practitioners were good at identifying the patients 
with more significant disease. However, they were not as 
good at ruling out disease. The educational intervention that 
emphasized the physical findings noted above had little effect 
on these general practitioners. Three conclusions are possible: 
the clinical findings were not useful, the clinical findings are 
useful but the educational intervention was not effective, or 
these providers already knew which patients had obstructive 
airways disease and which did not. Overall, these clinicians 
were already good diagnosticians in being able to 
identify those with disease (see likelihood ratio [LR] for 
those they predicted would have obstructive airways disease). 
However, they were not as good at identifying those patients 
with normal results because the LR when they predicted 
normality was only 0.61. These data are consistent with the data 
for the physical examination components that show that 
individual findings do not rule out obstructive airways disease. 
They are also consistent with the overall assessment 
reported in another study for moderate to severe disease (LR, 
4.2), mild disease (LR, 0.82), and no disease (LR, 0.42), in 
which the clinicians did not know the patients. 1 

Reviewed by David L. Simel, MD, MHS 

REFERENCE FOR THE EVIDENCE 

1. Holleman DR, Simel DL, Goldberg JS. Diagnosis of obstructive airways 
disease from the clinical examination. J Gen Intern Med. 1993;8(2):63-68. 


CONCLUSIONS 

LEVEL OF EVIDENCE Level 1 randomized trial of an educational 
intervention. 

STRENGTHS Independent assessment of the outcome. 

TITLE Paradoxical Movement of the Lateral Rib Margin 
(Hoover Sign) for Detecting Obstructive Airway Disease. 

AUTHOR Garcia-Pachon E. 

CITATION Chest. 2002;122(2):651-655. 

QUESTION Do the Hoover sign and other clinical measures 
of obstructive airways disease have good reliability 
and accuracy? 

DESIGN Prospective, independent comparison with 
spirometry. 

SETTING Pulmonary consultation clinic. 

PATIENTS Consecutive patients (n = 172) referred to a 
pulmonary clinic who were older than 40 years and smokers 
of more than 20 pack-years, had a previous diagnosis or 
self-report of obstructive pulmonary disease, were receiving 
inhalants for more than 6 months, or had dyspnea. 
Thus, all patients were those in whom a diagnosis of 
obstructive airways disease could be considered. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The examiners were a pulmonologist and a first-year resident. 
The examinations were done independently. Each resident 
was specifically trained for 1 week before the patients 
were enrolled. A blinded technician performed spirometry 
tests independently. Standard definitions of obstructive airways 
disease were used. 

In patients without disease, the lower thoracic rib cage 
moves upward and outward with inspiration. Hoover sign 
refers to a paradoxic indrawing of the lateral ribs with inspiration, 
attributed to a fixed and flattened diaphragm. 

MAIN OUTCOME MEASURES 

Sensitivity, specificity, likelihood ratio (LR), and κ statistics. 

MAIN RESULTS 

See Table 13-13. Of the 172 patients, 64 (37%) met spirometric 
criteria for obstructive airways disease. 

The results for the positive likelihood ratio (LR+) and negative 
LR were statistically similar for all findings. We report 
these as meta-analytically combined results, according to the 
results given. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Comparison between a trainee and staff pulmonologist. 


Table 13-13 Agreement (κ) and Likelihood Ratios for Findings of 
Obstructive Airways Disease 

Test                          κ (95% CI)         LR+ (95% CI)     LR- (95% CI) 
Hoover sign                   0.74 (0.63-0.86)   4.6 (3.1-6.9)    0.50 (0.4-0.61) 
Rhonchi                       0.38 (0.13-0.64)   3.0 (1.6-5.7)    0.87 (0.78-0.98) 
Reduced breath sounds         0.51 (0.37-0.65)   2.7 (2.0-3.7)    0.54 (0.43-0.68) 
Wheeze                        0.67 (0.51-0.84)   1.3 (0.78-2.3)   0.95 (0.83-1.1) 
Overall clinical impression   0.61 (0.49-0.73)   3.6 (2.7-4.7)    0.25 (0.17-0.37) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 


LIMITATIONS The study was done in a pulmonary clinic 
from new consultations, although the frequency of obstructive 
airways disease is comparable to that of previous studies. 
The specific training (especially for the Hoover sign) of the 
resident likely improved the reliability, which strengthens the 
study even though the results may not generalize to individuals 
without similar training. 

The results for the reliability of wheezing and reduced 
breath sounds are almost identical to those found in a larger 
study of reliability. 1 It is reassuring that a first-year resident 
with only 1 week of specific training on the pulmonary 
examination can develop an overall clinical impression that 
agrees with that of their pulmonology instructor. 
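
For readers who want to see how an agreement statistic such as κ is obtained, the following is a minimal sketch of Cohen κ for 2 examiners rating a finding as present or absent. The counts in the example are hypothetical and are not data from this study, and the function name is an assumption made for illustration.

```python
def cohen_kappa(both_present: int, first_only: int,
                second_only: int, both_absent: int) -> float:
    """Cohen kappa for two examiners and a present/absent finding."""
    total = both_present + first_only + second_only + both_absent
    if total == 0:
        raise ValueError("at least one patient is required")
    observed = (both_present + both_absent) / total
    first_rate = (both_present + first_only) / total
    second_rate = (both_present + second_only) / total
    # Agreement expected by chance if the examiners rated independently.
    expected = first_rate * second_rate + (1 - first_rate) * (1 - second_rate)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is complete")
    return (observed - expected) / (1 - expected)


# Hypothetical counts for 100 patients, for illustration only.
print(round(cohen_kappa(40, 10, 10, 40), 2))
```

With these invented counts the examiners agree on 80% of patients, chance alone would produce 50% agreement, and κ is 0.6.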

The setting for this study (a pulmonology clinic) was different 
from that of other studies. However, the patients were 
all referral patients and unknown to the examiners before the 
study. The prevalence of disease in this population was similar 
to that of other studies of the physical examination. The 
prevalence of moderate to severe disease (22% of all patients 
and 59% of those with obstructive airways disease) was similar 
to that of a previous study done in a pulmonary 
laboratory 2 but much lower than that of a study that 
recruited patients with similar qualifying characteristics 
independently of a referral. 1 

At least among a group of patients referred to a pulmonary 
clinic, the overall clinical impression was more efficient 
(highest diagnostic odds ratio at 14) than any individual 
finding. Holleman et al 3 found a similar LR+ for detecting 
moderate to severe disease (LR, 4.2; 95% confidence interval 
[CI], 2.2-8), but their ability to rule out disease in a less 
severely affected population not referred to a pulmonary 
clinic was not as efficient (LR for the clinical impression of 
normality was 0.42; 95% CI, 0.25-0.70). The ability of the 
clinicians in the multinational study to come up with a useful 
overall clinical impression was poor. 
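
The diagnostic odds ratio cited above can be reproduced from the point estimates in Table 13-13, because it equals the LR+ divided by the LR-. The sketch below is illustrative only; it uses the rounded published values, so the results are approximate.

```python
def diagnostic_odds_ratio(lr_positive: float, lr_negative: float) -> float:
    """Diagnostic odds ratio expressed as LR+ divided by LR-."""
    if lr_negative <= 0:
        raise ValueError("LR- must be greater than 0")
    return lr_positive / lr_negative


# (LR+, LR-) point estimates transcribed from Table 13-13.
TABLE_13_13 = {
    "Hoover sign": (4.6, 0.50),
    "Rhonchi": (3.0, 0.87),
    "Reduced breath sounds": (2.7, 0.54),
    "Wheeze": (1.3, 0.95),
    "Overall clinical impression": (3.6, 0.25),
}

odds_ratios = {
    finding: diagnostic_odds_ratio(lr_pos, lr_neg)
    for finding, (lr_pos, lr_neg) in TABLE_13_13.items()
}
most_efficient = max(odds_ratios, key=odds_ratios.get)

for finding, value in odds_ratios.items():
    print(f"{finding}: {value:.1f}")
print("Highest diagnostic odds ratio:", most_efficient)
```

The overall clinical impression yields 3.6/0.25, or about 14, and the Hoover sign is next at about 9.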

Hoover sign needs to be confirmed with different examiners 
and in different populations to make sure that it is reproducible 
and does not vary with disease prevalence. The results 
reported here for wheezing can be combined with other studies, 
although this study reports statistically worse LRs. 

Reviewed by David L. Simel, MD, MHS 

CHAPTER 13 Chronic Obstructive Airways Disease 

TITLE The Accuracy of Patient History, Wheezing, and 
Laryngeal Measurements in Diagnosing Obstructive Airway 
Disease. 

AUTHORS Straus SE, McAlister FA, Sackett DL, Deeks 
JJ, for the CARE-COAD1 Group. 

CITATION JAMA. 2000;283(14):1853-1857. 

QUESTION How do we determine the accuracy of a 
variety of signs for obstructive airways disease? 

DESIGN Prospective. 

SETTING Multinational, primary care. 

PATIENTS Consecutive patients from each site, sorted 
into one of 3 groups: known chronic obstructive airways 
disease (n = 76 [25%]), suspected chronic airways disease 
(n = 114 [37%]), neither known nor suspected chronic 
airways disease (n = 119 [39%]). 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The patients were assessed clinically with the physical examination 
findings of laryngeal height, laryngeal descent, and 
wheezing. Videotaped instructions for assessing laryngeal 
height and descent were shared with each investigator 
(http://www.carestudy.com/CareStudy/COAD3/Intro.asp, 
accessed April 20, 2008). The clinical examination findings 
were recorded independently and blinded to the spirometry 
results. Likewise, the spirometrists were blinded to the clinical 
assessment. 


MAIN OUTCOME MEASURES 

Univariate and multivariate likelihood ratios. Obstructive 
airways disease was analyzed under different case definitions 
to determine their effect on the results. A multivariate 
analysis assessed combinations of variables. 


MAIN RESULTS 

See Tables 13-14 and 13-15. 


Table 13-14 Likelihood Ratios of Univariate Findings for Patients 
Without Known Obstructive Airways Disease 

                                    LR+ (95% CI)     LR- (95% CI) 
Smoking Status, Pack-Years 
  >40                               12 (2.7-50) 
  20-40                             0.8 (0.4-1.6) 
  <20                               0.5 (0.3-0.9) 
Age, y 
  >65                               1.9 (1.3-2.8) 
  45-64                             1.5 (1.1-2.2) 
  <45                               0.3 (0.2-0.5) 
Finding 
  Wheezing                          2.1 (1.2-3.5)    0.9 (0.7-1.0) 
  Maximum laryngeal height < 4 cm   4.2 (2.3-7.9)    0.7 (0.5-0.9) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 


Table 13-15 Likelihood Ratios for Multivariate Findings for All Patients 
vs Those Without Known Obstructive Airways Diseaseᵃ 

                                          LR+     LR- 
All Patients 
  Self-reported OAD                       7.3     0.5 
  Smoked > 40 pack-years                  8.3     0.8 
  Age > 45 y 
  Maximum laryngeal height < 4 cm         2.8     0.8 
Patients Without Known OAD 
  Smoked > 40 pack-years                  12      0.9 
  Age > 45 y                              1.4     0.5 
  Maximum laryngeal height < 4 cm         3.6     0.7 
  All 3 factors present vs none present   58.5    0.32 

Abbreviations: LR+, positive likelihood ratio; LR-, negative likelihood ratio; OAD, 
obstructive airways disease. 

ᵃFor the multivariate models, the LRs appropriate to an individual patient's results 
can be multiplied to determine the LR specific to that patient. 
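
The multiplication described in the footnote to Table 13-15 can be checked with a few lines of arithmetic. This sketch multiplies the rounded LRs listed for patients without known OAD; it is an illustration of the footnote, not the investigators' own calculation.

```python
from math import prod

# (LR+, LR-) for patients without known OAD, transcribed from Table 13-15.
PATIENTS_WITHOUT_KNOWN_OAD = {
    "Smoked > 40 pack-years": (12.0, 0.9),
    "Age > 45 y": (1.4, 0.5),
    "Maximum laryngeal height < 4 cm": (3.6, 0.7),
}

# All 3 findings present: multiply the LR+ values.
all_present = prod(lr_pos for lr_pos, _ in PATIENTS_WITHOUT_KNOWN_OAD.values())
# No finding present: multiply the LR- values.
none_present = prod(lr_neg for _, lr_neg in PATIENTS_WITHOUT_KNOWN_OAD.values())

print(f"All 3 present: {all_present:.1f}")
print(f"None present: {none_present:.2f}")
```

The product of the rounded LR+ values is about 60 and the product of the LR- values is about 0.32. The small difference from the tabulated 58.5 presumably reflects rounding of the individual LRs, although that explanation is an assumption.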

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Assessment of different case definitions for 
obstructive airways disease. The results for patients without 
a prior OAD diagnosis can be compared to the entire 
population. 

LIMITATIONS None. 

As in other studies, this high-quality study showed that 
smoking status dominates the clinical symptoms and signs. 

CHAPTER 13 Evidence to Support the Update 


According to a receiver operating characteristic analysis of all 
patients, the investigators chose a cut point of greater than 
40 pack-years. The results also show that no single finding 
can be used to rule out obstructive airways disease. Because 
single findings cannot establish normality, the investigators 
appropriately examined combinations of findings. Before assessing 
that, they determined that using a variety of accepted 
spirometric definitions for obstructive airways disease did not 
alter the univariate likelihood ratios in a clinically meaningful 
way. 

The assessment of the threshold value for smoking status 
was based on all patients. This brings up an important point 
not just about smoking status but also about the physical 
findings. In practice, including patients with known obstructive 
airways disease in a study of the clinical examination 
may not give the results that clinicians need; once you know 
the patient has obstructive airways disease, the physical findings 
no longer matter for diagnosis. Fortunately, the investigators 
include a separate analysis for patients without known 
obstructive airways disease. 

Including patients with known obstructive airways disease 
affects the results for sensitivity in various ways. In general, 
including more severely ill patients (or those with disease 
that is more obvious) would be expected to inflate the sensitivity 
and make the negative likelihood ratio appear optimistically 
low. However, this may not always be the case. For 
example, wheezing might be one finding that physicians 
“treat” when they know their patients have obstructive airways 
disease. Thus, patients with known obstructive airways 
disease who are under treatment might proportionately 
wheeze less than untreated, affected patients. The effect on 
sensitivity from including vs excluding such patients will create 
variability in outcomes as a function of the relative proportion 
of such patients and the pattern of their disease 
severity. Similarly, a finding such as abnormal laryngeal 
height might be fixed and not appear until more severe disease 
is present and not change with treatment. In this study, 
the point estimate for maximum laryngeal height (<4 vs >4 
cm) appeared better than wheezing, although the confidence 
intervals overlapped. It is likely that future studies will show 
that either wheezing or laryngeal height is useful, but not 
both. Because of the overlap in their LRs, some multivariate 
models could have wheezing, whereas others might have 
laryngeal height, depending on the spectrum of disease. The 
multivariate models comparing a population of patients in 
which 25% have known obstructive airways disease, vs those 
in whom the status is unknown, show that the ability to rule 
in or rule out disease is not as efficient because information 
is lost. The model for patients with disease status unknown 
seems to be the most relevant for clinicians who are trying to 
establish a diagnosis. 

Reviewed by David L. Simel, MD, MHS 


TITLE Accuracy of History, Wheezing, and Forced Expiratory 
Time in the Diagnosis of Chronic Obstructive Pulmonary 
Disease. 

AUTHORS Straus SE, McAlister FA, Sackett DL, Deeks 
JJ, for the CARE-COAD2 Group. 

CITATION J Gen Intern Med. 2002;17(9):684. 

QUESTION What is the accuracy of a variety of signs 
for obstructive airways disease? 

DESIGN Prospective. 

SETTING Multinational, primary care. 

PATIENTS Consecutive patients from each site, sorted 
into one of 3 groups: known chronic obstructive airways 
disease (n = 66 [41%]), suspected chronic airways disease 
(n = 43 [27%]), and neither known nor suspected chronic 
airways disease (n = 52 [32%]). 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The patients were assessed clinically with the physical examination 
findings of wheezing and forced expiratory time. The 
clinical examination findings were recorded independently 
and blinded to the spirometry results. Likewise, the spirometrists 
were blinded to the clinical assessment. 

MAIN OUTCOME MEASURES 

Univariate and multivariate likelihood ratios (LRs). Obstructive 
airways disease was analyzed under different case definitions 
to determine their influence on the results. 
A multivariate analysis assessed combinations of variables. 


MAIN RESULTS 

See Tables 13-16 and 13-17. 


Table 13-16 Likelihood Ratios for Univariate Findings for All Patients 

Test                          LR+ (95% CI)      LR- (95% CI) 
Wheezing                      4.0 (1.6-9.9)     0.8 (0.7-0.9) 
Smoking Status, Pack-Years 
  >40                         3.3 (1.5-7.1) 
  20-40                       2.0 (1.0-4.0) 
  <20                         0.7 (0.4-1.2) 
Age, y 
  >65                         1.6 (1.1-2.3) 
  <65                         0.7 (0.5-0.9) 
Forced Expiratory Time, s 
  >9                          6.7 (2.1-21) 
  6-9                         1.8 (0.77-4.0) 
  <6                          0.6 (0.5-0.8) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 


Table 13-17 Multivariate Findings for All Patientsᵃ 

Test                           LR+     LR- 
Forced expiratory time > 9 s   4.6     0.8 
Self-reported OAD              4.4     0.5 
Wheezing                       2.9     0.8 

Abbreviations: LR+, positive likelihood ratio; LR-, negative likelihood ratio; OAD, 
obstructive airways disease. 

ᵃFor the multivariate model, the LRs appropriate to a patient's results can be multiplied 
to determine the LR specific to that patient. Smoking status was not independently 
significant once these 3 variables were considered. 
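
As an illustration of the footnote to Table 13-17, the sketch below combines the adjusted LRs for a single patient with a mixed pattern of findings. The patient is hypothetical, the dictionary keys and function name are invented for the example, and treating an unassessed finding as contributing an LR of 1 is an assumption rather than something the study states.

```python
from typing import Dict, Optional

# Adjusted (LR+, LR-) values transcribed from Table 13-17.
TABLE_13_17 = {
    "forced_expiratory_time_gt_9s": (4.6, 0.8),
    "self_reported_oad": (4.4, 0.5),
    "wheezing": (2.9, 0.8),
}


def patient_specific_lr(findings: Dict[str, Optional[bool]]) -> float:
    """Multiply LR+ for findings present and LR- for findings absent."""
    combined = 1.0
    for name, present in findings.items():
        lr_pos, lr_neg = TABLE_13_17[name]
        if present is None:
            # Finding not assessed: assumed to leave the LR unchanged.
            continue
        combined *= lr_pos if present else lr_neg
    return combined


# Hypothetical patient: prolonged forced expiratory time and wheezing,
# but no self-reported history of OAD.
example_patient = {
    "forced_expiratory_time_gt_9s": True,
    "self_reported_oad": False,
    "wheezing": True,
}
print(round(patient_specific_lr(example_patient), 2))
```

For this invented patient the combined LR is 4.6 x 0.5 x 2.9, or about 6.7.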

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Assessment of different case definitions for 
obstructive airways disease. 

LIMITATIONS Forty-one percent of the patients had known 
obstructive airways disease. 

The results in this study were similar to those in the first 
study reported by the same group of authors. The authors 
argue that it makes sense to include the patient’s self-report 
of a previous diagnosis in a logistic model. It does make 
sense because the patient’s report may or may not be correct. 
However, the results may not generalize as well to 
patients who are unaware of their status simply because 
effective treatment may affect the physical examination 
findings (eg, wheezing). 

The adjusted LR for a previous diagnosis of obstructive airways 
disease was 4.4, with a negative likelihood ratio (LR-) of 
0.5. Although Holleman et al 1 did not assess patients for a previous 
diagnosis of obstructive airways disease, they did collect 
symptoms of chronic bronchitis. The results are consistent in 
that the independent LR for chronic bronchitic symptoms was 
3.8, with an LR- of 0.66 in the Holleman et al study. 1 

In this study, there was a high prevalence of patients with 
known obstructive airways disease. The higher prevalence of 
disease appears to have rendered the additional information 
about smoking status useless when evaluated in combination 
with other findings. Nonetheless, the results here support 
those reported by Holleman et al 1 that forced expiratory time 
adds information to wheezing status. 

Reviewed by David L. Simel, MD, MHS 

REFERENCE FOR THE EVIDENCE 

1. Holleman DR, Simel DL, Goldberg JS. Diagnosis of obstructive airways 
disease from the clinical examination. J Gen Intern Med. 1993;8(2):63-68. 

CHAPTER 14 

Does This Patient Have Clubbing? 

Kathryn A. Myers, MD, EdM, FRCPC 
Donald R. E. Farquhar, MD, SM, FRCPC 

CLINICAL SCENARIOS 


CASE 1 A respiratory therapist asks you to see her 
asymptomatic 76-year-old mother in consultation because 
she is concerned that her mother has clubbing. The 
patient has increased curvature of the nails, and you won¬ 
der whether other physical examination techniques can 
help you decide whether clubbing is present. 

CASE 2 While performing a routine physical examination 
on a 65-year-old female smoker with chronic obstructive 
pulmonary disease (COPD), you detect changes in the 
fingers suggestive of clubbing. You recall an association 
between clubbing and certain types of pulmonary disease, 
and you wonder whether any further diagnostic evaluation 
of this patient is warranted. 


WHY IS THE CLINICAL 
EXAMINATION IMPORTANT? 


Clubbing is one of those phenomena with which we are all so 
familiar that we appear to know more about it than we really do. 1 

—Samuel West, 1897 

The association of clubbing with a host of infectious, neoplastic, 
inflammatory, and vascular diseases has captured the imagination 
of clinicians since Hippocrates first described clubbing 
in a patient with empyema in the fifth century BC. 2 Although 
clubbing can be a benign hereditary condition, the diagnostic 
implications in an adult are such that its detection should 
prompt consideration of the underlying etiology (Table 14-1). 3,4 
In the pediatric population, clubbing usually represents the 
progression of established diseases, such as cystic fibrosis or 
uncorrected cyanotic congenital heart disease. 

Digital clubbing is characterized by the enlargement of the 
terminal segments of the fingers or toes that results from the 
proliferation of the connective tissue between the nail matrix 
and the distal phalanx. Although most often symmetrical, 
clubbing can be unilateral or even unidigital. 5,6 Clubbing can 
occur in isolation or in association with hypertrophic 
osteoarthropathy. 7,8 Hypertrophic osteoarthropathy, a systemic 
disorder affecting bone and joints, is most commonly associated 
with bronchogenic carcinoma, but it can occur in association 
with extrapulmonary malignancies, as well as nonmalignant 
pulmonary diseases. 9 Pachydermoperiostosis is a rare, congenital 
form of hypertrophic osteoarthropathy. Congenital clubbing, 
which usually has its onset in childhood, may represent a 
limited form of pachydermoperiostosis. 5 

Unlike such physical findings as ascites and splenomegaly, 
the clinical impression of clubbing cannot be verified by simple 
imaging tests. Throughout the past century, many investigators 
have described possible reference standards for 
diagnosis of clubbing, including water displacement of the 
terminal phalanges, measurement of nail curvature using a 
device called an unguisometer, and measuring nail angles 
and ratios with plaster casts or shadow projections of fingers. 10-15 
None has been accepted as a criterion standard of 
diagnosis, and all are cumbersome and impractical as a 
method of verifying the clinical impression of clubbing. 
Therefore, physicians must rely solely on their skills in clinical 
examination to detect clubbing. 

CHAPTER 14 The Rational Clinical Examination 

Table 14-1 Conditions Associated With Acquired Clubbing 

Neoplastic intrathoracic disease 
  Bronchogenic carcinoma 
  Malignant mesothelioma 
  Pleural fibroma 
  Metastatic osteogenic sarcoma 
Suppurative intrathoracic disease 
  Lung abscess 
  Bronchiectasis 
  Cystic fibrosis 
  Empyema 
  Chronic cavitary mycobacterial or fungal infection 
Diffuse pulmonary disease 
  Idiopathic pulmonary fibrosis 
  Asbestosis 
  Pulmonary arteriovenous malformations 
Cardiovascular disease 
  Cyanotic congenital heart disease 
  Infective endocarditis 
  Arterial graft sepsisᵃ 
  Brachial arteriovenous fistulaᵇ 
  Hemiplegic strokeᵇ 
Gastrointestinal disease 
  Inflammatory bowel disease 
  Celiac disease 
Hepatobiliary disease 
  Cirrhosis (particularly biliary and juvenile) 
Metabolic disease 
  Thyroid acropachy 

ᵃAssociated with clubbing distal to graft sepsis. 
ᵇAssociated with unilateral clubbing. 

Pathophysiology 

Normally, the nailbed thickness is less than 2.0 mm. Clubbed 
fingers studied at autopsy show not only a thickness greater 
than 2.0 mm but also a lower density of nailbed connective 
tissue. 16 Morphologic findings include the presence of primitive 
fibroblasts, increased numbers of eosinophils and lymphocytes, 
and increased caliber and number of blood vessels. 
Genetic predisposition, vagally mediated neural mechanisms, 
and the direct effect of tissue hypoxia or of circulating 
vasodilators that elude metabolism in the lung through 
right-to-left shunting have all been proposed to explain the 
morphology. Although there is experimental and clinical evidence 
to support each of these hypotheses, it has not been 
possible to formulate a comprehensive theory of pathogenesis 
applicable to all clinical circumstances. 5,17-19 

Symptoms 

Clubbing is almost always painless, unless it is associated 
with hypertrophic osteoarthropathy. Symptoms of hypertrophic 
osteoarthropathy include periarticular pain and 
swelling, most often in the wrists, ankles, knees, and elbows. 
Accordingly, the presentation of hypertrophic osteoarthropathy 
can be confused with such primary rheumatologic disorders 
as rheumatoid arthritis. 5 Many patients with clubbing 
express unawareness of any abnormality in their fingers. In 
one series of patients with clubbing, only 32 of 116 patients 
were aware of the onset of the changes in their nails, and only 
2 reported painful fingers or joints. 20 

Signs 

Identification of advanced clubbing, which is characterized 
by so-called drumstick fingers, poses little difficulty for 
clinicians. By contrast, the subtleties of the earlier stages of 
clubbing may lead to animated bedside debate among medical 
students, residents, and experienced physicians. The 2 
approaches for identifying clubbing on physical examination 
are visual inspection and palpation of the cuticle for 
increased sponginess. 16,21,22 

Inspection 

General Appearance 

Inspection of the fingers for clubbing can reveal abnormalities 
in the nailfold angles and in the shape, depth, and width 
of the terminal phalanges. In addition to the obvious changes 
in the shape of the terminal phalanges in established clubbing 
(Figure 14-1A), close inspection of the cuticle may 
reveal a shiny and smooth appearance. Lovibond 23 described 
a lilac hue of the nail fold in clubbing, caused by increased 
vascularity in the connective tissue. Although the increased 
nail curvature seen in clubbed fingers has been studied 
extensively using chord-arc measurements and unguisometers, 
it is not easily measured at the bedside. Moreover, nail 
curvature tends to become more pronounced with age and 
can occur in the absence of other signs of clubbing. 5,24 

Nailfold Angles 

Inspection of clubbed fingers reveals a number of abnormalities 
in the angles made by the nail as it exits from the terminal 
phalanx. Lovibond 23 popularized this as the profile sign in his 
1938 report on the diagnosis of clubbed fingers. He observed 
that in normal fingers, the nail projects from the nail bed at an 
angle of about 160 degrees, but this angle approached 180 
degrees in clubbed fingers (Figure 14-1B). Later, the hyponychial 
angle was proposed as a more reliable sign than the 
profile angle in the assessment of clubbing (Figure 14-1B). 11 

Phalangeal Depth Ratio 

Estimation of the phalangeal depth ratio (PDR) can be used 
to identify clubbing (Figure 14-1C). 14 In the normal finger, 
the distal phalangeal depth is smaller than the interphalangeal 
depth. As connective tissue deposition expands the 
pulp in the terminal phalanx, this ratio becomes reversed. 
The PDR appears to be independent of age, sex, and ethnicity 
in randomly selected populations. 14 - 25 A similar ratio using 
distal and interphalangeal width can be determined, but it 
has not been studied as extensively as the PDR. 

CHAPTER 14 Clubbing 

Figure 14-1 Appearance on Inspection for Clubbing 

A, Normal finger viewed from above and in profile, 
and the changes occurring in established clubbing 
viewed from above and in profile. B, The finger on 
the left demonstrates normal profile (ABC) and normal 
hyponychial (ABD) nailfold angles of 169 
degrees and 183 degrees, respectively. The clubbed 
finger on the right shows increased profile and 
hyponychial nailfold angles of 191 degrees and 203 
degrees, respectively. C, Distal phalangeal finger 
depth (DPD)/interphalangeal finger depth (IPD) represents 
the phalangeal depth ratio. In normal fingers, 
the IPD is greater than the DPD. In clubbing, this 
relationship is reversed. D, Schamroth sign. In the 
absence of clubbing, opposition of the index fingers 
nail to nail creates a diamond-shaped window 
(arrowhead). In clubbed fingers, the loss of the profile 
angle because of the increase in tissue at the nail 
bed causes obliteration of this space (arrowhead). 
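
The phalangeal depth ratio described in the text and in Figure 14-1C is simply the distal phalangeal depth divided by the interphalangeal depth. The sketch below computes that ratio and reports whether it is reversed (greater than 1). The measurements are hypothetical, and the function names are assumptions made for illustration; the threshold of 1 follows only from the statement that the normal relationship is reversed in clubbing.

```python
def phalangeal_depth_ratio(dpd_mm: float, ipd_mm: float) -> float:
    """Return DPD/IPD for depths measured in the same units."""
    if dpd_mm <= 0 or ipd_mm <= 0:
        raise ValueError("depths must be positive")
    return dpd_mm / ipd_mm


def ratio_is_reversed(dpd_mm: float, ipd_mm: float) -> bool:
    """True when the distal depth exceeds the interphalangeal depth."""
    return phalangeal_depth_ratio(dpd_mm, ipd_mm) > 1.0


# Hypothetical caliper measurements in millimeters, for illustration only.
for label, dpd, ipd in [("finger 1", 9.0, 10.0), ("finger 2", 11.0, 10.0)]:
    ratio = phalangeal_depth_ratio(dpd, ipd)
    print(f"{label}: PDR {ratio:.2f}, reversed: {ratio_is_reversed(dpd, ipd)}")
```

In this invented example the first finger has a PDR below 1 and the second has a PDR above 1, which is the pattern the text associates with clubbing.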

Although the PDR was originally described using plaster 
casts and shadowgrams, subsequent studies have reported 
the use of calipers on live fingers. To perform this measurement, 
the calipers should touch but not compress the tissue 
at the distal phalanx and the interphalangeal joint of the 
index finger during measurement. Baughman et al 26 estimated 
that this technique takes no longer than 1 minute to 
perform. Visual estimation for the reversal of the PDR has 
been suggested as a simple bedside technique for clubbing, 
but the precision of this method has not been tested. 


Schamroth Sign 

In 1976, Schamroth 27 reported a new clinical sign that incorporated 
2 of the clinical features of clubbing (Figure 14-1D). 
Normal fingers create a diamond-shaped window when the 
dorsal surfaces of terminal phalanges of similar fingers are 
opposed. In the clubbed finger, the diamond becomes obliterated 
because of the loss of the profile angle and the increase in 
the soft tissue at the cuticle. Since its original description, this 
technique has become popular with physicians as a quick test 
to establish the presence of clubbing. The precision and accuracy 
of this sign, however, have not been formally tested. 28 

Palpation 

On palpation of the base of the nail bed, the examiner perceives 
that the nail is floating within the soft tissue and, in 
advanced cases, may even be able to feel the proximal edge of 
the nail. This sign is best elicited by gently rocking the nail. 
The examiner grips the sides of the subject’s finger between 
the thumb and middle finger of each hand. Exerting downward 
pressure with his or her own index fingers, the examiner 
then rocks the distal and proximal ends of the subject’s 
nail, using the nail bed as a fulcrum. 

METHODS 

We used the MEDLINE database to search for English-language 
articles related to the clinical evaluation of clubbing 
that were published between January 1966 and April 1999. 
The Medical Subject Heading (MeSH) “hypertrophic osteoarthropathy,” 
followed by the text word “clubbing,” was used 
in the following search strategy: “physical examination or 
physical exams,” “medical history taking,” “professional competence,” 
“sensitivity” and “specificity” or “sensitivity and 
specificity,” “reproducibility of result,” “observer variation,” 
“diagnostic tests,” “routine,” “decision support techniques,” 
and “Bayes theorem.” This strategy resulted in a limited 
number of articles. 

To expand the search, the titles and abstracts of all articles 
retrieved using the MeSH heading “hypertrophic osteoarthropathy” 
or the text words “clubbing” and “Hippocratic fingers” 
were evaluated by each author independently. According 
to this review, relevant publications were retrieved and their 
bibliographies were evaluated for additional material. We also 
examined standard textbooks of physical diagnosis for information 
on the physical examination for clubbing. We attempted 
to contact the authors of articles in which more than 
1 observer made a determination of clubbing to obtain additional 
data about precision of the examination for clubbing. 
Studies selected for data extraction were those in which quantitative 
or qualitative assessment for clubbing was described in 
a series of patients. Although our expanded electronic search 
identified 567 articles related to clubbing, only 16 studies met 
the criteria for inclusion in our analysis. 

Study Characteristics 

Clubbing differs from other physical signs evaluated in The 
Rational Clinical Examination series in that the lack of an 
accepted objective diagnostic criterion standard precludes 
meaningful assessment of the accuracy of clinical examination. 
However, our review of the literature on clubbing permitted 
us to evaluate quantitative indices used to distinguish 
clubbed from normal fingers, precision of physicians’ bedside 
clinical examination for clubbing, and accuracy of clubbing 
as a marker of selected diseases. We chose to limit our 
review of the quantitative indices of clubbing to studies of 
nailfold angles and the PDR because of their potential applicability 
at the bedside. 

Data Analysis 

Pooled weighted averages were calculated for quantitative 
measurements of nailfold angles and PDRs from data in 
studies of normal and diseased populations. Using data available 
in 2 articles on the precision of the examination for 
clubbing, we calculated κ statistics using the Stata Statistical 
Package (version 3.0; Computing Resource Center, Santa 
Monica, California). Sensitivities, specificities, and likelihood 
ratios (LRs) of clubbing as a marker of specific underlying 
disease were calculated from original data when possible. 

RESULTS 

Quality of the Evidence 

By consensus and using criteria previously developed for this
series, we appraised the quality of the evidence contained in
the articles that we retrieved. 29 For reasons of selection bias,
small sample size, and lack of an independent, blind comparison
of the physical sign with a criterion standard, we classified
all of the included studies as level 4, leading to grade C
recommendations. 29

Quantitative Indices of Clubbing 
in Normal and Disease States 

Using plaster casts, shadowgraphs, and calipers, nailfold
angles and the PDR have been measured in normal populations
and in subjects with diseases associated with clubbing.
The precision of these quantitative techniques is high. Using
the shadowgraph method, Kitis et al 30 examined the precision
of measuring nailfold angles. Duplicate measurements of 51
subjects showed a difference of 0.2 degrees in the mean of
both the hyponychial and profile angles, with SDs of 4.6
degrees and 4.3 degrees, respectively. Although Waring et al 15
found that the measurement of the PDR with calipers on live
fingers rather than plaster casts resulted in a loss of precision,
Baughman et al 26 investigated intrarater reliability and found
an SD of only 0.0008. In the same study, 2 observers
independently measured the ratio in 20 subjects, and the maximal
difference in PDR was 0.03.

Published data pertaining to the measurement of nailfold
angles and the PDR in disease-free individuals are summarized
in Table 14-2. The pooled weighted mean values for the
profile and hyponychial angle are 167 degrees and 179
degrees, respectively. The pooled weighted mean PDR is 0.90.
Do these measurements help distinguish those with from
those without clubbing? The range was available for only 45
of the 161 disease-free subjects in whom the profile angle was
measured, and none exceeded 176 degrees. In studies of
hyponychial angles, none of the 171 disease-free subjects had
angles greater than 192 degrees. The PDR has been reported
in 359 disease-free subjects, and in only 1 did it exceed unity.
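
As an illustration of how the pooled values can be recomputed from the study-level data in Table 14-2, the following minimal Python sketch weights each study mean by its number of subjects. The weighting scheme is an assumption (the text says only that the averages were "pooled weighted"), and the function and variable names are hypothetical.

```python
from typing import Sequence, Tuple


def pooled_weighted_mean(studies: Sequence[Tuple[int, float]]) -> float:
    """Return the mean of study means, weighted by number of subjects.

    Assumption: weights are the study sample sizes; the source does not
    state the weighting scheme explicitly.
    """
    total_subjects = sum(n for n, _ in studies)
    return sum(n * mean for n, mean in studies) / total_subjects


# (number of subjects, reported mean) pairs taken from Table 14-2
profile_angle = [(25, 168), (116, 166), (20, 171)]
hyponychial_angle = [(10, 186), (25, 180), (116, 178), (20, 181)]
phalangeal_depth_ratio = [(160, 0.90), (60, 0.90), (85, 0.89), (54, 0.92)]

print(f"Profile angle: {pooled_weighted_mean(profile_angle):.0f} degrees")
print(f"Hyponychial angle: {pooled_weighted_mean(hyponychial_angle):.0f} degrees")
print(f"PDR: {pooled_weighted_mean(phalangeal_depth_ratio):.2f}")
```

Under this assumption the sketch returns values that round to the 167 degrees, 179 degrees, and 0.90 quoted above.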

Table 14-3 shows the nailfold angles and PDRs in patients
with diseases associated with clubbing. In such chronic diseases
as cystic fibrosis and cyanotic congenital heart disease,
the nailfold angles and the PDRs are significantly higher than
those found in disease-free populations. In case series of
asthma and COPD, PDRs are slightly higher than normal
values. However, it is impossible to exclude the possibilities
that these series may have included patients with other
pulmonary disorders associated with clubbing or that some
patients were selected because they had clubbing.

Table 14-2 Reported Values for Profile Angle, Hyponychial Angle, and Phalangeal Depth Ratio in Disease-Free Subjects

Source, y | Technique | Population | No. of Subjects | Mean (SD)

Mean Profile Angle, Degrees (SD)
Bentley et al, 13 1976 | Shadowgraph | Healthy subjects from a surgical clinic (age not specified) | 25 | 168 (3.7)
Kitis et al, 30 1979 | Shadowgraph | Healthy hospital employees | 116 | 166 (4.3)
Sinniah and Omar, 31 1979 | Shadowgraph | Healthy children (source population not specified) | 20 | 171 (5.5)
Pooled weighted mean | | | 161 | 167 (4.4)

Hyponychial Angle, Degrees (SD)
Regan et al, 12 1967 | Plaster casts, planimeter | Healthy manual workers | 10 | 186 (2.0)
Bentley et al, 13 1976 | Shadowgraph | Healthy manual workers | 25 | 180 (4.2)
Kitis et al, 30 1979 | Shadowgraph | Healthy manual workers | 116 | 178 (4.6)
Sinniah and Omar, 31 1979 | Shadowgraph | Healthy manual workers | 20 | 181 (5.2)
Pooled weighted mean | | | 171 | 179 (4.5)

Phalangeal Depth Ratio, Mean (SD)
Waring et al, 15 1971 | Plaster casts, micrometer | Children and adults (source population not specified) | 160 | 0.90 (0.04)
Sly et al, 25 1973 | Plaster casts, micrometer | Adults (medical center personnel and relatives of patients attending pediatric allergy clinic) | 60 | 0.90 (0.04)
Paton et al, 32 1991 | Plaster casts, micrometer | Children and adults (random sample from people playing in nearby park) | 85 | 0.89 (0.04)
Baughman et al, 26 1998 | Live fingers, calipers | Adults (medical center personnel) | 54 | 0.92 (0.05)
Pooled weighted mean | | | 359 | 0.90 (0.04)

Abbreviation: SD, standard deviation.


Table 14-3 Reported Values for Quantitative Measures of Clubbing in Disease States

Source, y | No. of Subjects | Technique | Quantitative Measure | Mean (SD)

Asthma (a)
Waring et al, 15 1971 | 45 | Plaster casts | DPD/IPD ratio | 39/45 <1.0 (a)
Sly et al, 25 1973 | 119 | Plaster casts | DPD/IPD ratio | 0.91 (0.05)
Bentley et al, 13 1976 | 25 | Shadowgraph | Profile angle; hyponychial angle | 171 degrees (4.1 degrees); 185 degrees (6.4 degrees)
Paton et al, 32 1991 | 20 | Plaster casts | DPD/IPD ratio | 0.91 (0.05)

Chronic Obstructive Pulmonary Disease (b)
Baughman et al, 26 1998 | 54 | Live fingers, calipers | DPD/IPD ratio | 0.94 (0.06)

Bronchogenic Carcinoma (b)
Baughman et al, 26 1998 | 109 | Live fingers, calipers | DPD/IPD ratio | 0.98 (0.1)

Cystic Fibrosis
Waring et al, 15 1971 | 45 | Plaster casts | DPD/IPD ratio | 38/45 >1.0 (a, b)
Bentley et al, 13 1976 | 50 | Shadowgraph | Profile angle; hyponychial angle | 179 degrees (6.2 degrees); 195 degrees (8.3 degrees)
Lemen et al, 33 1978 | 18 | Plaster casts | DPD/IPD ratio (c) | 1.010 (0.016)
Pitts-Tucker et al, 34 1986 | 73 | Shadowgraph | Hyponychial angle | 192 degrees
Paton et al, 32 1991 | 44 | Plaster casts | DPD/IPD ratio | 1.0 (0.08)

Cyanotic Congenital Heart Disease
Waring et al, 15 1971 | 27 | Plaster casts | DPD/IPD ratio | 18/27 >1.0 (a, b)
Bentley et al, 13 1976 | 25 | Shadowgraph | Profile angle; hyponychial angle | 180 degrees (4.8 degrees); 196 degrees (2.5 degrees)

Asbestos Exposure
Regan et al, 12 1967 | 50 | Plaster casts, planimeter | Hyponychial angle | 195 degrees (9.6 degrees)

Crohn Disease
Kitis et al, 30 1979 | 200 | Shadowgraph | Hyponychial angle | 184 degrees (7.8 degrees)

Abbreviations: DPD, distal phalangeal depth; IPD, interphalangeal depth.
(a) Individual values not reported; proportion of patients with DPD/IPD ratio of greater than 1.0 reported.
(b) Value reported in the table is for right index finger only.
(c) Pooled weighted average for right index finger only.

In summary, in disease-free subjects, a PDR of more than 1
is rare, the profile angle does not exceed 176 degrees, and the
hyponychial angle does not exceed 192 degrees. To facilitate
clinical use, we suggest accepting values of less than 180
degrees for the profile angle (a straight line) and less than 190
degrees for the hyponychial angle as describing normality.

PRECISION AND ACCURACY 

Precision of the Clinical Examination for Clubbing 

Four studies 35-38 have reported the precision of physicians’
bedside examination for clubbing (Table 14-4). Although
several of the case series describing the prevalence of clubbing
in various disease states used multiple examiners, none
reported interrater reliability. We have excluded from this
section reports of precision that used only casts or shadowgraphs
for determination of precision because potentially
important clinical information from inspection or palpation
of the live finger was not available to the examiners.

In an attempt to challenge the prevailing wisdom that 
clubbing was easily recognized, Pyke 35 studied the precision 
of physicians’ global assessment for the sign. He enlisted 12 
physicians and 4 medical students to examine 12 patients 
for the presence of clubbing. He purposefully chose patients 
who exhibited the full range of findings from normal to 
advanced clubbing. Overall agreement was fair (κ = 0.39).
From the reported data, it was impossible to determine the 
effect of training on the examiners’ precision, but it was 
clear that the examiners used different criteria to identify 
clubbing. After completion of their assessments, Pyke 35 
asked the examiners to define clubbing, and he received a 
wide variety of answers. 

Rice and Rowlands 36 used several quantitative indices,
including PDRs, to assemble 11 patients who exhibited a range
of findings from normal to advanced clubbing. Nineteen
clinicians, all internal medicine staff or resident physicians,
examined the patients for clubbing. Clubbing was judged to be
present in 103 of the 209 subject examinations. As with Pyke’s 35
findings, observer agreement was only fair (κ = 0.36).

Table 14-4 Interobserver Agreement of Clinical Examination for Clubbing

Source, y | No. of Observers | Observers’ Level of Experience | κ
Pyke, 35 1954 | 16 | 4 Medical students; 4 Medical registrars; 4 Surgical registrars; 4 Senior physicians | 0.39
Rice and Rowlands, 36 1961 | 19 | Residents; Fellows; Staff physicians | 0.36
Smyllie et al, 38 1965 | 9 | 5 Medical registrars; 4 Consultant physicians | 0.90
Spiteri et al, 37 1988 | 24 | 2 Senior house officers; 14 Medical registrars; 8 Consultant physicians | 0.45

Precision of physical examination for a variety of signs of
pulmonary disease, including clubbing, was evaluated in a
study in which 24 experienced physicians examined 4
patients each. 37 The precision of the examination for clubbing
was moderate (κ = 0.45). Although several signs showed
marginally greater precision (eg, wheezes, κ = 0.51), most
signs had significantly lower precision (eg, displaced trachea,
κ = 0.01; whispering pectoriloquy, κ = 0.11).

A 1965 study 38 contrasted other reports of the precision of 
the physical examination for clubbing. Of 21 pulmonary 
signs, clubbing exhibited the highest rate of interobserver 
agreement among 9 experienced physicians examining 20 
patients (κ = 0.90). 39 This high level of precision may reflect
either the experience of the examiners or a selection bias 
because the degree of clubbing in affected patients was not 
described. The use of cases of more advanced clubbing may 
have led to an overestimation of precision. 
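
To make the κ values above easier to interpret, the following minimal Python sketch computes chance-corrected agreement for the simplest case of 2 observers rating each finger as clubbed or not clubbed. The counts are hypothetical and are not taken from any of the cited studies, which involved many observers and would need a multirater form of the statistic; the function name is also an assumption.

```python
def cohen_kappa(both_yes: int, only_first: int, only_second: int, both_no: int) -> float:
    """Cohen's kappa for 2 observers making a yes/no judgment.

    both_yes    -- both observers call the finger clubbed
    only_first  -- only the first observer calls it clubbed
    only_second -- only the second observer calls it clubbed
    both_no     -- both observers call it normal
    """
    total = both_yes + only_first + only_second + both_no
    observed_agreement = (both_yes + both_no) / total
    first_yes = (both_yes + only_first) / total
    second_yes = (both_yes + only_second) / total
    # Agreement expected by chance if the observers judged independently
    chance_agreement = first_yes * second_yes + (1 - first_yes) * (1 - second_yes)
    return (observed_agreement - chance_agreement) / (1 - chance_agreement)


# Hypothetical example: 50 examinations, raw agreement 70%
kappa = cohen_kappa(both_yes=20, only_first=5, only_second=10, both_no=15)
print(f"kappa = {kappa:.2f}")
```

In this hypothetical example the observers agree on 70% of examinations, yet half of that agreement would be expected by chance, which is why κ is much lower than the raw agreement.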

Accuracy of Clubbing as a Marker of Disease States 

Determination of the accuracy of clinical examination
techniques to detect clubbing has been confounded by
incorporation bias that results when the clinical examination itself
forms part or all of the diagnostic criterion standard. One
example of such confounding is illustrated by the digital
index of Vasquez et al. 40 This index, the sum of the ratios of
the distal phalangeal finger depth and interphalangeal depth
circumferences in all 10 fingers, has been reported to have a
high sensitivity and specificity for clubbing. However, the
index was evaluated in patients with cyanotic congenital
heart disease, whose clubbing was so marked that it was
“obvious by simple inspection.” 40 Only 1 study 36 measured
the accuracy of clinicians’ bedside examination for clubbing
against a priori diagnostic criteria derived from quantitative
indices in disease-free populations and those with disease.
Unfortunately, data were not given in sufficient detail to
allow calculation of the sensitivity and specificity of the
clinical examination. Hence, data on the accuracy of clinical
examination compared with the quantitative indices to
detect clubbing are limited.

An alternative approach is to consider the accuracy of the
presence of clubbing as a marker of underlying disease.
Because many patients with clubbing have pulmonary disease,
a relevant clinical question is whether clubbing separates those
with COPD from those who have clubbing associated with
pulmonary malignancy. In this way, 1 study 26 assessed the
usefulness of the PDR in distinguishing patients with
documented lung cancer from control subjects and those with
COPD. Using calipers, Baughman et al 26 measured the PDR in
both right and left index fingers of 109 patients with known
lung cancer, 55 patients with COPD, and 54 control subjects.
Of the 54 control subjects, none had a PDR in excess of 1.
Among those patients who had a PDR greater than 1, 40 had lung
cancer and 5 had COPD alone (LR, 3.9; 95% confidence interval
[CI], 1.6-9.4). Seventy patients who had a PDR of 1 or less had
lung cancer, and 49 with the same depth ratio had COPD
alone (LR, 0.7; 95% CI, 0.6-0.8). We reclassified 1 subject in
the COPD group who had a pulmonary nodule detected on
chest radiography at study entry, which was subsequently
diagnosed as adenocarcinoma of the lung.
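
The likelihood ratios quoted above can be recomputed from the counts given in the text (110 patients with lung cancer and 54 with COPD alone after the reclassification). The following Python sketch is illustrative only: the confidence interval uses the common log-transformation method, which is an assumption because the method is not stated here, and the function name is hypothetical.

```python
import math
from typing import Tuple


def likelihood_ratio(with_finding_diseased: int, total_diseased: int,
                     with_finding_comparison: int, total_comparison: int,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Return (LR, lower, upper) for a finding, comparing 2 groups.

    The interval is computed on the log scale (an assumed method).
    """
    lr = (with_finding_diseased / total_diseased) / (
        with_finding_comparison / total_comparison)
    se_log = math.sqrt(1 / with_finding_diseased - 1 / total_diseased
                       + 1 / with_finding_comparison - 1 / total_comparison)
    lower = math.exp(math.log(lr) - z * se_log)
    upper = math.exp(math.log(lr) + z * se_log)
    return lr, lower, upper


# PDR > 1: 40 of 110 with lung cancer vs 5 of 54 with COPD alone
lr_abnormal = likelihood_ratio(40, 110, 5, 54)
# PDR <= 1: 70 of 110 with lung cancer vs 49 of 54 with COPD alone
lr_normal = likelihood_ratio(70, 110, 49, 54)

print("PDR > 1:  LR %.1f (95%% CI, %.1f-%.1f)" % lr_abnormal)
print("PDR <= 1: LR %.1f (95%% CI, %.1f-%.1f)" % lr_normal)
```

With these counts the sketch returns values that round to the reported LR of 3.9 (1.6-9.4) for a PDR greater than 1 and 0.7 (0.6-0.8) for a PDR of 1 or less.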

These data confirm, as expected, that although a normal
PDR does not rule out lung cancer, an abnormal ratio
implies an increased probability (LR, 3.9; 95% CI, 1.6-9.4) of
underlying lung cancer. Only 3 of the patients with COPD
had a PDR greater than 1.05, and none had a ratio greater
than 1.1. Among individuals with lung cancer, there was no
significant difference in the prevalence of clubbing (as
defined by distal phalangeal finger depth/interphalangeal finger
depth ratio > 1) among the different histologic subtypes
of lung cancer.
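
To show how an LR changes the probability of disease, the following Python sketch applies the odds form of Bayes theorem. The pretest probability of 20% is purely hypothetical and is not drawn from the study; the function name is also an assumption.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability using an LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Hypothetical pretest probability of lung cancer in a patient with COPD
pretest = 0.20

abnormal_pdr = posttest_probability(pretest, 3.9)  # PDR > 1
normal_pdr = posttest_probability(pretest, 0.7)    # PDR <= 1

print(f"PDR > 1:  {abnormal_pdr:.0%}")
print(f"PDR <= 1: {normal_pdr:.0%}")
```

Under this assumed pretest probability, an abnormal ratio raises the probability to roughly one-half, whereas a normal ratio lowers it only modestly, which is consistent with the statement that a normal PDR does not rule out lung cancer.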

Kitis et al 30 investigated the association of clubbing with
the activity of inflammatory bowel disease in 327 patients.
Clubbing was defined as a shadowgraph-measured hyponychial
angle greater than 186 degrees, which corresponded
to 1.65 SDs above the mean value found in a group of 116
healthy controls. Disease activity was determined using an
index incorporating the results of various laboratory
investigations. The LRs for clubbing as a marker of active
disease were 2.8 (95% CI, 1.8-4.1) for Crohn disease and 3.7
(95% CI, 1.4-9.4) for ulcerative colitis. The sensitivity and
specificity values were 0.58 and 0.79 for Crohn disease vs 0.30
and 0.92 for ulcerative colitis, respectively.
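
The cutoff and the positive LRs in this study can be approximated from figures given in this chapter. The following Python sketch assumes the control mean (178 degrees) and SD (4.6 degrees) reported for Kitis et al in Table 14-2, and it recomputes the LRs from the rounded sensitivity and specificity values, so small differences from the published LRs (which were derived from the original data) are expected. Function names are hypothetical.

```python
def cutoff_from_controls(mean: float, sd: float, z: float = 1.65) -> float:
    """Upper cutoff defined as z standard deviations above the control mean."""
    return mean + z * sd


def positive_lr(sensitivity: float, specificity: float) -> float:
    """LR for a positive finding: sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)


# Control values for the hyponychial angle (Kitis et al, Table 14-2)
cutoff = cutoff_from_controls(mean=178, sd=4.6)
print(f"Cutoff: {cutoff:.1f} degrees")

# Rounded sensitivity and specificity values quoted in the text
crohn_lr = positive_lr(0.58, 0.79)
colitis_lr = positive_lr(0.30, 0.92)
print(f"Crohn disease LR+: about {crohn_lr:.2f}")
print(f"Ulcerative colitis LR+: about {colitis_lr:.2f}")
```

The computed cutoff rounds to the 186 degrees used in the study, and the recomputed LRs fall close to the reported 2.8 and 3.7.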


CLINICAL SCENARIOS—RESOLUTION 


In the first case, you find that the patient appears to have 
increased nail curvature. You use calipers to estimate a PDR 
of 0.90, and on inspection, you estimate a profile angle of 
about 160 degrees. According to your knowledge of these 
values in disease-free subjects, you inform the respiratory 
therapist that her mother does not have clubbing. On the 
other hand, you find that the second patient has a PDR of 
1.1 and a profile angle of 180 degrees, findings that are 
unusual for disease-free subjects or patients with COPD 
alone. You conclude that a search for bronchogenic carcinoma
(or other causes of clubbing) should be undertaken.
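
The reasoning in these 2 scenarios can be written as a simple threshold check. The following Python sketch is only an illustration of the suggested limits (PDR greater than 1, profile angle of 180 degrees or more, hyponychial angle of 190 degrees or more); the function name is hypothetical, and the check is not a validated diagnostic rule.

```python
from typing import Optional


def exceeds_normal_limits(pdr: float,
                          profile_angle_deg: float,
                          hyponychial_angle_deg: Optional[float] = None) -> bool:
    """Return True if any measurement falls outside the suggested normal limits."""
    if pdr > 1.0:
        return True
    if profile_angle_deg >= 180:
        return True
    # The hyponychial angle is optional because it is hard to estimate at the bedside
    if hyponychial_angle_deg is not None and hyponychial_angle_deg >= 190:
        return True
    return False


# First patient: PDR 0.90, profile angle about 160 degrees
print(exceeds_normal_limits(pdr=0.90, profile_angle_deg=160))
# Second patient: PDR 1.1, profile angle 180 degrees
print(exceeds_normal_limits(pdr=1.1, profile_angle_deg=180))
```

The first patient falls within the limits seen in disease-free subjects, whereas the second exceeds them on both measurements.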


THE BOTTOM LINE 

For generations, medical students and residents have been
quizzed at the bedside about the diagnostic features of clubbing.
Confident though their inquisitors may be in their own
ability to detect clubbing, the literature shows that
interobserver agreement is only fair to moderate and that the
accuracy of techniques to detect clubbing has not been well
established. Nevertheless, because nonhereditary clubbing is
almost always a portent of serious disease, clinicians need to
be as certain as possible about its presence.

Recognizing the limitations of the studies we have 
appraised, we recommend the following: 

In cases of diagnostic uncertainty, the PDR may be helpful.
This ratio can be measured with calipers at the bedside and
in disease-free populations rarely exceeds 1.0. An increased
ratio should prompt a search for underlying disease.
Although patients with COPD have slightly higher ratios
than do disease-free subjects, it is unusual for the ratio to
exceed 1.05. A value in excess of this in a patient with COPD
should prompt a search for bronchogenic carcinoma.
Because most clinicians do not have calipers, reversal of the
PDR should be assessed by visual estimation.

Although the accuracy of clinicians’ bedside estimation of 
nailfold angles has not been studied, the normal values for 
these angles have been established. A profile angle that 
approaches a straight line (180 degrees) is rare in disease-free 
subjects and, in our opinion, is easily identifiable at the bedside.
Although the normal range of the hyponychial angle has
also been defined, this angle is more difficult to estimate at 
the bedside. 

No published evidence exists as to the diagnostic yield or 
the optimal strategy for investigating a patient with clubbing. 
Therefore, after completion of a thorough medical history 
and physical examination, clinical judgment must guide the 
choice of investigations.

Reviewed by Marisa D'Silva, MD


CLINICAL SCENARIO 


A 24-year-old male intravenous drug user presents to the 
emergency department with fatigue and weight loss. 
Examination shows cervical lymphadenopathy, a palpable 
liver, and signs of recent intravenous drug use. Should the 
clubbing raise concerns for other diagnoses that would 
explain his presentation? 

UPDATED SUMMARY ON CLUBBING 

Original Review 

Myers KA, Farquhar DRE. Does this patient have clubbing? 
JAMA. 2001;286(3):341-347. 

UPDATED LITERATURE SEARCH 

We searched the MEDLINE database from May 1999 to July 
2004, using the same search strategy used for the original review. 
This resulted in a limited number of articles, so all abstracts from 
a MEDLINE database search using the title word “clubbing” 
were reviewed. We also reviewed the Citation Index (ISI Web of 
Knowledge and Science Citation Index Expanded) and PubMed 
databases for relevant articles. This strategy resulted in 2 new 
articles related to the diagnosis of clubbing. 

NEW FINDINGS 

• Digital photography is an accurate, inexpensive, and easy
method for calculation of the hyponychial angle.

• The upper limit of the hyponychial angle for healthy
individuals is confirmed as approximately 192 degrees; the
phalangeal depth ratio (PDR) is confirmed as less than 1 in
healthy individuals.

• The PDR correlates with hypoxemia and airways obstruction
in cystic fibrosis.

Details of the Update 

Since the original review, 2 studies have been published that
used quantitative methods to assess clubbing. Husarik et al, 1
using a software angle measurement application, measured the
hyponychial angle of the right index finger on digital
photographs. They determined that the hyponychial angle of healthy
individuals does not exceed 192 degrees, confirming the results
of the original review. Data are also reported for bronchogenic
carcinoma (n = 17), human immunodeficiency virus (HIV)
disease (n = 19), chronic hepatitis (n = 21), cirrhosis (n = 19),
pneumonia (n = 47), heart failure (n = 95), ischemic heart
disease (n = 170), and other disorders. For a variety of illnesses,
the quantitatively measured hyponychial angle is significantly
greater than that seen in patients without an abnormal
hyponychial angle. However, fewer than 25% of patients with these
illnesses have hyponychial angles that exceed the normal upper
range of 192 degrees, making the presence of clubbing an
insensitive diagnostic marker. Patients with emphysema (n = 9)
and acquired valvular heart disease (n = 81) were not different
from patients without the disease (P < .13).

In a second study, Nakamura et al 2 examined the PDR of 100
healthy subjects and 100 patients with cystic fibrosis using
plaster finger casts. The mean PDR of healthy controls was 0.90
(SD, 0.037), and no values exceeded unity, confirming the
results of the original review. Among patients with cystic
fibrosis, the presence of clubbing predicts those who will be
hypoxemic. Similarly, the absence of clubbing made hypoxemia
much less likely.

No additional studies have compared quantitative measures of 
clubbing to physicians’ bedside assessments, and no studies have 
evaluated the diagnostic yield or the optimal strategy for
investigating a patient with clubbing.

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

Additional studies allow us to reestimate the normal PDR and 
the hyponychial angle. The normal PDR is 0.90 (95% confidence
interval [CI], 0.89-0.90); the normal hyponychial angle is 181
degrees (95% CI, 178-183 degrees). Patients with a hyponychial
angle greater than 192 degrees would be considered to have an 
abnormal hyponychial angle. 

CHANGES IN THE REFERENCE STANDARD 

During the past century, nail fold angle measurements using
unguisometers, shadowgrams, or plaster casts of fingers have
been proposed as the reference standard for clubbing. Because
these methods are cumbersome and time consuming, the clinical
examination by experienced physicians has been accepted as
establishing a diagnosis of clubbing.

RESULTS OF LITERATURE REVIEW 

See Table 14-5. 


Table 14-5 Univariate Findings Concerning Clubbing

Finding (n = 1) | Disorder | LR+ (95% CI) | LR- (95% CI)
Clubbing in cystic fibrosis | Hypoxemia | 3.2 (1.9-6.4) | 0.13 (0.06-0.27)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.


EVIDENCE FROM GUIDELINES 

No guidelines advocate the routine assessment of clubbing. 


CLINICAL SCENARIO—RESOLUTION 


Test results for HIV and hepatitis C virus were positive. Chest
and abdominal imaging results were unremarkable. A
thyroid-stimulating hormone level was normal. Although clubbing
has been associated with both HIV and hepatitis C viral
infections, it has also been associated with endocarditis. You
reexamine the patient and find no heart murmur, fever, or
stigmata of endocarditis, but you obtain blood cultures and
echocardiography to rule out endocarditis.


CLUBBING— MAKE THE DIAGNOSIS 


PREVALENCE OF CLUBBING 

The probability of clubbing depends on the underlying illness.
The frequency of a quantitatively measured hyponychial
angle greater than 192 degrees has been reported in the
illnesses listed in Table 14-6. 1

POPULATION FOR WHOM CLUBBING 
SHOULD BE CONSIDERED 

Clubbing can occur in a variety of illnesses. It should be
considered among patients with cystic fibrosis or bronchiectasis
as a marker for chronic hypoxemia. In patients
with clubbing that is not congenital, it would be reasonable
to obtain a chest radiograph to look for pulmonary conditions
associated with clubbing.

REFERENCE STANDARD TESTS 

The pragmatic standard is examination by an experienced 
clinician, although laborious quantitative measures can be 
done as part of a research study. 


Table 14-6 Prevalence of Hyponychial Angle Exceeding Upper Range
of Normal Among Various Conditions

Condition | Prevalence, % (95% CI)
Pneumonia | 23 (11-36)
HIV disease | 16 (0-32)
Cirrhosis | 16 (0-32)
Chronic obstructive pulmonary disease | 15 (6-23)
Chronic hepatitis | 14 (0-29)
Bronchogenic carcinoma | 12 (0-27)
Pulmonary hypertension | 10 (0-20)
Acquired valvular heart disease | 9 (3-15)
Congestive heart failure | 9 (4-15)
Ischemic heart disease | 9 (5-14)
Solid tumor malignancy | 6 (1-11)

Abbreviations: CI, confidence interval; HIV, human immunodeficiency virus.
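
The width of the intervals in Table 14-6 reflects the small number of patients in each disease group. As a rough illustration, the following Python sketch computes a normal-approximation interval, truncated at 0, from the reported prevalence and the group sizes given in the update (HIV disease, n = 19; chronic hepatitis, n = 21; bronchogenic carcinoma, n = 17). The choice of interval method is an assumption, because the method used for the table is not stated, and the function name is hypothetical.

```python
import math
from typing import Tuple


def approximate_ci(proportion: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation 95% interval for a proportion, truncated to [0, 1]."""
    half_width = z * math.sqrt(proportion * (1 - proportion) / n)
    return max(0.0, proportion - half_width), min(1.0, proportion + half_width)


# (condition, reported prevalence, number of patients) from the update
groups = [
    ("HIV disease", 0.16, 19),
    ("Chronic hepatitis", 0.14, 21),
    ("Bronchogenic carcinoma", 0.12, 17),
]

for condition, prevalence, n in groups:
    lower, upper = approximate_ci(prevalence, n)
    print(f"{condition}: {prevalence:.0%} ({lower:.0%}-{upper:.0%})")
```

For these 3 groups the approximation gives intervals that round to those shown in the table; with fewer than 25 patients per group, the upper limit is roughly double the point estimate.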


REFERENCES FOR THE UPDATE

1. Husarik D, Vavricka SR, Mark M, Schaffner A, Walter RB. Assessment of
digital clubbing in medical inpatients by digital photography and
computerised analysis. Swiss Med Wkly. 2002;132(11-12):132-138. a

2. Nakamura CT, Ng GY, Paton JY, et al. Correlation between digital
clubbing and pulmonary function in cystic fibrosis. Pediatr Pulmonol.
2002;33(5):332-338. a

a For the Evidence to Support the Update for this topic,
see http://www.JAMAevidence.com.

EVIDENCE TO SUPPORT THE UPDATE


Clubbing 



TITLE Assessment of Digital Clubbing in Medical Inpatients
by Digital Photography.

AUTHORS Husarik D, Vavricka SR, Mark M, Schaffner
A, Walter RB.

CITATION Swiss Med Wkly. 2002;132(11-12):132-138.

QUESTION Can digital photography reliably assess the
hyponychial angle of healthy controls and medical inpatients,
and what is the range of angles associated with various
medical diseases?

DESIGN The right index finger was digitally photographed,
and a software angle measurement application
was used to calculate the hyponychial angle. Three
investigators performed the measurements on each finger. The
patients’ underlying medical diagnoses were obtained
through chart review.

SETTING Medical inpatient ward in Switzerland;
healthy controls (population not specified).

PATIENTS Five hundred fifteen patients admitted as
general medical inpatients and 123 healthy controls.


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Digital photography and a software angle measurement
application were used to assess the hyponychial angle.
Interrater and intrarater reliability was calculated.

MAIN OUTCOME MEASURES 

Hyponychial angles of patients and controls; angles by dis¬ 
ease category as ascertained through chart review. 

MAIN RESULTS 

Measurement of the hyponychial angle with this technique
demonstrated high intrarater and interrater reliability.
Profile angles and phalangeal depth ratios were not measured.
The investigators did not compare their proposed angle for
diagnosis of clubbing (192 degrees) with physician assessment
of clubbing. Each patient’s medical record was
reviewed for diagnoses associated with clubbing. The
investigators found that mean hyponychial angles were
significantly increased compared with that of healthy controls for
most diagnoses, even those not traditionally associated with
clubbing, such as ischemic heart disease. When compared
with other patients in their study, only those with chronic
obstructive pulmonary disease, cystic fibrosis, cirrhosis,
chronic hepatitis, and human immunodeficiency virus
infection had significantly increased hyponychial angles.
Approximately 15% of the patients with chronic obstructive
pulmonary disease, cirrhosis, chronic hepatitis, and human
immunodeficiency virus infection had hyponychial angles
greater than 192 degrees.

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTH Objective, reliable method for determining
hyponychial angle.

LIMITATIONS Although the angles may have been determined
in a blinded fashion from underlying disease status, it
is not explicitly stated that the chart review was blinded to the
presence/absence of clubbing.

Previous studies of clubbing have used cumbersome and
impractical techniques, such as plaster finger casts and
shadowgrams, to measure nailfold angles. This study reports a
new, reliable technique for the quantitative assessment of
clubbing. The data in healthy controls confirm that the
hyponychial angle does not exceed 192 degrees. Although the
mean hyponychial angle of patients with a wide variety of
medical diagnoses exceeded that of controls, the prior
probability of clubbing as defined by an angle greater than 192
degrees was less than 20% for most conditions. Physicians’
bedside assessment of clubbing was not evaluated.

Reviewed by Kathryn A. Myers, MD

TITLE Correlation Between Digital Clubbing and Pulmonary
Function in Cystic Fibrosis.

AUTHORS Nakamura CT, Ng GY, Paton JY, Keens TG, 
Witmer JC, Bautista-Bolduc D, Woo MS. 

CITATION Pediatr Pulmonol. 2002;33(5):332-338. 

QUESTION Does digital clubbing in patients with cystic 
fibrosis predict hypoxemia and airflow limitation? 

DESIGN Plaster casts of 100 patients and 100 healthy controls
were created to allow measurement of the phalangeal
depth ratio (PDR) of the right index finger. The PDR was
compared to oxygen levels and pulmonary function tests.

SETTING Los Angeles Childrens’ Hospital. 

PATIENTS Patients with cystic fibrosis and without 
rheumatologic or cyanotic congenital heart disease were 
included. Controls were recruited from unrelated visitors 
of hospital patients and employees and their families. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The PDR of each subject was measured independently by 2
investigators using a micrometer on the right index finger
cast. An average of the 2 measurements constituted the clubbing
index. A clubbing index greater than 1.00 was defined as
clubbing.

MAIN OUTCOME MEASURE 

Correlation of clubbing with hypoxemia and pulmonary
function (Table 14-7).

Table 14-7 Clubbing in Cystic Fibrosis Predicts Hypoxemia

Finding | LR+ (95% CI) | LR- (95% CI)
Clubbing in cystic fibrosis | 3.2 (1.9-6.4) | 0.13 (0.06-0.27)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

MAIN RESULTS 

The PDR of healthy controls was 0.90 (SD, 0.04; range, 0.81-0.97),
and none exceeded 1.0. Seventy-five of the patients
with cystic fibrosis were defined as having clubbing (PDR >
1.0). Among the 25 patients without clubbing, the forced
expiratory volume in 1 second (FEV1) was 69% predicted, whereas
those with clubbing had an FEV1 of 45% predicted. The PDR
was inversely correlated with hypoxemia (r = -0.56; P < .001)
in patients with cystic fibrosis.

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Reliable, quantitative measure to assess 
clubbing. 

WEAKNESS It is not explicitly stated that the clinicians were 
blinded to the hypoxemia status. 

The presence of clubbing in patients with cystic fibrosis is 
associated with hypoxemia, and its absence made hypoxemia 
much less likely. The ability of physicians to detect clubbing 
at the bedside using the diagnostic standard of PDR greater 
than 1 was not evaluated by the investigators. 

Reviewed by Kathryn A. Myers, MD

CHAPTER

Is This Patient Taking the Treatment as Prescribed?

Barbara J. Stephenson, RN
Brian H. Rowe, MD, MSc
R. Brian Haynes, MD, PhD
William M. Macharia, MD, MSc
Gladys Leon, MD, MSc

CLINICAL SCENARIOS


CASE 1 A 28-year-old woman presents to the emergency
department in acute distress, with a 3-day history of worsening
asthma. Her prescribed medications include an
inhaled β2 agonist and an inhaled steroid. When questioned,
she breathlessly admits to “occasionally” missing
her medications “maybe only once or twice.”

CASE 2 A 55-year-old man with posttraumatic seizure 
disorder has been taking phenytoin since his injury. His 
seizures were initially adequately controlled but he 
recently has been having weekly seizures. In an office visit 
he resentfully denies missing any of his medication. 


THE IMPORTANCE OF CLINICAL EXAMINATION 


Physicians should measure compliance for patients prescribed
a self-administered treatment because noncompliance
is common and physicians can help patients improve
their compliance 1,2 and increase the benefit they derive from
therapy. Compliance with long-term self-administered
medication therapy is approximately 50% for those who
remain in care. 3 There is a wide range of compliance among
patients, from 0% to 100%. This average compliance rate of
50% provides only the most limited picture of compliance;
in general, there is substantial undercompliance. Furthermore,
compliance by individuals can vary considerably over
time. Compliance rates for short-term self-administered
therapies average about 75% initially but decrease to less
than 25% for the completion of antibiotic therapy for acute
infections. Aside from its potential for undermining the
effectiveness of any treatment, noncompliance is associated
with poorer prognosis. 4

Table 15-1 depicts combinations of treatment outcome and
compliance that present in clinical practice and need to be
distinguished from one another to initiate the appropriate
intervention. The bottom right cell, D, represents the most
desirable state: high compliance with achievement of the
treatment goal. Patients who fall into cell A (low compliance and
suboptimal achievement of the treatment goal) are in need of
efforts to promote compliance. Patients in cell B (high
compliance without achieving the treatment goal) require more or
better treatment, whereas those in cell C (achievement of the
treatment goal despite low compliance) need less treatment
prescribed or may actually have been misdiagnosed or mistreated
and do not merit intervention to increase compliance, at least
until the need for treatment is reassessed. The aim of compliance
assessment, along with other diagnostic tests, is to categorize
patients into the appropriate cells. When it has been determined
that patients occupy cell A, B, or C, physicians may then alter
treatment to attempt to move patients into cell D.
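
The 4 cells of Table 15-1 can be written out as a small lookup. The sketch below is only an illustration of the classification described above; the function name and the wording of the suggested actions are assumptions made for the example, not part of the original table.

```python
def compliance_cell(goal_achieved: bool, high_compliance: bool) -> str:
    """Return the Table 15-1 cell (A, B, C, or D) for one patient."""
    if goal_achieved:
        return "D" if high_compliance else "C"
    return "B" if high_compliance else "A"


# Paraphrased from the text; the exact wording is illustrative only.
SUGGESTED_ACTION = {
    "A": "promote compliance",
    "B": "consider more or better treatment",
    "C": "reassess the need for treatment",
    "D": "continue current management",
}

# Low compliance and treatment goal not met: the target group.
cell = compliance_cell(goal_achieved=False, high_compliance=False)
print(cell, "->", SUGGESTED_ACTION[cell])
```

In practice the difficult step is deciding whether compliance is high or low for a given patient, which is the subject of the rest of the chapter.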

The 2 cases illustrate the importance of distinguishing between
noncompliance and lack of therapeutic efficacy. On closer
questioning, the first patient revealed that she had been using
her inhaled steroid sporadically and gradually lost control of
her asthma without change in extrinsic allergic stimuli. She
initially provided 2 useful clinical clues to important deviation
from her prescribed regimen: worsening symptoms while prescribed
usually adequate therapy and admission of “occasional”
noncompliance. Reinstituting her usual regimen and reinforcing
the need for compliance, particularly with the inhaled steroid,
improved her treatment results.

Case 2 required a different solution: the phenytoin levels
eventually proved to be in the therapeutic range, confirming the
patient’s compliance, and a second medication was required to
improve long-term control. Accurate assessment of compliance by
questioning could prevent overdosing the patient with additional
therapy on the assumption of noncompliance and permit timely
addition of the second therapy.

Although maintaining an adequate level of compliance is central
to deriving benefit from any efficacious therapy, the degree of
compliance necessary to achieve a measurable benefit from
specific medications is variable. Haynes et al 5 found that 80%
compliance was necessary to achieve a reduction in blood pressure
from antihypertensive therapy with the types and doses of
medication that were prescribed by primary care physicians,
whereas Markowitz 6 reported that children receiving only a third
of their prescribed penicillin had substantial protection from
recurrences of rheumatic fever. The thresholds of compliance for
acceptable therapeutic effects are not known for most regimens.


Table 15-1 Compliance and Treatment Responseᵃ

                                  Compliance Rate
Achievement of the
Treatment Goal        Low                         High

No                    A: The target group         B: Inadequate therapy?
Yes                   C: Unnecessary therapy?     D: Ideal

ᵃBased on Sackett et al. 1


Table 15-2 Methods to Measure Compliance

Direct
    Drug or metabolite levels
        Blood
        Urine
        Saliva
    Tracer compounds

Indirect
    Appointment keeping
    Therapeutic response
    Self-report
    Pill count
    Pharmacy records
    Medication event monitors

THE NATURE OF NONCOMPLIANCE 

On a practical level, it is not difficult to imagine why
noncompliant behavior occurs. Patients often find medical
regimens complicated, inconvenient, embarrassing, or expensive.
Particularly for chronic disorders, the short-term disadvantages
frequently outweigh the long-term advantages.

At a theoretic level, the nature and determinants of noncompliant
behavior are more complex and not well understood, although there
are interesting models. 7 Numerous studies of the “determinants”
of compliance have led to the following generalizations. 8
Sociodemographic factors such as age, sex, race, intelligence,
and education have little to do with compliance. Low compliance
is a problem with self-administered treatments for all disorders,
but patients with psychiatric problems are less likely to comply,
and those with (other) disabilities caused by disease are more
likely to comply. Long waiting times at clinics and long gaps
between appointments lead to patients’ missing appointments and
dropping out of care. The more complex or costly the regimen and
the longer its duration, the less the compliance.


MEASURING NONCOMPLIANCE 

Most studies determining the limitations and strengths of
clinical information about compliance include pill counts and
measurement of serum levels of drugs or tracers. Special
medication monitors can also reveal patterns of medication
consumption that cannot be obtained by other means. None of these
more accurate methods is likely to be handy to practitioners for
most therapeutic regimens.

No clinical measurement of compliance approaches perfection, but
clinical information can be used to narrow down the situations in
which compliance measurement is most likely to be important for
the care of the patient. A 3-step sequence will identify most
noncompliers.

1. Nonattendance at appointments. Dropout rates are high with
many treatments, and nonattendance at a scheduled appointment
is the first step astray.

2. Lack (or loss) of responsiveness to a usually (or previously)
adequate dose of treatment. These patients are most in need of
further assessment of their compliance to separate problems of
therapy from those of compliance. Cases 1 and 2 would qualify
for this route.

3. For patients whose compliance is in doubt, particularly those
who come to attention through steps 1 and 2, use the most
appropriate method(s) from Table 15-2. Direct measures of
medication consumption are most accurate, but they are
available for only a small number of medications, can indicate
spuriously high compliance if the patient takes the prescribed
dose only during the time leading up to assessments of drug
levels, and do not apply to most nonmedication regimens, such
as weight-loss diets. Even when available, they take time and
money to obtain and are unlikely to help in such situations as
the acute care of a patient in the midst of a crisis.

Questioning of patients is the most widely applicable method of
measuring compliance. Careful questioning will identify more than
half of those who are noncompliant without falsely labeling many
of the compliers. Patients should be asked to indicate, without
prompting, exactly what medications they are taking and when they
are taking them. This may reveal a different understanding of and
adherence to the regimen than was prescribed. For patients who
report a generally correct understanding of their prescription,
the details of any noncompliance should be sought. The method of
asking likely affects the accuracy of the response. Studies
assessing the value of patient self-report have used a
nonjudgmental, nonthreatening approach, prefacing the question
with a remark such as the following 9: “People often have
difficulty taking their pills for one reason or another.” The
question also must be asked in a particular way: “Have you ever
missed any of your pills?” If the answer is affirmative, then ask
the patient to estimate how many pills he or she has missed
during the previous day and week. The interview can also provide
insight into the possible reasons for noncompliance. This
valuable clinical information can allow prompt reevaluation of
the current regimen if the information is interpreted
appropriately. It is essential to take into account that even if
the patient admits to missing any medication during the previous
day or week, he or she will still tend to overestimate the actual
rate of compliance (by an average of 17% in 1 study 9).

Clinical measures of compliance sometimes can be supplemented or
replaced by other methods. Some regimens produce telltale adverse
effects, the absence of which suggests low compliance; for
example, increased urinary frequency with the initiation of
diuretics, dry mouth with anticholinergics, slow heart rate with
β-blockers, dark stool with oral iron, and suppression of
thyrotropin (thyroid-stimulating hormone) with thyroid hormone
replacement. 10 Blood level measurements are routinely available
for some medications, and these can be used for monitoring
compliance, particularly when the serum half-life is relatively
prolonged. 11 When patients receive all their medications through
a single pharmacy, pharmacy records can provide an indirect
measure of compliance. 12 Medication event monitors, although
providing unique information about the pattern of medication
taking, are expensive and remain a research tool. 13,14 Tracers,
either harmless substances such as riboflavin 15 or minute
amounts of medications such as phenobarbital that can be easily
measured, 16 are also research tools.

For pill counts, drug and tracer levels, and surreptitious pill
monitors, there are ethical issues to be addressed. Because they
invade the patient’s privacy and can be used to usurp autonomy,
when possible you should inform the patient of their intended use
and ask for consent before using them. 17 Patients usually agree
to monitoring if it is explained that the purpose of the
assessments is to help better understand how they are taking
their medicine.


ACCURACY OF CLINICAL MEASURES OF COMPLIANCE 

Clinical judgment of compliance has been found wanting in almost
every study in which it has been tested. Clinicians who believe
that they are exceptions to this finding because they know their
patients well should take heed of a study by Gilbert et al. 18
Primary care physicians were asked to give compliance estimates
only for patients they thought they knew well. The sensitivity of
clinical judgment for detecting noncompliance was an embarrassing
10%, and overall performance by clinicians was not better than if
they had flipped coins instead of applying their “clinical
judgment.” Physicians should not trust their unaided judgment
regarding the compliance by individual patients.

Studies have shown only a low-order correlation between
nonattendance and noncompliance with self-administered
treatments, but this is at least partly an artifact of
nonattenders’ being frequently unavailable for compliance
studies. For example, in one study, patients keeping all
appointments appeared to be less compliant with antacid and
anticholinergic medications for peptic ulcer therapy than
patients who missed some appointments, but only 96 (60%) of the
160 patients had complete follow-up assessments. 19 Richardson et
al 20 confirmed both that attendance does not ensure compliance
with medications and that compliance is even worse among
nonattenders: of patients keeping more than 60% of their
scheduled clinic appointments, 40% were found to be noncompliant
with medication by urine metabolite measurement, whereas 95% of
patients with lower appointment compliance demonstrated low
compliance.

The patient’s response to therapy is also only weakly related to
compliance for many treatments, 9 but it can be useful when
combined with other methods. For example, when Inui et al 21
treated patients as noncompliant if they either admitted
noncompliance or had uncontrolled pressures, this combined
compliance test had a sensitivity of 83% and a specificity of
66%.

Questioning patients about their compliance is the most readily
available, valid method of measuring compliance in clinical
practice. To review the literature on self-report, we used
previously published guidelines for collecting studies and
preparing meta-analyses. 22 We identified studies comparing
self-report with other measures of compliance and uncovered many
studies comparing self-report with pill counts. The 4 studies
with the strongest research methods 9,13,18,23 are summarized in
Table 15-3. The results of compliance tests were considered
positive if they uncovered noncompliance and negative if they
verified compliance. In these studies, self-report yielded a
sensitivity of 55%, a specificity averaging 87%, and a likelihood
ratio for a positive test result of 4.4 on average. Patients’
reports of compliance with medication were less useful because
the patients still may have been noncompliant (likelihood ratio
for a negative test result, 0.5). In one study, 9 self-report
outperformed several direct and indirect measures of compliance.
Similarly, when Fletcher et al 24 compared the usefulness of
interview, pill count, and measurement of serum drug levels of
digoxin, they found that interviewing was the most useful method.
Unfortunately, to our knowledge there are no studies to date
assessing the agreement among clinicians on eliciting compliance
information from patients in usual settings or of the effect on
self-reports of repeatedly questioning patients about their
compliance.


Table 15-3 Pooled Data From Methodologically Strong Studies
Comparing Pill Count With Self-report 9,13,18,23

                          Pill Count
Self-report       Noncompliantᵃ     Compliant

Missed ≥ 1        152               34
None missed       122               232

ᵃNoncompliant was defined as taking less than 80% of pills, 9,18
less than 100% of pills, 13 and less than 75% of pills. 23
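
The summary figures quoted for self-report can be re-derived from the pooled counts in Table 15-3. The following is a minimal sketch, assuming the pill count is treated as the reference standard and a report of missed pills as the positive test result; the class and attribute names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 table of test result vs reference standard."""
    true_pos: int   # reported missed pills, noncompliant by pill count
    false_pos: int  # reported missed pills, compliant by pill count
    false_neg: int  # reported none missed, noncompliant by pill count
    true_neg: int   # reported none missed, compliant by pill count

    @property
    def sensitivity(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        return self.true_neg / (self.true_neg + self.false_pos)

    @property
    def lr_positive(self) -> float:
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_negative(self) -> float:
        return (1 - self.sensitivity) / self.specificity


# Pooled counts taken from Table 15-3.
pooled = TwoByTwo(true_pos=152, false_pos=34, false_neg=122, true_neg=232)

print(f"sensitivity {pooled.sensitivity:.2f}")
print(f"specificity {pooled.specificity:.2f}")
print(f"LR+ {pooled.lr_positive:.1f}")
print(f"LR- {pooled.lr_negative:.2f}")
```

Computed this way, the pooled counts give a sensitivity of about 55%, a specificity of about 87%, a positive likelihood ratio of about 4.3, and a negative likelihood ratio of about 0.51. The value of 4.4 quoted in the text is described there as an average, which may explain the small difference from the pooled figure.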

Counting the patient’s pills is valid for single assessments at
the patient’s home if the purpose of the visit is not revealed in
advance and if care is taken to determine the amount of
medication that has been dispensed, the date the most recent
prescription refill was begun, how much was left over from the
previous prescription when the current prescription was begun,
whether there has been any change in the prescription not noted
on the pill container, and whether the patient has caches of
pills in other locations or has shared them with relatives or
friends. 9 When all factors are taken into account, the pill
count compares favorably with serum drug levels. 18 However, pill
counts of this rigor are impractical in most clinical settings,
and pill counts performed on medications patients bring with them
to clinic visits overrepresent compliance when compared with more
tamper-proof methods, such as special pill containers that
electronically monitor each dose as it is removed. 13,14 The
latter devices also show patterns of compliance that cannot be
detected by simple pill counts, including increased compliance
just before and after appointments and decreasing compliance
between appointments.

Although the absence of common adverse effects may be an
indication of noncompliance, the link between compliance and
adverse effects is either unknown or relatively tenuous. For
example, for patients prescribed diuretics, the sensitivity for
noncompliance of reductions in serum potassium level was 82% but
the specificity was only 48%. 9

THE BOTTOM LINE 

You can detect most noncompliant patients by watching for
nonattenders, watching for nonresponders, and asking
nonresponders about their compliance. In addition to clarifying
problems of undertreatment and overtreatment, information about
patient compliance permits the efficient application of effective
methods of increasing compliance. 1,2
UPDATE: Compliance and Medication Adherence



Prepared by Hayden B. Bosworth, PhD 
Reviewed by Amy Rosenthal, MD 


CLINICAL SCENARIO 


A 65-year-old woman prescribed a diuretic and an
angiotensin-converting enzyme inhibitor continues to have
inadequate blood pressure control. She volunteers that she takes
her medicines exactly as instructed on her bottles but did not
bring the bottles with her. You are considering adding another
antihypertensive medication, but she is already taking 2 other
medications for diabetes. Can you be confident that she is taking
her medications as prescribed? Is there a way to get a better
history of her medication adherence?

UPDATED SUMMARY ON COMPLIANCE AND 
MEDICATION ADHERENCE 

Original Review 

Stephenson BJ, Rowe BH, Haynes RB, Macharia WM, Leon 
G. The rational clinical examination: is this patient taking the 
treatment as prescribed? JAMA. 1993;269(21):2779-2781. 

UPDATED LITERATURE SEARCH 

We searched MEDLINE, Current Index to Nursing and Allied Health
Literature, and PsycINFO using the following key terms:
“compliance.mp.” [mp = ti, ot, ab, rw, sh, it, tc, id],
“adherence.mp.” [mp = ti, ot, ab, rw, sh, it, tc, id], and
(sensitivity and specificity).tw. We limited our search to
English-language publications and obtained 30 publications in the
period between 1993 and February 2005. In addition, we reviewed
the references of the selected publications and found 1
additional publication. We limited our review to only those that
compared measures of medication adherence to Medication Event
Monitoring System (MEMS) caps, a validated electronic monitor of
pill adherence (see below), which resulted in 6 articles.
However, 2 of these studies had fewer than 10 nonadherent
patients and the quality levels for the studies were inadequate
(quality score < 3) for inclusion. 1,2

NEW FINDINGS 

• Complex questionnaires for assessing adherence may be no 
more efficient than the simple question, Have you missed 
any pills in the past week? 


Details of the Update 

Medications are the most common medical intervention. Adherence
to medications is often suboptimal and nonadherence is associated
with adverse health outcomes, as well as medical, social, and
economic consequences. 3-5 Nonadherence with therapeutic
medication recommendations is prevalent. Across different
definitions of nonadherence, approximately 50% of patients do not
take their prescribed medications as recommended. 6-9 The true
rate of nonadherence may be higher because patients with a
history of nonadherence are likely underrepresented in outcomes
research.

Clinicians must frequently rely on their own judgment but
unfortunately demonstrate no better than chance accuracy in
predicting the medication adherence of their patients. 10
Clinicians may conduct pill counts or review pharmacy records if
available. The former method of assessing patient medication
adherence is potentially problematic because, apart from being
intrusive, it does not give any indication of when the medication
was taken or whether it was thrown away and thus may result in
overestimation of adherence. Pharmacy refill records provide a
reliable and nonintrusive longitudinal measure of medication
adherence when the patient receives all their medication from a
centralized pharmacy such as that of the Department of Veterans
Affairs or private sector health maintenance organizations. In
addition, this method of assessing medication adherence requires
extensive data tracking programs.

In general, patients tend to overestimate their medication
adherence 11 and, unless a patient is not responding to therapy,
it may be difficult to identify poor medication adherence. Asking
patients about their medication use is often the most practical
means of ascertainment, but it is prone to inaccuracy. A key
validated question is, Have you missed any pills in the past
week? and any indication of having missed 1 or more pills signals
a problem with low adherence. 12 Compared to pill counts as the
reference standard, asking nonresponders about their medication
adherence by using this single question will detect 55% of those
with less than complete adherence, with a specificity of 87%
(positive likelihood ratio [LR], 4.3; 95% confidence interval
[CI], 3.1-6.1; negative LR, 0.51; 95% CI, 0.44-0.58). 10 Other
practical measures to assess adherence include watching for those
who do not respond to increments in treatment intensity and
patients who fail to attend appointments. Additional practical
methods include review of pill bottles and, when available,
checking on fill dates and pill counts. Finally, simply asking
the patients to describe their medication regimen such as when
they take their medication and what it is for can often be
informative.

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

The literature search was conducted without restriction to year,
but we focused on studies that compared self-reported measures of
adherence to electronic monitors of pill adherence. The update
provides LRs for questionnaires designed to detect nonadherence
to medication, using an alternative reference standard.

CHANGES IN THE REFERENCE STANDARD 

The absence of a singular conceptual basis of medication
adherence is problematic. Strategies to improve adherence can be
evaluated only within the context of a given definition.
Furthermore, comparative assessment of the adherence literature
is difficult across studies using different definitions and
methods of operationalizing adherence. A commonly used but
arbitrary measure of optimal adherence has been identifying
patients who take at least 80% of prescribed doses correctly.
13,14 In other words, patients who take at least 80% of doses
correctly are considered adherent. This level has not been
validated in all circumstances and may vary, depending on several
factors, including, for example, the half-life of the prescribed
compound. 15 Adherence to medication is not a dichotomy, and
patients can demonstrate a wide variety of patterns of medication
use.
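
The 80% convention amounts to a simple ratio test. The sketch below is illustrative only: the function names and the example dose counts are assumptions, and the threshold is left as a parameter because, as noted above, 80% is arbitrary and not validated for every drug.

```python
def adherence_fraction(doses_taken: int, doses_prescribed: int) -> float:
    """Proportion of prescribed doses that were taken correctly."""
    if doses_prescribed <= 0:
        raise ValueError("doses_prescribed must be positive")
    return doses_taken / doses_prescribed


def is_adherent(doses_taken: int, doses_prescribed: int,
                threshold: float = 0.80) -> bool:
    """Apply the conventional cutoff; the default of 0.80 is arbitrary."""
    return adherence_fraction(doses_taken, doses_prescribed) >= threshold


# Hypothetical 30-day, once-daily regimen.
print(is_adherent(24, 30))  # 24 of 30 doses is exactly 80%
print(is_adherent(23, 30))  # 23 of 30 doses falls just below the cutoff
```

A single fraction of this kind hides the timing of missed doses, which is one reason the text cautions that adherence is not a dichotomy.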

The assessment of adherence is a complex task, and there is no
gold standard, with the exception of actually observing an
individual taking the prescribed medication. Researchers
interested in measuring medication adherence often rely on one of
6 measures of adherence: pharmacy refills, pill counts,
electronic measures (eg, MEMS caps), biologic indices,
self-report, and physician judgments. Because of the disparate
metrics used by investigators, comparison between methods (eg,
self-report vs pharmacy records) or even across studies that use
the same methods is difficult. Although there may not be a “best”
measurement strategy to obtain an approximation of adherence
behavior, strategies used must meet basic psychometric standards
of acceptable reliability and validity.

Direct methods for assessing medication adherence include those
that are more objective and require limited interpretation.
Electronic measurement devices are considered the closest to a
reference standard, and reviewed studies were limited to those
that used electronic monitors. Electronic monitors, including the
MEMS (AARDEX [APREX] Ltd, Union City, California), consist of a
microprocessor placed in a medication container with a switch
that is activated by the interruption of an electric current.
When activated, the microprocessor records the date and time the
bottle was opened. Several months of data can be stored on these
units before they must be downloaded onto a computer. These
medication monitors can provide information on the pattern of
drug intake, including the frequency and timing of medication
dosing during a fairly extended period. Electronic monitors are
not widely available and are expensive. They preclude the use of
a pillbox to organize the medication being monitored by the
electronic cap. In addition, some patients remove more than 1
dose per bottle opening to avoid carrying medication bottles when
leaving home. These limitations may result in electronic
monitoring underestimating a patient’s actual adherence.
Electronic monitored adherence rates consistently range between
10% and 20% lower than rates assessed by other methods, including
self-reports 16 and pill counts. 17
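
To make concrete how a log of bottle openings becomes an adherence figure, here is a minimal sketch. It does not reflect the actual MEMS software or data format; the function names, the timestamps, and the summary measure (fraction of days with at least the prescribed number of openings) are all assumptions chosen for illustration.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_opening_counts(openings, start, n_days):
    """Count recorded bottle openings for each day of the monitoring period."""
    counts = Counter(ts.date() for ts in openings)
    return [counts[start + timedelta(days=i)] for i in range(n_days)]


def fraction_of_days_covered(openings, start, n_days, doses_per_day=1):
    """Fraction of days with at least the prescribed number of openings."""
    per_day = daily_opening_counts(openings, start, n_days)
    return sum(c >= doses_per_day for c in per_day) / n_days


# Hypothetical once-daily regimen monitored for 4 days; day 3 has no opening.
openings = [
    datetime(2005, 2, 1, 8, 5),
    datetime(2005, 2, 2, 21, 40),
    datetime(2005, 2, 4, 7, 55),
]
coverage = fraction_of_days_covered(openings, date(2005, 2, 1), 4)
print(coverage)
```

A patient who removed 2 doses at one opening would be scored as missing a day under this rule, which illustrates the underestimation described above. An opening also does not prove that the dose was swallowed.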

Indirect methods for measuring medication adherence involve
interpretations that are more subjective and often based on an
individual’s perception of adherence. Because self-report
measures, a type of indirect measurement, continue to be the most
commonly used (they are simple, inexpensive, and convenient to
use), 18 the current review will focus on the diagnostic
properties of these measures.

There are 3 basic types of patient self-report: questionnaires,
interviews (in person or by telephone), and self-monitoring logs
(eg, diaries). Questionnaire-based measures include multi-item
scales (summarized below), visual analog scales, or reports of
missed doses. Maintaining confidentiality of the data and
promoting a cooperative relationship between patients and the
study team that collects the data can maximize the accuracy of
patients’ self-reported adherence. These procedures make it less
likely that patients will be defensive and deliberately distort
their responses or that communication problems would otherwise
render assessments inaccurate, as is particularly a concern when
patient adherence reports are collected by health professionals
themselves. 19


RESULTS OF LITERATURE REVIEW 

The results of the literature review are summarized in Table
15-4. The first study 20 examined a specific self-reported survey
in comparison to MEMS caps among patients with human
immunodeficiency virus infections. The self-report questionnaire
was the Medication Adherence Self-Report Inventory, which
consists of 12 items with 2 broad themes. The first section of
this measure assessed the amount of medication actually taken,
and the second part addressed the time of doses. The
investigators selected the antiretroviral drug from the patient’s
regimen that presented the greatest barrier to adherence (eg,
higher pill burdens, dietary requirements, more frequent dose
intervals). A second study 21 examined the 19-item Compliance
Questionnaire Rheumatology against MEMS caps among 81 patients
with rheumatoid arthritis who were taking nonsteroidal
anti-inflammatory drugs. The third study 22 examined the
relationship between the 6-item Simplified Medication Adherence
Questionnaire and medication event monitoring among 40 patients
using nelfinavir. The fourth study 23 examined the relationship
between the 4-item Morisky measure and MEMS caps among 83
patients commencing tricyclic antidepressants.


Table 15-4 Likelihood Ratios of Self-reported Adherence Measures Compared to the Medication Event Monitoring System

Measure                                             Sensitivity, %   Specificity, %   LR+ (95% CI)ᵃ    LR- (95% CI)ᵃ

Medication Adherence Self-Report Inventory 20       66               100              33 (4-317)       0.34 (0.23-0.47)
Compliance Questionnaire Rheumatology 21            62               95               17 (4.9-63)      0.39 (0.23-0.58)
Simplified Medication Adherence Questionnaire 22    72               95               7.9 (2.4-29)     0.31 (0.14-0.58)
Morisky measure 23                                  72               74               2.7 (1.6-4.4)    0.36 (0.18-0.64)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

ᵃThe LR+ is the likelihood ratio (LR) for medication nonadherence. For example, a patient with at least 1 positive answer on the Morisky measure would have a positive result,
suggesting that the odds of an adherence problem increase by a factor of 2.7. LR- is the LR for medication adherence. For example, a patient with all negative answers would have an LR
of 0.36, suggesting that the patient is less likely to have a problem with medication adherence.
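
A likelihood ratio acts on the odds of nonadherence, not directly on the probability. The sketch below converts a pretest probability into a posttest probability; it assumes the roughly 50% prevalence of nonadherence cited in this update as the pretest probability, and the function name is invented for the example.

```python
def post_test_probability(pretest_probability: float, lr: float) -> float:
    """Convert a pretest probability to a posttest probability using an LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)


PRETEST = 0.50  # assumed: about half of patients are nonadherent

# Morisky measure, LRs taken from Table 15-4.
after_positive = post_test_probability(PRETEST, 2.7)
after_negative = post_test_probability(PRETEST, 0.36)
print(f"any positive answer:  {after_positive:.2f}")
print(f"all negative answers: {after_negative:.2f}")
```

With a pretest probability of 50%, a positive Morisky result raises the probability of nonadherence to about 73%, and an entirely negative result lowers it to about 26%. Different pretest probabilities give different results, so the 50% figure should be replaced when a better local estimate exists.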


There are inherent self-reported biases that are likely to exist,
24 such as halo effects (eg, overreporting adherence) or recall
bias. Self-reported adherence represents “an upper limit” of the
estimate of actual adherence because of social desirability.
Despite the biases in using self-report measures of medication
adherence, studies tend to show that patients are accurate when
they say that they have not taken their medication. 25 Simply
put, when patients state they are having problems taking their
medication as prescribed, they are telling the truth. Patients’
claims of medication adherence tend to overestimate their true
rate by approximately 20%. 24 Reasons for overreporting adherence
may include the following: individuals might wish to give a
socially desirable answer even though it deceives their
physician, they might not understand their regimen and therefore
not realize that they are not adhering, or they might forget
instances of nonadherence.

Clinicians who rely on self-reports of adherence need to take
steps to improve the accuracy of their assessment. Suggested
steps include giving clear directions on how to take medications,
providing education and encouragement regarding the need for both
adherence and accurate reporting of adherence so that patients
will not give socially desirable answers, asking nonjudgmental
and nonthreatening questions about current medication use, and
probing barriers to accurate reporting. 26 These steps should be
taken routinely with all patients because there are few factors
to help identify the patients at greatest risk of inaccurate
reporting. 27

Because appointment nonadherence can be easily checked, it should
serve as a warning to screen for medication nonadherence. It is
useful to ask patients what they already know and believe about
their medications, including how many pills they take, as well as
the names and purpose of taking them. Inquiring about the most
common adverse events, as well as when they are likely to occur,
may prompt the patient to have a more open discussion about
medication (and appointment) adherence. Asking what patients know
and believe is useful both before and after explaining these
points.

EVIDENCE FROM GUIDELINES 

No guidelines give a standard approach to assessing or measuring
adherence. Many guidelines for individual disorders address the
need for assessing adherence.


CLINICAL SCENARIO—RESOLUTION 


Despite her assertion that she is taking her medications just as
instructed, you ask her whether she is having any problems taking
her medications. You find that she is confused about when she
should be taking her medications. After answering her question,
you ask her to repeat the information. In addition, you ask
whether you may explain the regimen to her husband. Finally, you
provide a written reminder describing when each medication should
be taken.
ASSESSING MEDICATION ADHERENCE—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

Approximately 50% of patients do not take their medications as
prescribed.

POPULATION FOR WHOM MEDICATION 
NONADHERENCE SHOULD BE CONSIDERED 

• All patients should be assessed 

• Patients not responding as expected to medication 

• Patients receiving multiple or complicated regimens 

• Patients who miss appointments 

• Older patients 

• Adolescents 

• Patients with cognitive disorders 

• Patients with psychiatric disorders 

• Patients treated for asymptomatic diseases (eg,
hypercholesterolemia, hypertension)

Given the high prevalence of medication nonadherence, ask all
patients, “Have you missed any pills in the past week?” Any
patient who answers yes should be considered nonadherent (Table
15-5). When patients answer no, a negative response to each of
the Morisky questions makes it even more likely that the patient
is adherent. Questionnaires about adherence may work better than
clinical judgment.


Table 15-5 Detecting the Likelihood of Medication Nonadherence

Test | LR+ (95% CI) a | LR- (95% CI) b
Single question: Have you missed any pills in the past week? | 4.3 (3.1-6.1) | 0.51 (0.44-0.58)
Morisky questions (any one positive) | 2.7 (1.6-4.4) | 0.36 (0.18-0.64)
  1. Do you ever forget to take your medication?
  2. Are you careless at times about taking your medicine?
  3. When you feel better, do you sometimes stop taking your medicine?
  4. Sometimes when you feel worse, do you stop taking your medicine?

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

a The LR+ is the likelihood ratio for medication nonadherence.

b The LR- is the likelihood ratio for medication adherence.
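
As a worked illustration of how the prior probability and the likelihood ratios in Table 15-5 combine, the sketch below converts a pretest probability to a post-test probability through odds. The function name and the choice of 0.50 as the pretest probability (taken from the "approximately 50%" stated above) are assumptions made for this example only.

```python
def post_test_probability(pretest: float, lr: float) -> float:
    """Convert a pretest probability to a post-test probability using a likelihood ratio."""
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest / (1.0 - pretest)   # probability -> odds
    post_odds = pretest_odds * lr              # apply the likelihood ratio
    return post_odds / (1.0 + post_odds)       # odds -> probability


PRIOR = 0.50  # assumed pretest probability of nonadherence (about 50% per the text)

# Patient answers yes to "Have you missed any pills in the past week?" (LR+ 4.3)
missed_pills_yes = post_test_probability(PRIOR, 4.3)

# Patient answers no to the same question (LR- 0.51)
missed_pills_no = post_test_probability(PRIOR, 0.51)

print(f"answer yes: {missed_pills_yes:.2f}")
print(f"answer no:  {missed_pills_no:.2f}")
```

Under these assumptions a yes answer raises the probability of nonadherence to roughly 0.81, while a no answer lowers it only to roughly 0.34, which is why a negative single question does not by itself establish adherence.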

REFERENCE STANDARD TESTS 

There is no single best reference standard for measuring 
adherence for all medications, nor is there general agree¬ 
ment for the level of adherence that is considered optimal. 
Physicians must use their best judgment, tailored to their 
knowledge of each patient. 
EVIDENCE TO SUPPORT THE UPDATE:


Compliance and Medication Adherence 



TITLE The Compliance Questionnaire Rheumatology 
Compared With Electronic Medication Event Monitoring: 
A Validation Study. 

AUTHORS de Klerk E, van der Heijde D, Landewe R,
van der Tempel H, van der Linden S.

CITATION J Rheumatol. 2003;30(11):2469-2475.

QUESTION Is the Compliance Questionnaire Rheumatology
(CQR) a valid measure of adherence compared
with a gold standard electronic Medication Event Monitoring
System (MEMS)?

DESIGN Prospective study comparing questionnaire 
responses with a MEMS. 

SETTING Three outpatient referral centers for rheumatology.

PATIENTS Eighty-five patients who completed the questionnaire
and for whom electronic monitoring data were available.


MAIN OUTCOME MEASURES 

Sensitivity and specificity. 

MAIN RESULTS 

Twenty-nine (34%) patients were not completely adherent
(see Table 15-6).

CONCLUSIONS 

LEVEL OF EVIDENCE Level 2. 

STRENGTHS The CQR is a patient-oriented questionnaire 
that was designed to explore concepts related to patient 
adherence in antirheumatic drug regimens. The measure is 
easy to read and understand. Patients can complete the questionnaire
in their own environment; an interviewer is not
required. It has good psychometric properties. 1 

LIMITATIONS The mean time to complete the questionnaire 
was 12 minutes. Approximately 20% of the sample had at 
least 1 missing value. Some of the questions are not applicable
to all responders.


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Prescriptions had to be first prescriptions (which is not the
same as a prescription for a new diagnosis), and the instructions
had to be “take as directed” (not “on demand”). In a
case in which 2 drugs were started at the same time, the drug
monitored was chosen to be the drug group in which the fewest
patients were enrolled.

The development of the CQR has been described in detail. 1
The CQR consists of 19 items, which were derived from a series of
patient interviews and a focus group interview, and reflects
statements that were made by patients regarding their drug-taking
behavior. Patients were asked to indicate how much they
agree with each statement on a 4-point Likert scale, with anchors
“don’t agree at all” (scored 1), “don’t agree” (scored 2), “agree”
(scored 3), and “agree very much” (scored 4). The CQR total
score is calculated by summing the items, subtracting 19, and
dividing by 0.57. This ensures that the CQR total score can vary
from 0 (complete nonadherence) to 100 (perfect adherence).
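
The scoring rule described above can be sketched in a few lines. This is a minimal illustration, assuming the 19 item scores have already been recorded as integers from 1 to 4 in whatever orientation the instrument requires; the function name is hypothetical and not part of the published questionnaire.

```python
def cqr_total_score(item_scores):
    """Rescale 19 Likert item scores (each 1-4) to the 0-100 CQR total score."""
    if len(item_scores) != 19:
        raise ValueError("the CQR has 19 items")
    if any(score not in (1, 2, 3, 4) for score in item_scores):
        raise ValueError("each item must be scored 1, 2, 3, or 4")
    # Sum the items, subtract 19 (the minimum possible sum), divide by 0.57
    # (the range 76 - 19 = 57, expressed per percentage point).
    return (sum(item_scores) - 19) / 0.57


lowest = cqr_total_score([1] * 19)    # every item scored 1
highest = cqr_total_score([4] * 19)   # every item scored 4
print(lowest, round(highest, 6))
```

The two extremes show why the constants are 19 and 0.57: the minimum sum of 19 maps to 0, and the maximum sum of 76 maps to 100 (up to floating-point rounding).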

The reference standard was MEMS. 


Table 15-6 Likelihood Ratios for the Compliance Questionnaire Rheumatology

Test | Sensitivity, % | Specificity, % | LR+ (95% CI) a | LR- (95% CI) b
Compliance Questionnaire Rheumatology (<80% on the scale) | 62 | 96 | 17 (4.9-63) | 0.39 (0.23-0.58)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

a LR+ is the likelihood ratio for medication nonadherence.

b LR- is the likelihood ratio for medication adherence.


REFERENCE FOR THE EVIDENCE 

1. de Klerk E, van der Heijde D, van der Tempel H, van der Linden S. Development
of a questionnaire to investigate patient compliance with antirheumatic
drug therapy. J Rheumatol. 1999;26(12):2635-2641.

Reviewed by Hayden B. Bosworth, PhD 
TITLE Compliance With Tricyclic Antidepressants: The
Value of 4 Different Methods of Assessment. 

AUTHORS George CF, Peveler RC, Heiliger S, Thompson C.

CITATION Br J Clin Pharmacol. 2000;50(2):166-171. 

QUESTION What are the advantages and disadvantages of 
the 4 methods for studying adherence with antidepressants? 

DESIGN As part of a larger randomized controlled trial, 
subjects were followed for up to 12 weeks after beginning 
to take antidepressants, but adherence was assessed at 6 
weeks. 

SETTING General practices. 

PATIENTS Eighty-three patients aged 18 years or older
who were beginning antidepressant treatment.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

At a 6-week interview, subjects were asked the 4 standard 
questions described by Morisky et al 1 : 

1. Do you ever forget to take your medication? 

2. Are you careless at times about taking your medicine? 

3. When you feel better, do you sometimes stop taking your 
medicine? 

4. Sometimes when you feel worse, do you stop taking your 
medicine? 

A yes answer was scored as 1, and the sum of yes answers 
constitutes the score. A score of 0 suggests no problems with 
medicine taking, whereas the maximum of 4 could indicate 
major difficulties and suggests poor adherence. 
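
The scoring rule just described is simple enough to sketch directly. The names below are hypothetical, and the "any one positive" flag follows the threshold used in Table 15-5 rather than anything stated in this paragraph.

```python
MORISKY_QUESTIONS = (
    "Do you ever forget to take your medication?",
    "Are you careless at times about taking your medicine?",
    "When you feel better, do you sometimes stop taking your medicine?",
    "Sometimes when you feel worse, do you stop taking your medicine?",
)


def morisky_score(answers):
    """Count yes answers (True) across the 4 Morisky questions; range 0-4."""
    if len(answers) != len(MORISKY_QUESTIONS):
        raise ValueError("expected one answer per Morisky question")
    return sum(1 for answer in answers if answer)


def morisky_positive(answers):
    """Flag the screen as positive when any one question is answered yes."""
    return morisky_score(answers) >= 1


example = [True, False, True, False]  # yes to questions 1 and 3
print(morisky_score(example), morisky_positive(example))
```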

For each subject, antidepressant medication was dispensed 
in medication event monitoring system (MEMS) containers, 
sufficient for a period of 3 weeks. The MEMS cap contained a 
microprocessor that records the time the bottle is opened as a
proxy for appropriate dosing. This was treated as the diagnostic
standard, although the investigators also did pill counts.

MAIN OUTCOME MEASURES 

Sensitivity and specificity compared with the MEMS measure
at a threshold of 80%.

MAIN RESULTS 

Among the subjects, 27 (32%) were nonadherent (see
Table 15-7).

Pill counts indicated better adherence at the 80% pill count 
level (only 17 were nonadherent). However, the pill counts 
also found 29 additional patients who finished with more 
pills than were dispensed. 


Table 15-7 Likelihood Ratios of the Morisky Scale for Medication Adherence

Test | Sensitivity, % | Specificity, % | LR+ (95% CI) a | LR- (95% CI) b
Morisky scale (nonadherence ≥1) | 74 | 72 | 2.7 (1.6-4.4) | 0.36 (0.18-0.64)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

a LR+ is the likelihood ratio for medication nonadherence.

b LR- is the likelihood ratio for medication adherence.
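
The likelihood ratios in these tables follow from sensitivity and specificity by the standard definitions, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below recomputes them from the rounded percentages reported for the Morisky scale (74% and 72%); the function name is an assumption for illustration, and small differences from the tabulated values are expected because the published ratios were presumably calculated from unrounded counts.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    if not (0.0 < sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in (0, 1] and specificity in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Morisky scale: sensitivity 74%, specificity 72%
lr_pos, lr_neg = likelihood_ratios(0.74, 0.72)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

The recomputed values (about 2.6 and 0.36) agree closely with the tabulated LR+ of 2.7 and LR- of 0.36.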

CONCLUSIONS 

LEVEL OF EVIDENCE Level 2. 

STRENGTHS The Morisky scale is 1 of the more common self-reported
measures of medication adherence. It has been used for
multiple diseases and is easy and quick to administer.

LIMITATIONS Depressed patients have a particular problem 
with adherence, especially as their symptoms improve. 

REFERENCE FOR THE EVIDENCE 

1. Morisky DE, Green LW, Levine DM. Concurrent and predictive validity of
a self-reported measure of medication adherence. Med Care. 1986;24(1):67-74.

Reviewed by Hayden B. Bosworth, PhD 


TITLE Validation of a Simplified Medication Adherence 
Questionnaire in a Large Cohort of HIV-Infected Patients: 
The GEEMA Study. 

AUTHORS Knobel H, Alonso J, Casado JL, et al; for the 
GEEMA Study Group. 

CITATION AIDS. 2002;16(4):605-613. 

QUESTION How effective is the Simplified Medication
Adherence Questionnaire (SMAQ) in identifying nonadherent
patients?

DESIGN Prospective observational study of adherence. 

SETTING A publicly funded specialist human immunodeficiency
virus (HIV) clinic where all treatment was
free.

PATIENTS A total of 40 HIV-seropositive adults who had
commenced nelfinavir treatment in combination with other
antiretroviral drugs. The study is a subset of a larger multicenter
(69 hospitals) and nationwide (Spain) study.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD

A group of physicians, nurses, pharmacists, psychologists, and
patients, all with experience with antiretroviral treatment and
adherence, developed the SMAQ. The questionnaire was based
on the Morisky scale. 1 The research group then made the following
changes: item 3 (“When you feel better, do you sometimes
stop taking your medicine?”) was eliminated because many
HIV-infected patients are asymptomatic. Three additional questions
were incorporated, with the aim of obtaining more adherence-specific
measurements. A modified version of a question
used by Samet et al 2 to determine the number of missed doses
during the previous 24 hours was used. The SMAQ result was
considered “positive” when a positive response to any of the
questions was provided.

The criterion validity assessment was carried out in a subset
of 40 patients. The patients were provided with a MEMS
cap bottle for each pack of nelfinavir prescribed.

MAIN OUTCOME MEASURES 

Sensitivity and specificity of the SMAQ. 

MAIN RESULTS 

Among the patients, 18 (45%) were not adherent (see
Table 15-8).

Table 15-8 Likelihood Ratios for the Simplified Medication Adherence Questionnaire

Test | Sensitivity, % | Specificity, % | LR+ (95% CI) a | LR- (95% CI) b
SMAQ | 72 | 91 | 7.9 (2.4-29) | 0.31 (0.14-0.58)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio; SMAQ, Simplified Medication Adherence Questionnaire.
a LR+ is the likelihood ratio for medication nonadherence.
b LR- is the likelihood ratio for medication adherence.

CONCLUSIONS 

LEVEL OF EVIDENCE Level 2. 

STRENGTHS The SMAQ showed a positive association with
virologic outcome. The SMAQ's internal consistency and
reproducibility were satisfactory and the measure is easy to
implement.

LIMITATIONS Like all self-report measures, the questionnaire
is limited by recall and social desirability bias.

REFERENCES FOR THE EVIDENCE 

1. Morisky DE, Green LW, Levine DM. Concurrent and predictive validity of a
self-reported measure of medication adherence. Med Care. 1986;24(1):67-74.

2. Samet JH, Libman H, Steger KA, et al. Compliance with zidovudine therapy
in patients infected with human immunodeficiency virus, type 1: a cross-sectional
study in a municipal hospital clinic. Am J Med. 1992;92(5):495-502.

TITLE Responses to a 1-Month Self-report on Adherence
to Antiretroviral Therapy Are Consistent With Electronic 
Data and Virologic Treatment Outcome. 

AUTHORS Walsh JC, Mandalia S, Gazzard BG. 

CITATION AIDS. 2002;16(2):269-277. 

QUESTION Is the Medication Adherence Self-Report
Inventory (MASRI) a valid measure of adherence to antiretroviral
therapy compared with an objective measure of adherence?

DESIGN Prospective study comparing questionnaire
responses with a medication event monitoring system
(MEMS; MEMS TrackCap), pill count, and plasma
human immunodeficiency virus (HIV) viremia.

SETTING A publicly funded specialist HIV clinic where 
all treatment was free. 

PATIENTS Seventy-eight HIV-seropositive adults receiving
stable combination antiretroviral therapy dispensed
from the clinic’s pharmacy.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The MASRI consists of 12 items with 2 broad themes. The first 
section is related to the amount of medication actually taken. 
The second part of the MASRI addressed the timing of doses. 
Both 3-day and 2-week self-report assessments were used. 

For each subject, the antiretroviral drug in the combination
that presented the greatest barrier to adherence was selected (eg,
higher pill burdens, dietary requirements, more frequent dose
intervals). Subjects were given a bottle containing this drug,
closed with a MEMS cap. These are pill bottle caps containing a
microprocessor that records the time the bottle is opened as a
presumptive dose. This was treated as the diagnostic standard.

MAIN OUTCOME MEASURES 

Sensitivity and specificity. 

MAIN RESULTS 

See Table 15-9. 


Table 15-9 Likelihood Ratios for the Medication Adherence Self-Report Inventory

Test | Sensitivity, a % | Specificity, a % | LR+ (95% CI) b | LR- (95% CI) c
MASRI (2 wk before <80% level of adherence) | 66 | 100 | 33 (4-317) | 0.34 (0.23-0.47)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio; MASRI, Medication Adherence Self-Report Inventory.

a Results transformed from data in manuscript so that the LR+ is associated with an
abnormal MASRI and increases the probability of nonadherence. The LR- would be
a normal result on the MASRI and decreases the likelihood of nonadherence (ie, the
patient is adherent).

b LR+ is the likelihood ratio for medication nonadherence.

c LR- is the likelihood ratio for medication adherence.

CONCLUSIONS

LEVEL OF EVIDENCE Level 2. 

STRENGTHS The MASRI is one of the first adherence questionnaires
for antiretroviral therapy to have been validated
against an objective measure.


LIMITATIONS Study sample selected had higher adherence 
than that typically observed in the literature; subjects who 
admitted to deviating from instructions were excluded from 
analysis. 

Reviewed by Hayden B. Bosworth, PhD 

CHAPTER 16

Can the Clinical Examination Diagnose Left-Sided Heart Failure in Adults?

Robert G. Badgett, MD
Cynthia D. Mulrow, MD, MSc
Catherine R. Lucey, MD

CLINICAL SCENARIOS


CASE 1 Your first patient is a 65-year-old man with 
Canadian class II angina and hypertension. He takes daily 
aspirin, sublingual nitroglycerin, and a calcium-channel 
blocker. His examination findings are normal, but his 
electrocardiogram (ECG) shows inferolateral Q waves, 
and the chest radiograph shows cardiomegaly. 

CASE 2 Your second case patient is an obese, 70-year-old 
woman who has had dyspnea on exertion and fatigue for 3 
months. She reports no orthopnea or paroxysmal nocturnal
dyspnea. Her medical history reveals 40 pack-years of
smoking, poorly controlled chronic hypertension, and type 
2 diabetes mellitus. She has a blood pressure of 180/100 
mm Hg, a sustained apical impulse, bilateral rales, and 
moderate pretibial edema. Her complete blood cell count 
and basic chemistry results are normal. Her ECG shows left 
ventricular hypertrophy with strain. The chest radiograph 
reveals normal heart size and enlarged upper lobe vessels. 

CASE 3 Your last case patient is a 58-year-old woman
with idiopathic dilated cardiomyopathy. Cardiac catheterization
showed normal coronary arteries and a left ventricular
ejection fraction (EF) of 35%. She has done well for a
year but now complains of dyspnea on exertion despite
treatment with diuretics, digoxin, and an angiotensin-converting
enzyme (ACE) inhibitor. You find a displaced
apical impulse, soft apical third heart sound, and clear
lung fields. Her ECG and chest radiograph results are
unchanged from before and show nonspecific ST changes
and cardiomegaly, respectively.


WHY IS THE DIAGNOSIS IMPORTANT? 


Clinicians seeing patients similar to case patient 1 must recognize
that a reduced left ventricular EF can exist even when
there is no fluid overload. The first patient, even if asymptomatic,
should be treated with an ACE inhibitor if a previous
infarction significantly reduced the EF. 1 A reduced EF
also may suggest a need for coronary angiography to evaluate
for possible revascularization. 2

Case 2 presents a number of different diagnostic and therapeutic
possibilities. The decision to pursue diagnostic testing
for pulmonary, cardiac, or other causes of dyspnea rests with
the clinician’s ability to identify and interpret clinical findings.
Knowledge of the accuracy of cardiac and pulmonary findings
is essential. If an increased left ventricular filling pressure is
detected, identifying the underlying pathophysiology is critical.
Systolic and diastolic dysfunction have different causes
that require different diagnostic considerations and treatment. 3,4
Previous articles address pulmonary findings 5 and
distinction of cardiac and pulmonary causes of dyspnea. 6 We
focus on cardiac findings.

Case 3 with known cardiomyopathy introduces another
diagnostic dilemma. When is the left-sided heart filling pressure
adequately decreased? The primary goals of treatment
are improved survival and functional status. Clinical findings
are not used to titrate therapy aimed at improved survival
(ACE inhibitors); however, clinical examination is used to
decide whether a patient needs more diuresis or afterload
reduction to improve functional status. 7,8 If clinical examination
is inaccurate, the potential for undertreatment or overtreatment
of patients with congestive symptoms exists.

PATHOPHYSIOLOGY AND DEFINITIONS 

The physiologic definition of heart failure seems precise: “the
pathophysiological state in which an abnormality of cardiac
function is responsible for failure of the heart to pump blood
at a rate commensurate with the requirements of the metabolizing
tissues, or to do so only from an elevated filling
pressure.” 9 In clinical practice, this definition includes a heterogeneous
population of patients with varying underlying
pathophysiologies for which there is no criterion standard
(gold standard) test.

An alternative, clinically meaningful way to define left-sided
heart failure is a decreased left ventricular EF or increased
filling pressure. Patients with left-sided heart failure
then fall into one of 3 groups: decreased EF with normal filling
pressure, decreased EF with increased filling pressure, or
normal EF with increased filling pressure. EF is easily measured,
accurately identifies persons with systolic dysfunction,
and has well-described treatment and prognostic implications.
Filling pressure has diagnostic and therapeutic implications.
As the failing heart adapts by increasing left
ventricular filling pressure to augment cardiac output, increased
filling pressure must indicate myocardial dysfunction.
Thus, when the filling pressure is increased but the EF is
normal, the patient has diastolic dysfunction. 3 These relations
hold if the clinician has excluded other causes of increased
filling pressure such as intermittent ischemia,
valvular and pericardial disease, and high output states. 10,11 In
addition, an increased filling pressure correlates with increased
symptoms and edema, 12 even among patients with
severe systolic dysfunction, 13-15 that are reduced by diuretic or
vasodilator therapy. 7,8,16

METHODS 

Literature Search 

We searched English-language medical literature regarding
the clinical examination in heart failure, with 3 goals in
mind: (1) to identify the most discriminating and useful
clinical findings; (2) to estimate the utility of the overall clinical
examination; and (3) to describe characteristics of
patients or clinical settings when disease can be ruled out or
confirmed. All studies we reviewed examined the ability of
clinical findings or the overall clinical examination to predict
filling pressure or EF. Acceptable criterion standards for filling
pressure were left ventricular end-diastolic pressure, left
atrial pressure, pulmonary capillary wedge pressure, or pulmonary
artery diastolic pressure. Finally, we sought studies
that compared multiple clinical findings with a multivariate
analysis.

To develop a structured search strategy, we used pertinent
articles already in our files and 2 related critical reviews that
had used extensive search methods. 5,17 We then searched
MEDLINE (English language) from January 1986 to November
1995 with the developed structured search strategy that
required certain words in the title or abstract (strategy available
on request). This search yielded 1254 articles, of which
28 met inclusion criteria (Table 16-1). We excluded 3 additional
studies because the independent significance of cardiac
findings was not assessed with a multivariate analysis. 51-53
Because only 2 articles addressed the distinction of systolic and
diastolic dysfunction, 40,41 we also included 9 studies of distinguishing
systolic and diastolic dysfunction that met all inclusion
criteria other than having a multivariate analysis. 42-50
Excepting the lack of multivariate analysis, these studies have
quality levels similar to the studies we reviewed of diagnosing
a reduced EF or increased filling pressure.

Data Abstraction 

Two of us (R.G.B. and C.R.L. or C.D.M.) independently
reviewed all studies. We calculated sensitivities and specificities
and tests of significance for studies that did not provide those
results. If necessary, data were reconstructed from scattergrams
and graphs. For studies of systolic function, we made calculations
for an EF of less than 40% when possible. The quality level
of evidence provided by each article was adapted from previous
work. 5 Levels 1 and 2 had independent comparison of clinical
examination items with a suitable criterion standard among
consecutive or random patients. Level 1 studies were larger and
had at least 96 patients with and without a normal criterion
standard (this number assures confidence intervals < ±10%).
Level 3 studies had independent comparison of findings to a criterion
standard among patients who were not consecutively or
randomly chosen. Level 4 studies did not have independent (or
the use of blinding not stated) comparison of findings to a criterion
standard.
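
The text does not say how the figure of 96 patients was derived. One plausible reading, assumed here, is the normal-approximation 95% confidence interval for a proportion in the worst case of p = 0.5 (where the interval is widest). The sketch below checks that reading; the function name and the choice of z = 1.96 are assumptions for illustration.

```python
import math


def ci_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


# Worst case for interval width is p = 0.5 (assumed; not stated in the text)
half_width = ci_half_width(0.5, 96)
print(f"n = 96 gives a 95% CI half-width of about {half_width:.3f}")

# Smallest n whose worst-case half-width is 0.10, solved from the same formula
n_required = (1.96 / 0.10) ** 2 * 0.25
print(f"n solving the formula for +/-0.10: {n_required:.2f}")
```

Under this assumption, 96 patients in a group gives a sensitivity or specificity estimate with a 95% confidence interval of about ±10 percentage points, consistent with the threshold quoted for Level 1 studies.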

To determine the utility of the clinical examination, studies
were pooled with a random-effects model 54 for sensitivities,
specificities, and likelihood ratios (LRs). When possible, we
stratified the predicted probabilities of disease into 3 levels: low,
intermediate, and high risk. For studies that compare a clinically
predicted EF with a measured EF, 29-32,37,38 we stratified risk of disease
according to the predicted EF: low risk, predicted EF of
60% or greater; intermediate risk, predicted EF of 31% to 59%;
and high risk, predicted EF of 30% or lower. For the study by
McNamara et al, 37 low-probability patients had no abnormal
findings, intermediate patients had 1 to 2 abnormal findings,
and high-probability patients had 3 or 4 abnormal findings. We
then pooled the studies to calculate multilevel LRs 55 and compare
the true prevalences of disease in each risk stratum. We

Table 16-1 Summary of Studies Reviewed

Source, y | Population | Gold Standard

Studies of Increased Filling Pressure
Harlan et al, 12 1977 | 1306 Patients (in validation group) with known coronary disease | LVEDP > 15 mm Hg
Carlson et al, 18 1985 | 96 Patients who received elective right-sided heart catheterization | PCWP > 12 mm Hg
Forrester et al, 19,20 1977, 1976 | 188 Consecutive patients peri-infarction | PCWP > 15 mm Hg
Fein et al, 21 1984 | 70 Consecutive ICU patients with pulmonary edema | PCWP > 18 mm Hg
Tuchschmidt et al, 22 1987 | 35 ICU patients needing right-sided heart catheterization | PCWP > 15 mm Hg
Eisenberg et al, 23 1984 | 97 ICU patients, without recent infarction, needing right-sided heart catheterization | PCWP > 15 mm Hg
Connors et al, 24 1983 | 62 ICU patients, without recent infarction, needing right-sided heart catheterization | PCWP > 12 mm Hg
Connors et al, 25 1990 | 502 ICU patients needing right-sided heart catheterization | PCWP > 18 mm Hg
Steingrub et al, 26 1991; Celoria et al, 27 1990 a | 154 ICU patients needing right-sided heart catheterization | PCWP > 18 mm Hg
Butman et al, 14 1993 | 52 Patients with mean EF of <20% undergoing pretransplant evaluation | PCWP > 18 mm Hg
Chakko et al, 13 1991 | 52 Patients with mean EF of 19% undergoing pretransplantation evaluation | PCWP > 15 mm Hg
Stevenson and Perloff, 15 1989 | 50 Patients with mean EF of 18% undergoing pretransplantation evaluation | PCWP > 22 mm Hg

Studies of Systolic Dysfunction
Rihal et al, 28 1995 | 14507 Patients enrolled in the Coronary Artery Surgery Study | EF < 50%
Eagle et al, 29 1988 | 222 Patients in 2 groups electively referred for MUGA | EF < 50%
Mattleman et al, 30 1983 | 199 Elective referrals for MUGA in patients with coronary disease | EF < 50%, EF < 30%
Ostojic et al, 31 1989 | 238 Patients in 2 groups who received cardiac catheterization | EF < 50%
Cease and Nicklas, 32 1986 | 105 Patients in 2 groups referred for MUGA for various reasons | EF < 50%
Gadsboll et al, 33,34 1989 | 98 Patients who received MUGA 7-15 d after infarction | EF < 52%
Jain et al, 35 1993 | 32 Patients who received echocardiogram 15-25 d after infarction | EF < 40%
Mangschau et al, 36 1986 | 477 Patients who received MUGA 8-12 d after infarction | EF < 50%
McNamara et al, 37 1988 | 760 Patients who received MUGA 6-24 d after infarction | EF < 40%
Sanford et al, 38 1982 | 100 Patients who received MUGA after infarction | EF < 50%
Silver et al, 39 1994 | 304 Patients in 2 groups who received MUGA, echocardiogram, or catheterization 2-21 d after infarction | EF < 50%

Studies of Diastolic Dysfunction
Ghali et al, 40 1991 | 82 Consecutive patients admitted for CHF | FS > 24%
McDermott et al, 41 1995 | 298 Consecutive patients admitted for syndrome of CHF | EF > 50%
Aguirre et al, 42 1989 | 151 Patients with 2 signs of CHF who were referred for echocardiogram | EF > 55%
Aronow et al, 43 1990 | 247 Elderly residents of a long-term care facility with clinical criteria of CHF | EF > 50%
Bier et al, 44 1988 | 87 Consecutive inpatients with pulmonary edema | Normal wall motion by echocardiogram
Cocchi et al, 45 1991 | 118 Consecutive elderly patients on a geriatrics service with clinical criteria of CHF | EF > 50%
Cohn et al, 46 1990 | 623 Male veterans who met criteria to be in the V-HeFT Study | EF > 45%
Dougherty et al, 47 1984 | 72 Consecutive patients with clinical CHF referred for gated radionuclide ventriculography | EF < 45%
Echeverria et al, 48 1983 | 50 Consecutive referrals for echocardiograms because of CHF | EF > 50%
Takarada et al, 49 1992 | 172 Consecutive elderly patients admitted for CHF | FS > 30%
Wong et al, 50 1989 | 54 Elderly patients admitted for CHF who were referred for echocardiogram | Normal wall motion by echocardiogram

Abbreviations: CHF, congestive heart failure; EF, ejection fraction; FS, fractional shortening; ICU, intensive care unit; LVEDP, left ventricular end diastolic pressure; MUGA, multigated angiography; PCWP, pulmonary capillary wedge pressure; V-HeFT, Vasodilator Heart Failure Trial.

a There is partial overlap among the patients in the studies by Steingrub et al 26 and Celoria et al. 27


excluded 1 study, although the published data allowed stratifying
the predicted probabilities. 33 This study had outlying results
and was the only study not to incorporate the chest radiograph
into the clinical assessment.
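
The two stratification rules described in the data abstraction can be written out as a short sketch. The function names are hypothetical. The text gives the EF bands in whole percentages (30% or lower, 31% to 59%, 60% or greater); treating any value above 30 and below 60 as intermediate is an assumption made here so that fractional inputs are still classified.

```python
def risk_from_predicted_ef(predicted_ef_percent: float) -> str:
    """Risk stratum for a decreased EF, based on the clinically predicted EF (%)."""
    if not 0.0 <= predicted_ef_percent <= 100.0:
        raise ValueError("EF must be a percentage between 0 and 100")
    if predicted_ef_percent >= 60:
        return "low"
    if predicted_ef_percent > 30:   # 31% to 59% in the text
        return "intermediate"
    return "high"                   # 30% or lower


def risk_from_abnormal_findings(n_abnormal: int) -> str:
    """Risk stratum used for the McNamara et al study (0 to 4 abnormal findings)."""
    if not 0 <= n_abnormal <= 4:
        raise ValueError("expected between 0 and 4 abnormal findings")
    if n_abnormal == 0:
        return "low"
    if n_abnormal <= 2:
        return "intermediate"
    return "high"


print(risk_from_predicted_ef(45), risk_from_abnormal_findings(3))
```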

To determine the best clinical findings, we tabulated how
often a particular finding was studied and how often it had
univariate or multivariate significance. These tables are available
on request. “Very helpful” findings have been studied at
least twice and have either univariate or multivariate significance
every time studied. “Somewhat helpful” findings for
increased filling pressure or diastolic dysfunction are significant
at least half the times they were studied. As many findings

CHAPTER 16 The Rational Clinical Examination 


are associated with systolic dysfunction, somewhat helpful 
findings are restricted to those significant more than half the 
times studied. Findings that are “helpful only when present” 
are those that are not usually statistically significant but are 
usually reported as having a specificity of at least 90%. We 
believe these findings are clinically significant when present. 

Our last goal was to describe when disease could be ruled
out or confirmed by the clinical examination. We used studies
that successfully describe either low-probability or high-probability
patients (positive or negative predictive value > 90%).
With these studies, we used their decision aids, prediction
rules, or multivariate equations to estimate the number of
abnormal clinical findings that would place a patient in each
level of risk (Figures 16-1 and 16-2).

RESULTS 

How to Detect Increased Left Ventricular Filling Pressure 

Although clinicians routinely assess filling pressure in
patients similar to those in cases 2 and 3, there is little literature
on our ability to do so. Four studies 12-15 assess whether
multiple clinical findings identify patients with invasively
determined increased left ventricular filling pressure. Three
of these studies 13-15 involve patients with known severe systolic
dysfunction (mean EF < 20%) who are referred for pretransplant
evaluation (Table 16-1). These studies are biased
by a high prevalence of increased filling pressure.

Unfortunately, isolated clinical findings alone have a limited
role in diagnosis. Very helpful findings are radiographic
redistribution and jugular venous distention (Table 16-2).
These findings, when used alone, only help when they are
abnormal and so can confirm the presence of increased filling
pressure in patients with known severe systolic dysfunction.
Among patients referred for consideration of cardiac
transplant with a high (73%) prevalence of increased filling
pressure, 13-15 radiographic redistribution indicates an 80% 13
to 90% 14 probability and jugular venous distention, an 85% 13
to 100% 15 probability of increased filling pressure. The
absence of either finding cannot rule out increased filling
pressure. In patients with lesser probabilities of increased filling
pressure, such as those without known severe systolic
dysfunction, isolated findings may not be useful. Somewhat
helpful findings include dyspnea and abnormal vital signs


Known severe systolic dysfunction?

No: Prevalence of increased filling pressure is 20% 12,18
  ≤ 1 Finding: Low (< 10%)
  2 Findings: Indeterminate
  ≥ 3 Findings a: High (> 90%)

Yes: Prevalence of increased filling pressure is 75% 13-15
  0 Findings: Low (< 10%)
  ≥ 1 Finding b: High (> 90%)

Figure 16-1 Suggested Algorithm to Determine the Risk of Increased Left Ventricular Filling Pressure
Using Findings From Appropriate Column of Table 16-2

a Recommendation extrapolated from Carlson et al, 18 who found that the presence of 2 findings did not reliably indicate an increased filling pressure.

b At least 1 abnormal very helpful finding indicates an 80%-100% probability of increased filling pressure; the role of less helpful findings has not been quantified.
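
The branching in Figure 16-1 can be expressed as a small function. This is an illustrative sketch only, not a validated decision tool: the function name is hypothetical, and the thresholds assume the figure's garbled comparison symbols read as 1 or fewer, exactly 2, and 3 or more findings for patients without known severe systolic dysfunction, and 0 versus 1 or more findings for patients with it.

```python
def filling_pressure_risk(known_severe_systolic_dysfunction: bool,
                          n_abnormal_findings: int) -> str:
    """Risk category for increased left ventricular filling pressure (Figure 16-1 sketch)."""
    if n_abnormal_findings < 0:
        raise ValueError("number of findings cannot be negative")

    if known_severe_systolic_dysfunction:
        # Prevalence about 75%; per footnote b, the finding should be a
        # "very helpful" one for the high-probability estimate to apply.
        return "high (>90%)" if n_abnormal_findings >= 1 else "low (<10%)"

    # Prevalence about 20% without known severe systolic dysfunction.
    if n_abnormal_findings <= 1:
        return "low (<10%)"
    if n_abnormal_findings == 2:
        return "indeterminate"
    return "high (>90%)"


print(filling_pressure_risk(False, 2))
print(filling_pressure_risk(True, 1))
```

Keeping the two prevalence groups as separate branches mirrors the figure: the same number of findings carries a different meaning depending on the starting probability.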



Figure 16-2 Suggested Algorithm to Determine Risk of Decreased Left Ventricular Ejection Fraction
Using Findings From Appropriate Column of Table 16-2

a No abnormal findings in patients with a history of hypertension indicates low risk. In patients without hypertension, a decreased ejection fraction cannot be
ruled out; however, other studies 35-37 suggest that a history of hypertension does not affect the diagnosis. Unfortunately, these studies use regression equations
that cannot be translated into number of abnormal findings.

b McNamara et al 37 report that at least 3 abnormal findings indicate only 83% probability of an ejection fraction less than 40%; however, the role of cardiomegaly
was not studied.
Table 16-2 Helpful Clinical Findings for the Detection of Heart Failure 

Very helpful findings a
    Increased Filling Pressure: Radiographic redistribution, jugular venous distention
    Ejection Fraction < 40%: Radiographic cardiomegaly or redistribution, anterior Q waves, left bundle-branch block, abnormal apical impulse
    Diastolic Dysfunction: Current hypertension

Somewhat helpful findings c
    Increased Filling Pressure: Dyspnea, d orthopnea, tachycardia, e low SBP, e PPP < 25%, S3, rales, abnormal abdominojugular reflux, radiographic cardiomegaly f
    Ejection Fraction < 40%: Pulse > 90/min 29,37 or > 100/min, 35,36 SBP < 90 mm Hg, 29 PPP < 33%, 35 S3, rales, dyspnea, any previous infarction, CPK > 200 35 or > 1000 37 IU
    Diastolic Dysfunction: Obesity, no tachycardia, elderly, e no smoking, no coronary disease

Findings helpful only when present g
    Increased Filling Pressure: Edema
    Ejection Fraction < 40%: Jugular venous distention, edema
    Diastolic Dysfunction: Normal radiographic heart size


Abbreviations: CPK, creatine phosphokinase in the postinfarction patient; PPP, proportional pulse pressure (pulse pressure/systolic pressure); SBP, systolic blood pressure. 
a Very helpful findings are significant in all studies and have been studied at least twice. Bolded findings are always independently significant. 

b Current hypertension is systolic pressure higher than 160 mm Hg 44 or diastolic pressure higher than 100 mm Hg 44 or higher than 105 mm Hg. 40 
c Somewhat helpful findings are significant in at least half of studies. For systolic dysfunction, only findings significant in more than half of studies are listed. 

d Studied only once. 

e No cutoff to define abnormal. 

f Cardiomegaly is somewhat helpful in initially detecting increased filling pressure; however, cardiomegaly may remain after reduction of the filling pressure. 

g Findings helpful only when present are consistently reported as highly specific, although they are not usually statistically significant. 


(Table 16-2). Radiographic cardiomegaly is somewhat 
helpful 1214 but loses its specificity after the initial detection of 
increased filling pressure because it can be a permanent find¬ 
ing and not fluctuate with changes in filling pressure. Depen¬ 
dent edema is helpful only when present. Edema is highly 
specific for increased filling pressure, although it has poor sen¬ 
sitivity. Limiting our review to studies meeting our inclusion 
criteria, history of infarction 13 was not helpful. Clinicians may 
use other findings not listed in Table 16-2; however, their inde¬ 
pendent role in the assessment of increased filling pressure is 
unknown. 

If isolated findings alone are not helpful, can multiple findings 
in combination or the overall clinical examination rule 
out or confirm increased filling pressure? A total of 11 studies 
address this question (Table 16-3). 12,18-27 Their pooled operating 
characteristics do not yield predictive values that reliably 
confirm or rule out an increased filling pressure in typical 
patients, such as those in the emergency department or hospital 
because of dyspnea. 56-58 
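To make the link between operating characteristics and predictive values concrete, the following minimal Python sketch applies Bayes' rule to the pooled sensitivity (62%) and specificity (76%) reported for increased filling pressure in Table 16-3. The function name and the 50% prevalence are illustrative assumptions, not values from the studies reviewed.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled summary values from Table 16-3; the 50% prevalence is hypothetical.
ppv, npv = predictive_values(0.62, 0.76, prevalence=0.50)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")  # PPV = 72%, NPV = 67%
```

Under this assumed prevalence, neither predictive value approaches the 90% level that would confirm or rule out the diagnosis.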

If the overall examination cannot successfully dichotomize 
patients into those with either normal or increased filling 
pressure, can the examination place patients into 3 groups, 
those whose filling pressure is increased, indeterminate, or 
normal? 59 - 60 If so then clinicians could pursue alternative 
diagnoses in patients highly likely to have a normal filling 
pressure while initiating treatment in patients highly likely to 
have increased filling pressure. Although no study has for¬ 
mally evaluated the utility of this approach, we suggest that 
the probability of increased filling pressure is related to the 
number of findings that are detected on clinical examination. 

The number of findings associated with low, intermediate, 
or high probability of increased filling pressure depends on the 
clinical setting (Figure 16-1). For example, patients without 
known severe systolic dysfunction have a low prevalence 
(22%) of increased filling pressure. 12,18 Patients likely to have 
normal filling pressure in this setting are reported by Carlson 
et al 18 to have no more than 1 finding (negative likelihood ratio 
[LR-], 0.1). Extrapolating from the results of Carlson et al, 18 in 
which only 73% of patients with at least 2 findings had 
increased filling pressure, patients highly likely to have 
increased filling pressure will have at least 3 abnormal findings. 
In patients with known severe systolic dysfunction, the prevalence 
of increased filling pressure is higher (73%), 13-15 and increased 
filling pressure is easier to confirm but harder to rule out. 
Patients likely to have a normal filling pressure in this setting 
will have no abnormal findings. 14 Increased filling pressure in 
patients with known severe systolic dysfunction is likely if there 
is a single very helpful finding such as redistribution or jugular 
venous distention. 
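The way a likelihood ratio shifts the probability of increased filling pressure can be sketched with the odds form of Bayes' rule. The Python example below is illustrative only: the function name is an assumption, and the inputs are the 22% prevalence and LR- of 0.1 cited above, plus the prevalence (31%) and LR+ (5.9) listed for Carlson et al in Table 16-3.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# No more than 1 finding (LR-, 0.1) at a 22% prevalence: about 3%.
low = post_test_probability(0.22, 0.1)
# At least 2 findings (LR+, 5.9) at the 31% prevalence of Carlson et al: about 73%.
high = post_test_probability(0.31, 5.9)
print(f"{low:.0%}, {high:.0%}")  # 3%, 73%
```

The second result reproduces the 73% figure quoted from Carlson et al, which falls short of the greater than 90% probability used for the high-risk category.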

In summary, in populations without known severe systolic 
dysfunction, patients with no more than 1 abnormal finding 
likely have a normal filling pressure, whereas those with at 
least 3 abnormal findings may have an increased filling pres¬ 
sure. Among populations with known severe systolic dys¬ 
function, patients with no abnormal findings likely have a 
normal filling pressure, whereas those with 1 very helpful 
finding likely have increased filling pressure. Patients in 
either setting with an intermediate number of findings will 
have an indeterminate filling pressure. These conclusions are 
based on a limited number of small studies. Future research 
needs to confirm and refine these conclusions. 
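The suggested approach summarized above and in Figure 16-1 can be written as a short Python sketch. The function and argument names are hypothetical, and treating patients with known severe systolic dysfunction who have only less helpful findings as indeterminate is an assumption, because the role of those findings has not been quantified.

```python
def filling_pressure_risk(known_severe_systolic_dysfunction: bool,
                          n_findings: int,
                          n_very_helpful: int = 0) -> str:
    """Classify the risk of increased filling pressure as in Figure 16-1.

    n_findings counts all abnormal findings from Table 16-2;
    n_very_helpful counts how many of those are very helpful findings
    (radiographic redistribution or jugular venous distention).
    """
    if n_findings < 0 or not 0 <= n_very_helpful <= n_findings:
        raise ValueError("counts must satisfy 0 <= n_very_helpful <= n_findings")

    if known_severe_systolic_dysfunction:
        if n_findings == 0:
            return "low"
        if n_very_helpful >= 1:
            return "high"
        # Assumption: only less helpful findings are present.
        return "indeterminate"

    if n_findings <= 1:
        return "low"
    if n_findings >= 3:
        return "high"
    return "indeterminate"


print(filling_pressure_risk(False, 2))     # indeterminate
print(filling_pressure_risk(True, 1, 1))   # high
```

As the text cautions, these categories rest on a limited number of small studies, so the sketch illustrates the logic rather than a validated decision rule.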

How to Detect Decreased EF 

Although detection of decreased EF is better described than 
detection of increased filling pressure, 28 ' 35 isolated findings 
have an even smaller role in detecting a decreased EF. Five 
findings are very helpful in identifying patients with an EF of 
less than 40% (Table 16-2). Radiographic cardiomegaly con¬ 
sistently adds independent information in predicting de¬ 
creased EF. However, its sensitivity and specificity (51% and 
79%, respectively) are insufficient to help the clinician in 
most clinical settings. 17 Abnormal apical impulse (especially 
sustained duration), radiographic redistribution, and ante¬ 
rior Q waves or left bundle-branch block on ECG are also 
consistent predictors, although they do not consistently add 
Table 16-3 Performance of the Clinical Evaluation for Detecting an Increased Filling Pressure or a Decreased Ejection Fraction 

Source                          Level of  Prevalence of Sensitivity, %  Specificity, %  LR+    LR-
                                Evidence  Disease, %

Increased Filling Pressure
  Patients referred for elective evaluation
    Harlan et al 12             1         21            52              85              3.5    0.6
    Carlson et al 18            3         31            90              85              5.9    0.1
  Postinfarction patients
    Forrester et al 19          2         64            85              85              5.7    0.2
  ICU patients
    Fein et al 21               2         49            91              47              2.0    0.2
    Eisenberg et al 23          3         67            57              35              1.0    1.2
    Tuchschmidt and Sharma 22   3         55            53              94              8.8    0.5
    Connors et al 24            3         61            51              67              2.0    0.7
    Connors et al 25            3         45            50              78              2.0    0.6
    Steingrub et al 26          3         31            35              85              3.0    0.7
    Summary                                             54              69              1.7    0.7
  Patients referred for pretransplant evaluation
    Stevenson and Perloff 15    2         86            58              100             a      0.4
  Summary                                               62              76              2.6    0.5

Decreased Ejection Fraction
  Patients referred for elective evaluation
    Rihal et al 28              1         23            68              74              2.6    0.4
    Eagle et al 29b             2         44            96              48              1.8    0.1
    Eagle et al 29c             2         45            90              58              2.1    0.2
    Mattleman et al 30b         2         38            97              66              2.9    0.1
    Mattleman et al 30c         2         42            88              74              3.4    0.2
    Ostojic et al 31d           2         45            92              68              5.7    0.1
    Ostojic et al 31e           2         23            83              46              2.0    0.4
    Cease and Nicklas 328       4         42            96              65              2.6    0.2
    Cease and Nicklas 326       4         39            100             56              2.5    0.1
    Summary                                             91              62              2.4    0.1
  Postinfarction patients
    Gadsboll et al 33           1         43            72              76              1.2    0.4
    Mangschau et al 36          1         24            67              83              8.5    0.7
    McNamara et al 37           1         35            67              77              3.8    0.4
    Sanford et al 38            3         25            100             42              1.7    0.3
    Silver et al 39d            3         31            98              57              1.7    0.1
    Silver et al 39b            3         25            97              65              1.5    0.1
    Jain et al 35               4         59            95              85              6.2    0.1
    Summary                                             80              70              2.7    0.3
  Summary                                               85              66              2.5    0.2



Abbreviations: ICU, intensive care unit; LR+, positive likelihood ratio; LR-, negative likelihood ratio. 
a Ellipses indicate data not available. 

b Results of a cardiologist or other physician. 

c Results of a prediction rule. 

d Results from a training set. 

e Results from a validation set. 


independent information. Anterior Q waves and left bundle-branch 
block both have a specificity of almost 90% or 
higher. 28,39 Other Q waves increase the sensitivity of the ECG 
but are less specific. 20,40 A single study on predominantly ischemic 
patients reported that the presence of any electrocardiographic 
abnormality has a sensitivity of 90%. 28 This 
sensitivity may decline in other populations. Many other 
findings are somewhat helpful (Table 16-2). Two findings associated 
with increased filling pressure, edema and elevated 
jugular venous pressure, are helpful only when present. In 2 
studies of postinfarction patients, these 2 findings are highly 
specific for decreased EF. 33,37 However, this specificity will decrease 
in populations with less ischemic heart disease and a 
higher prevalence of increased filling pressure with normal 
EFs (diastolic dysfunction). Other findings examined that 
were not significant in a majority of studies are age, 28,35,37,39 orthopnea, 38 
left ventricular hypertrophy on ECG, 29 and a history 
of hypertension 29,35,37,39 or congestive heart failure. 29,39 
How should the clinician use this information? Unfortunately, 
the pooled sensitivity and specificity of the overall clinical exami¬ 
nation do not yield high enough predictive values to reliably 
assess the EF (Table 16-3). However, the clinical examination can 
categorize patients into low, indeterminate, and high probability 
of systolic dysfunction. Eight studies describe patients who 
have either a very low (<10%) or a very high (>90%) probability 
of systolic dysfunction (Figure 16-2). 29-32,35,37-39 Low-probability 
patients have none of the abnormal clinical findings associated 
with a decreased EF, high-probability patients have at least 3 and 
usually more findings, and indeterminate patients have an inter¬ 
mediate number of abnormal findings (1 to 2). (Although spe¬ 
cific findings used in each study vary, we recommend that the 
clinician use those for ejection fraction < 40% in Table 16-2.) 
The probabilities of an EF of less than 40% in the low-probability, 
indeterminate, and high-probability categories according to 
clinical findings are 7% (range, 0%-10%), 34% (range, 23%-41%), 
and 89% (range, 86%-100%), respectively. 33-36,39,40 
Typical outpatient populations probably have lower preva¬ 
lences of decreased EFs than patients who have been included 
in the above-cited studies. Thus, outpatients categorized as low 
risk are likely to have less than a 7% probability of a low EF. 
Recent guidelines also suggest that diagnosing a noncardiac 
explanation for a patient's symptoms can help decrease the 
probability of systolic dysfunction. 61 
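The three-category approach to decreased EF described above can be sketched in Python as follows. The function name is hypothetical; the probabilities are the pooled values quoted in the text (7%, 34%, and 89%), and findings are assumed to be counted from the ejection fraction column of Table 16-2.

```python
from typing import Tuple


def decreased_ef_risk(n_abnormal_findings: int) -> Tuple[str, float]:
    """Return (category, pooled probability of an EF less than 40%)."""
    if n_abnormal_findings < 0:
        raise ValueError("number of findings cannot be negative")
    if n_abnormal_findings == 0:
        return "low", 0.07
    if n_abnormal_findings <= 2:
        return "indeterminate", 0.34
    return "high", 0.89


for count in (0, 2, 4):
    category, probability = decreased_ef_risk(count)
    print(count, category, f"{probability:.0%}")
```

Because typical outpatient populations probably have a lower prevalence of decreased EF than the study populations, the pooled probabilities are best read as upper estimates in that setting.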

How to Distinguish Diastolic From Systolic Dysfunction 

Studies addressing the distinction of diastolic from systolic dys¬ 
function do so by predicting EF in patients (usually inpatients) 
with clinical evidence of increased filling pressure (patients with 
increased filling pressure and normal EF are assumed to have dia¬ 
stolic dysfunction). Only 2 studies 40 - 41 use a multivariate analysis 
to report the independent information from each clinical finding 
(Table 16-1). Thus, we also reviewed studies that report the per¬ 
formance of multiple clinical findings but do not use a multivari¬ 
ate analysis to compare independent values of findings. 42 ' 50 

The only very helpful finding is currently elevated blood 
pressure (Table 16-2). Its sensitivity ranges from 61% 40 to 
66% 44 and its specificity is 59% 44 to 70%. 40 Thus, its value as 
an isolated finding for identifying the EF among patients 
with increased filling pressure is questionable. Somewhat 
helpful findings are obesity, the absence of tachycardia, older 
age (no cutoff age is available), and absence of smoking or 
coronary disease. A normal heart size on the chest radio¬ 
graph is helpful only when present. A normal heart size is 
highly specific for diastolic dysfunction as the underlying 
cause of increased filling pressure. However, because 56% to 
75% of patients with diastolic dysfunction have left ventricular 
hypertrophy 42,44,47 that can cause radiographic cardiomegaly, 
a normal heart size is not a common (sensitive) finding 
among patients with diastolic dysfunction. Neither electrocardiographic 
evidence of left ventricular hypertrophy 40,47 nor 
a history of hypertension 40,42-45,48-50 discriminates diastolic 
from systolic dysfunction. In addition, the patient's sex and 
the presence of a third or fourth heart sound are not helpful. 

Little information exists about whether the clinical examina¬ 
tion or multiple findings in combination can distinguish dia¬ 
stolic from systolic dysfunction in patients with increased filling 
pressure. One study suggests multiple findings in combination 
have 76% accuracy. 41 In practice, this accuracy may be higher 
because this study did not analyze the role of the current blood 
pressure. Until more research is available, we recommend all 
patients with evidence of increased filling pressure have objec¬ 
tive assessment of their EF. Some patients with signs suggesting 
an increased filling pressure with a normal EF will have causes 
other than diastolic dysfunction. 44,48 These causes include valvulopathy, 
right ventricular dysfunction from emphysema, iatrogenic 
volume overload, pulmonary fibrosis, and intermittent 
left ventricular ischemia. The clinician should consider these 
diagnoses before diagnosing diastolic dysfunction. 

Precision of Clinical Findings 

Much variability exists in reports of the precision of clinical 
findings. This reflects the subtle nature of findings and the 
varied abilities of clinicians. Most studies, 31 - 62 - 63 but not all, 64 
suggest this variability is partly attributable to subspecialty 
training or examiner experience. 

Two studies report precision of multiple clinical findings in 
an overall bedside evaluation aimed at predicting EFs. (We 
report precision using standard qualitative descriptors of the κ 
statistic for interobserver agreement. 65 ) For the overall bedside 
estimate of EF, Gadsboll et al 33 report “fair” precision (κ = 0.28-0.37). 
In comparing 3 examiners, Gadsboll et al 33 find that the 
cardiologist tends to more accurately predict EF than the 2 resident 
physicians. When assessing specific clinical findings, precision 
is as follows: jugular venous distention, fair to substantial 
(κ = 0.31-0.69) 34,14; displaced apical impulse, moderate to substantial 
(κ = 0.53-0.73) 34; third heart sound, slight to moderate 
(κ = 0.14-0.60) 34,14; rales, slight to substantial (κ = 0.12-0.65) 34,14; 
and edema, fair to substantial (κ = 0.27-0.64). 34 For radiographic 
findings interpreted by radiologists, precision is as follows: 
cardiomegaly, moderate (κ = 0.48) 14; redistribution, fair 
to moderate (κ = 0.38-0.50) 14,17; and interstitial edema, moderate 
to almost perfect (κ = 0.56-0.83). 14,17 These results suggest 
that more experienced clinicians are more precise and, presum¬ 
ably, more accurate examiners. 
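The qualitative descriptors used above can be mapped from κ values with a short Python sketch. The cutoffs below are the conventional Landis and Koch bands; the text does not state its cutoffs, so they are an assumption, although every range reported in this section is consistent with them. The function name is hypothetical.

```python
def describe_kappa(kappa: float) -> str:
    """Map a kappa statistic to a qualitative agreement descriptor.

    Assumed bands: <0 poor, 0-0.20 slight, 0.21-0.40 fair,
    0.41-0.60 moderate, 0.61-0.80 substantial, 0.81-1.00 almost perfect.
    """
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Range reported for jugular venous distention (0.31-0.69).
print(describe_kappa(0.31), "to", describe_kappa(0.69))  # fair to substantial
```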

THE ELICITATION OF SELECTED 
SIGNS OF HEART FAILURE 

Vital Signs 

Details on how to measure blood pressure are reviewed in 
another article. 66 A pulse rate faster than 90 29,37 or 100/min 35,38 
may indicate reduced EF. A systolic pressure lower than 90 
mm Hg is associated with a reduced EF, 29 whereas a diastolic 
pressure higher than 105 mm Hg 40 or an overall blood pres¬ 
sure of 160/100 mm Hg 44 or higher may indicate diastolic 
dysfunction. Tachycardia and low systolic pressure are also 
associated with increased filling pressure 12 ; however, no cut¬ 
offs are available. A proportional pulse pressure (the differ¬ 
ence between systolic and diastolic pressures divided by the 
systolic pressure) less than 33% is associated with a decreased 
EF 35 and less than 25% is associated with a reduced cardiac 
index. 15 
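The proportional pulse pressure defined above is simple to compute at the bedside. The Python sketch below is illustrative; the function name and the example blood pressure are assumptions, and the 33% and 25% cutoffs are those cited in the text.

```python
def proportional_pulse_pressure(systolic: float, diastolic: float) -> float:
    """Return (systolic - diastolic) / systolic as a fraction."""
    if systolic <= 0 or diastolic < 0 or diastolic > systolic:
        raise ValueError("expected 0 <= diastolic <= systolic and systolic > 0")
    return (systolic - diastolic) / systolic


# Hypothetical blood pressure of 100/80 mm Hg.
ppp = proportional_pulse_pressure(100, 80)
print(f"PPP = {ppp:.0%}")                        # PPP = 20%
print("below 33% cutoff:", ppp < 0.33)           # True
print("below 25% cutoff:", ppp < 0.25)           # True
```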

Several studies suggest the bedside assessment of the blood 
pressure response to the Valsalva maneuver may be one of the 
best predictors of both systolic dysfunction and increased 
filling pressure. 51,52 However, more research is needed because 
the independent contribution of this maneuver to the cardiac 
examination has not been assessed. 

Jugular Veins 

The jugular venous pulsation best correlates with right-sided 
heart filling pressure. Because the right atrial pressure can be 
elevated from left-sided heart disease, jugular venous disten¬ 
tion also correlates with left-sided heart filling pressure 1215 
and left ventricular EF. 33 The method of assessing the jugular 
venous pressure and the abdominojugular reflux is detailed 
in another article. 67 

Apical Impulse 

Abnormalities of the location, size, or duration of the apical 
impulse best correlate with increased left ventricular mass. 53,68 
Although both the location 20 - 33 and the duration 29 of the api¬ 
cal impulse are significantly associated with a reduced EF, 
only a sustained impulse independently adds to predicting 
the EF. 29 The normal apical impulse is located in the fourth 
or fifth intercostal spaces and is a brief tap. It is palpable in 
less than half of supine patients. 69,70 The 45-degree left lateral 
decubitus position increases the yield, 53 - 69 - 70 as may palpating 
during expiration. 71 Simultaneous auscultation allows quan¬ 
tification of the duration of the impulse, with a sustained 
impulse defined as lasting more than two-thirds of systole. 68 

An abnormal heart size, as measured by precordial percus¬ 
sion, may be more sensitive than an abnormal apical impulse 
in detecting abnormal ventricular size. 53 However, the inde¬ 
pendent contribution of this finding to the overall clinical 
examination, especially compared with radiographic cardiomegaly, 
has not been assessed. 

Third Heart Sound 

The left ventricular third heart sound occurs during ventric¬ 
ular vibration with rapid diastolic filling. This vibration may 
occur when either filling pressure is increased or ventricular 
compliance is reduced. The third heart sound is low pitched 
and may be faint or intermittent. It should be sought with 
the bell of the stethoscope over the apical impulse. Listening 
while the patient is in the 45-degree left lateral decubitus 
position doubles the yield. 72 


The third heart sound may be confused with other dia¬ 
stolic sounds such as an opening snap, an abnormally split 
second heart sound, or even a fourth heart sound if the 
patient is tachycardic. The third heart sound is, with rare 
exception, the only middiastolic sound. It occurs approxi¬ 
mately 150 ms after the second heart sound, or 5 times 
longer than the normal split of the second heart sound. 

Radiographic Cardiomegaly 

We previously reviewed the role of the chest radiograph in 
assessing left ventricular dysfunction. 17 The 2 most helpful 
findings are cardiac size and pulmonary vessels. 

Radiographic cardiomegaly best correlates with total left ven¬ 
tricular size 73,74 and can be caused by an enlarged left ventricular 
cavity or a hypertrophied ventricular wall. As decreased EF cor¬ 
relates with an enlarged left ventricle cavity, cardiomegaly is 
observed in persons with decreased EF. Because an increased 
filling pressure is due to a decreased EF in 52% to 72% 40 - 44 of 
patients with heart failure, cardiomegaly is also associated with 
an increased filling pressure. 12 Finally, because cardiomegaly 
can be caused by a hypertrophied ventricular wall, it may also 
be observed with diastolic dysfunction. 41,46 

Cardiomegaly is most easily defined as an increased cardiothoracic 
ratio, usually more than 50%. The cardiothoracic 
ratio is the cardiac width divided by the largest width of the 
thoracic cavity above the diaphragms. 75 False-positive interpretations 
of cardiomegaly may occur from an apical fat pad, 76 a 
transversely positioned heart, 77 a decrease in thoracic width, 77 
or radiographs taken anteroposteriorly (supine) or during a 
poor inspiration. 
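The cardiothoracic ratio described above can be illustrated with a brief Python sketch. The function names and the example measurements are hypothetical; the threshold of more than 50% is the usual definition given in the text, and the calculation does not correct for the causes of false-positive interpretation just listed.

```python
def cardiothoracic_ratio(cardiac_width: float, thoracic_width: float) -> float:
    """Cardiac width divided by the largest thoracic width (same units)."""
    if cardiac_width <= 0 or thoracic_width <= 0:
        raise ValueError("widths must be positive")
    return cardiac_width / thoracic_width


def has_cardiomegaly(cardiac_width: float, thoracic_width: float,
                     threshold: float = 0.50) -> bool:
    """True when the cardiothoracic ratio exceeds the threshold."""
    return cardiothoracic_ratio(cardiac_width, thoracic_width) > threshold


# Hypothetical measurements in centimeters.
print(f"{cardiothoracic_ratio(16.0, 30.0):.2f}")  # 0.53
print(has_cardiomegaly(16.0, 30.0))               # True
```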

Radiographic Redistribution 

Redistribution, also called cephalization, flow shift, or pul¬ 
monary venous hypertension, best correlates with left ven¬ 
tricular filling pressure. Increased filling pressure is usually 
caused by systolic dysfunction. Thus, redistribution corre¬ 
lates with systolic dysfunction. 20,29,33,35,37,38 

Less precision exists in the assessment of redistribution 
than cardiomegaly 78,79 or signs of pulmonary interstitial 
edema. 14,79,80 The easiest and best-studied definition of redis¬ 
tribution is simply upper lobe vessels larger than lower lobe 
vessels. 81 Comparisons should be made at equal distances 
above and below the hilum. As when assessing cardiomegaly, 
supine or expiratory radiographs can cause false-positive 
interpretations. 82,83 

How to Improve One’s Skills 

Several good audiotapes are available to assist in learning cardiac 
sounds. 84-87 Assessment of the third heart sound and 
duration of the apical impulse can be assisted with visual 
feedback. A tongue blade may be pressed over the apical 
impulse with the examiner’s fingernail or stethoscopic dia¬ 
phragm. Alternatively, a cotton applicator can be wedged in 
the hole of a pediatric precordial suction electrode. 88 Both 
methods can visually demonstrate a sustained apical impulse 
or third heart sound. 


CHAPTER 16 Congestive Heart Failure 


THE BOTTOM LINE 

1. Cardiac findings can be subtle and hard to elicit. Clinician 
experience contributes to the ability to detect findings. 

2. Detecting an increased filling pressure suggests the need 
for diuretics. Detecting increased filling pressure when the 
EF is normal may signal the presence of diastolic dysfunc¬ 
tion. For the detection of increased filling pressure: 

• Very helpful findings are radiographic redistribution 
and jugular venous distention. 

• Somewhat helpful findings are dyspnea, orthopnea, 
tachycardia, decreased systolic or pulse pressure, third 
heart sound, rales, and abdominojugular reflux. 

• Edema is helpful only when present. 

• Few studies address how to rule out or diagnose 
increased filling pressure, but the presence of systolic 
dysfunction affects the assessment. The clinician can 
probably exclude the diagnosis in patients without 
known systolic dysfunction when no more than 1 finding 
of increased filling pressure is present. In such 
patients, at least 3 abnormal findings suggest increased 
filling pressure. In patients with known severe systolic 
dysfunction, the absence of any abnormal findings of 
increased filling pressure probably rules out increased 
filling pressure, and the presence of at least 1 very help¬ 
ful finding (radiographic redistribution or jugular 
venous distention) suggests increased filling pressure. It 
is unclear whether these results apply in the intensive 
care setting. 

3. Detecting a decreased EF indicates the need for specific 
medical therapy and may influence the decision for coro¬ 
nary revascularization. For the detection of a decreased EF: 

• Very helpful findings are chest radiograph (especially 
cardiomegaly, but also redistribution), anterior Q waves 
or left bundle-branch block on ECG, and abnormal api¬ 
cal impulse (especially if sustained). 

• Somewhat helpful findings are tachycardia, decreased 
blood pressure or pulse pressure, third heart sound, 
rales, dyspnea, previous infarction other than anterior, 
and high peak creatine phosphokinase level (in the 
postinfarction patient). 

• Two signs of increased filling pressure, edema and 
increased jugular venous pressure, are helpful only 
when present. This is probably true only when diastolic 
dysfunction is unlikely as a cause of increased filling 
pressure. 

• The clinician can usually rule out a 
decreased EF when no abnormal findings, including no 
sign of increased filling pressure, are present (LR-, 0.1). At 
least 3, and frequently more, abnormal findings are needed 
to confirm the diagnosis (LR+, 14). 

4. Among patients with increased filling pressure, distinguish¬ 
ing diastolic from systolic dysfunction determines further 
evaluation and treatment. In making the distinction: 

• The very helpful finding is elevated blood pressure dur¬ 
ing the episode of increased filling pressure. 


• Somewhat helpful findings are obesity, lack of tachycar¬ 
dia, older age, and absence of smoking or coronary 
artery disease. 

• Normal radiographic heart size is helpful only when 
present. 

• Few studies address the distinction of systolic from dia¬ 
stolic dysfunction. Currently, the EF needs objective 
measurement in patients with increased filling pressure. 
In patients who appear to have increased filling pressure 
with a normal EF, the clinician should also consider cor 
pulmonale, valvular cardiac disease, pulmonary fibrosis, 
intermittent ischemia, and iatrogenic volume overload. 

Since this manuscript was accepted for publication, an 
additional study 89 has been published that found that the 
highest combination of sensitivity and specificity in the 
detection of systolic dysfunction occurred when physical 
examination, ECG, and chest radiograph were combined. 

Author Affiliations at the Time of the Original Publication 

Department of Medicine, University of Texas Health Science 
Center at San Antonio and the Department of Medicine, 
Audie L. Murphy Memorial Veterans Hospital, San Antonio 
(Drs Badgett and Mulrow); and Washington Hospital Center, 
Washington, DC (Dr Lucey). 

Acknowledgments 

The authors acknowledge and appreciate the statistical pool¬ 
ing of data in Table 16-3 with Meta-Test software by Joseph 
Lau, MD, the statistical help by Gil Ramirez, DrPH, and care¬ 
ful reviews of the manuscript by Matthew Gillman, MD, SM, 
and John Williams, MD, MHS. 

This study was supported in part by the San Antonio 
Cochrane Center, Audie L. Murphy Memorial Veterans 
Affairs Hospital. Dr Mulrow is a Veterans Affairs Senior 
Research Associate. 

REFERENCES 

1. The SOLVD Investigators. Effect of enalapril on mortality and the development 
of heart failure in asymptomatic patients with reduced left ventricular 
ejection fractions. N Engl J Med. 1992;327(10):685-691. 

2. The role of revascularization. In: Konstam M, Dracup K, Baker D, et al, 
eds. Heart Failure: Evaluation and Care of Patients With Left-Ventricular 
Systolic Dysfunction. Rockville, MD: Agency for Health Care Policy and 
Research, Public Health Service, US Dept of Health and Human Ser¬ 
vices; 1994:67-77. Clinical Practice Guideline 11. 

3. Bonow RO, Udelson JE. Left ventricular diastolic dysfunction as a cause 
of congestive heart failure. Ann Intern Med. 1992;117(6):502-510. 

4. Gaasch WH. Diagnosis and treatment of heart failure based on left ven¬ 
tricular systolic or diastolic dysfunction. JAMA. 1994;271(16):1276- 
1280. 

5. Holleman DR Jr, Simel DL. Does the clinical examination predict airflow 
limitation? JAMA. 1995;273(4):313-319. 

6. Mulrow CD, Lucey CR, Farnett LE. Discriminating causes of dyspnea 
through the clinical examination. J Gen Intern Med. 1993;8(7):383-392. 

7. Hutcheon D, Nemeth E, Quinlan D. The role of furosemide alone and in 
combination with digoxin in the relief of symptoms of congestive heart 
failure. J Clin Pharmacol. 1980;20(1):59-68. 

8. Guyatt GH. The treatment of heart failure. Drugs. 1986;32(6):538-568. 

9. Braunwald E. Pathophysiology of heart failure. In: Braunwald E, ed. 
Heart Disease. Philadelphia, PA: WB Saunders Co; 1992:393. 

10. Eichna LW. Circulatory congestion and heart failure. Circulation. 1960; 
22:864-886. 





11. Braunwald E, Ross J. The ventricular end-diastolic pressure. Am J Med. 
1963;34:147-150. 

12. Harlan R, Oberman A, Grimm R, Rosati RA. Chronic congestive heart 
failure in coronary artery disease. Ann Intern Med. 1977;86(2):132-138. 

13. Chakko S, Woska D, Martinez H, et al. Clinical, radiographic, and hemo¬ 
dynamic correlations in chronic congestive heart failure. Am J Med. 
1991;90(3):353-359. 

14. Butman SM, Ewy GA, Standen JR, Kern KB, Hahn E. Bedside cardiovascular 
examination in patients with severe chronic heart failure. J Am Coll 
Cardiol. 1993;22(4):968-974. 

15. Stevenson LW, Perloff JK. The limited reliability of physical signs for estimating 
hemodynamics in chronic heart failure. JAMA. 1989;261(6):884-888. 

16. Pharmacologic management. In: Konstam M, Dracup K, Baker D, et al, 
eds. Heart Failure: Evaluation and Care of Patients With Left-Ventricular 
Systolic Dysfunction. Rockville, MD: Agency for Health Care Policy and 
Research, Public Health Service, US Dept of Health and Human Ser¬ 
vices; 1994:49-66. Clinical Practice Guideline 11. 

17. Badgett RG, Mulrow CD, Ramirez G, Otto PM. How well can the chest 
radiograph diagnose left ventricular dysfunction? J Gen Intern Med. 
1996;11(10):625-634. 

18. Carlson KJ, Lee DC-S, Goroll AH, Leahy M, Johnson RA. An analysis of 
physicians’ reasons for prescribing long-term digitalis therapy in outpatients. 
J Chronic Dis. 1985;38(9):733-739. 

19. Forrester JS, Diamond GA, Swan HJC. Correlative classification of clinical 
and hemodynamic function after myocardial infarction. Am J Cardiol. 
1977;39(2):137-145. 

20. Forrester JS, Diamond G, Chatterjee K, Swan HJC. Medical therapy of 
acute myocardial infarction by application of hemodynamic subsets. N 
Engl J Med. 1976;295(24):1356-1362. 

21. Fein AM, Goldberg SK, Walkenstein MD, Dershaw B, Braitman L, Lipp- 
man ML. Is pulmonary artery catheterization necessary for the diagnosis 
of pulmonary edema? Am Rev Respir Dis. 1984;129(6):1006-1009. 

22. Tuchschmidt J, Sharma OP. Impact of hemodynamic monitoring in a 
medical intensive care unit. Crit Care Med. 1987;15(9):840-843. 

23. Eisenberg PR, Jaffe AS, Schuster DP. Clinical evaluation compared to 
pulmonary artery catheterization in the hemodynamic assessment of 
critically ill patients. Crit Care Med. 1984;12(7):549-553. 

24. Connors AF, McCaffree DR, Gray BA. Evaluation of right-heart catheterization 
in the critically ill patient without acute myocardial infarction. 
N Engl J Med. 1983;308(5):263-267. 

25. Connors AF, Dawson NV, Shaw PK, Montenegro HD, Nara AR, Martin 
L. Hemodynamic status in critically ill patients with and without acute 
heart disease. Chest. 1990;98(5):1200-1206. 

26. Steingrub JS, Celoria G, Vickers-Lahti M, Teres D, Bria W. Therapeutic 
impact of pulmonary artery catheterization in a medical/surgical ICU. 
Chest. 1991;99(6):1451-1455. 

27. Celoria G, Steingrub JS, Vickers-Lahti M, et al. Clinical assessment of 
hemodynamic values in two surgical intensive care units. Arch Surg. 
1990;125(8):1036-1039. 

28. Rihal CS, Davis KB, Kennedy JW, Gersh BJ. The utility of clinical, elec¬ 
trocardiographic, and roentgenographic variables in the prediction of 
left ventricular function. Am J Cardiol. 1995;75(4):220-223. 

29. Eagle KA, Quertermous T, Singer DE, et al. Left ventricular ejection frac¬ 
tion: physician estimates compared with gated blood pool scan measure¬ 
ments. Arch Intern Med. 1988;148(4):882-885. 

30. Mattleman SJ, Hakki A-H, Iskandrian AS, Segal BL, Kane SA. Reliability 
of bedside evaluation in determining left ventricular function. J Am Coll 
Cardiol. 1983;1(2 pt l):417-420. 

31. Ostojic MC, Young JB, Hess KR. Prediction of left ventricular ejection 
fraction using a unique method of chest x-ray and ECG analysis. Am 
Heart J. 1989;117(3):590-598. 

32. Cease KB, Nicklas JM. Prediction of left ventricular ejection fraction
using simple quantitative clinical information. Am J Med.
1986;81(3):429-436.

33. Gadsboll N, Hoilund-Carlsen PF, Nielsen GG, Bernig J, Bruun NE, Hein 
E. Interobserver agreement and accuracy of bedside estimation of right 
and left ventricular ejection fraction in acute myocardial infarction. Am J 
Cardiol. 1989;63(18):1301-1307. 

34. Gadsboll N, Hoilund-Carlsen PF, Nielsen GG, et al. Symptoms and signs
of heart failure in patients with myocardial infarction. Eur Heart J.
1989;10(11):1017-1028.


35. Jain AP, Gupta OP, Kumar A, Trivedi SK. Evaluation of clinical variables
as predictors of left ventricular function in acute myocardial infarction.
J Assoc Physicians India. 1993;41(1):17-19.

36. Mangschau A, Jonsbu J, Karlson RL. Congestive heart failure and
ejection fraction in acute myocardial infarction. Acta Med Scand.
1986;220(2):101-107.

37. McNamara RF, Carleen E, Moss AJ, and the Multicenter Postinfarction
Research Group. Estimating left ventricular ejection fraction after
myocardial infarction by various clinical parameters. Am J Cardiol.
1988;62(4):192-196.

38. Sanford CF, Corbett J, Nicod P, et al. Value of radionuclide
ventriculography in the immediate characterization of patients with acute
myocardial infarction. Am J Cardiol. 1982;49(4):637-644.

39. Silver MT, Rose GA, Paul SD, O’Donnell CJ, O’Gara PT, Eagle KA. A
clinical rule to predict preserved left ventricular ejection fraction in
patients after myocardial infarction. Ann Intern Med.
1994;121(10):750-756.

40. Ghali JK, Kadakia S, Cooper RS, Liao Y. Bedside diagnosis of preserved
versus impaired systolic function in heart failure. Am J Cardiol.
1991;67(11):1002-1006.

41. McDermott MM, Feinglass J, Sy J, Gheorghiade M. Hospitalized
congestive heart failure patients with preserved versus abnormal left
ventricular systolic function. Am J Med. 1995;99(6):629-635.

42. Aguirre FV, Pearson AC, Lewen MK, McCluskey M, Labovitz AJ.
Usefulness of Doppler echocardiography in the diagnosis of congestive heart
failure. Am J Cardiol. 1989;63(15):1098-1102.

43. Aronow WS, Ahn C, Kronzon I. Prognosis of congestive heart failure in 
elderly patients with normal versus abnormal left ventricular systolic 
function associated with coronary artery disease. Am J Cardiol. 1990; 
63:1526-1528. 

44. Bier AJ, Eichacker PQ, Sinoway LI, Terribile SM, Strom JA, Keefe DL.
Acute cardiogenic pulmonary edema. Angiology. 1988;39(3 pt 1):211-218.

45. Cocchi A, Zuccala G, Del Sindaco D, et al. Cross-sectional
echocardiography: a window on congestive heart failure in the elderly.
Aging. 1991;3(3):257-262.

46. Cohn JN, Johnson G, and the Veterans Administration Cooperative 
Study Group. Heart failure with normal ejection fraction: the V-HeFT 
Study. Circulation. 1990;81(2 suppl):48-53. 

47. Dougherty AH, Naccarelli GV, Gray EL, Hicks CH, Goldstein RA.
Congestive heart failure with normal systolic function. Am J Cardiol.
1984;54(7):778-782.

48. Echeverria HH, Bilsker MS, Myerburg RJ, Kessler KM. Congestive heart 
failure: echocardiographic insights. Am J Med. 1983;75(5):750-755. 

49. Takarada A, Kurogane H, Minamiji K, et al. Congestive heart failure in 
the elderly: echocardiographic insights. Jpn Circ J. 1992;56(6):527-534. 

50. Wong WF, Gold S, Fukuyama O, Blanchette PL. Diastolic dysfunction in
elderly patients with congestive heart failure. Am J Cardiol.
1989;63(20):1526-1528.

51. Zema MJ, Restivo B, Sos T, Sniderman KW, Kline S. Left ventricular
dysfunction: bedside Valsalva manoeuvre. Br Heart J. 1980;44(5):560-569.

52. Zema MJ, Masters AP, Margouleff D. Dyspnea: the heart of the lungs?
Chest. 1984;85(1):59-64.

53. Heckerling PS, Weiner SL, Wolfkiel CJ, et al. Accuracy and reproducibility
of precordial percussion and palpation for detecting increased left
ventricular end-diastolic volume and mass. JAMA. 1993;270(16):1943-1948.

54. Raudenbush SW. Random effects models. In: Cooper H, Hedges LV, eds. 
The Handbook of Research Data Synthesis. New York, NY: Russell Sage 
Foundation; 1994:301-316. 

55. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical Epidemiology. 
2nd ed. Boston, MA: Little Brown & Co; 1991:120. 

56. Mustchin CP, Tiwari I. Diagnosing the breathless patient. Lancet.
1982;1(8277):907-908.

57. Schmitt BP, Kushner MS, Weiner SL. The diagnostic usefulness of the
history of the patient with dyspnea. J Gen Intern Med. 1986;1(6):386-393.

58. Fedullo AL, Swinborne AJ, McGuire-Dunn C. Complaints of
breathlessness in the emergency department. N Y State J Med. 1986;86(1):4-6.

59. Pauker SG, Kassirer JP. The threshold approach to clinical decision
making. N Engl J Med. 1980;302(20):1109-1117.

60. Simel DL, Feussner JR, Delong ER, Matchar DB. Intermediate,
indeterminate, and uninterpretable diagnostic test results. Med Decis Making.
1987;7(2):107-114.




61. Konstam M, Dracup K, Baker D, et al. Heart Failure: Evaluation and Care 
of Patients With Left-Ventricular Systolic Dysfunction. Rockville, MD: 
Agency for Health Care Policy and Research, Public Health Service, US 
Dept of Health and Human Services; 1994:2. Clinical Practice Guideline 
11. 

62. Westman EC, Matchar DB, Samsa GP, Mulrow CD, Waugh RA, Feussner
JR. Accuracy and reliability of apical S3 gallop detection. J Gen Intern
Med. 1995;10(8):455-457.

63. Mangione S, Nieman LZ, Gracely E, Kaye D. The teaching and practice
of cardiac auscultation during internal medicine and cardiology
training. Ann Intern Med. 1993;119(1):47-54.

64. St Clair EW, Oddone EZ, Waugh RA, Corey R, Feussner JR. Assessing 
housestaff diagnostic skills using a cardiology patient simulator. Ann 
Intern Med. 1992;117(9):751-756. 

65. Landis JR, Koch GG. The measurement of observer agreement for
categorical data. Biometrics. 1977;33(1):159-174.

66. Reeves RA. Does this patient have hypertension? JAMA.
1995;273(15):1211-1218.

67. Cook DJ, Simel DL. Does this patient have abnormal central venous 
pressure? JAMA. 1996;275(8):630-634. 

68. Conn RD, Cole JS. The cardiac apex impulse. Ann Intern Med.
1971;75(2):185-191.

69. Eilen SD, Crawford MH, O’Rourke RA. Accuracy of precordial palpation 
for detecting increased left ventricular volume. Ann Intern Med. 
1983;99(5):628-630. 

70. Badgett RG, Tanaka DJ, Hunt DK, et al. Can moderate chronic
obstructive pulmonary disease be diagnosed by historical and physical findings
alone? Am J Med. 1993;94(2):188-196.

71. Bates B, Bickley LS, Hoekelman RA. A Guide to Physical Examination 
and History Taking. 6th ed. Philadelphia, PA: JB Lippincott; 1995:286. 

72. Bethel HJN, Nixon PGF. Examination of the heart in supine and left
lateral positions. Br Heart J. 1973;35:902-907.

73. Glover L, Baxley WA. A quantitative evaluation of heart size
measurements from chest roentgenograms. Circulation. 1973;47(6):1289-1296.

74. Chikos PM, Figley MM, Fisher L. Correlation between chest films and
angiographic assessment of left ventricular size. AJR Am J Roentgenol.
1977;128(3):367-373.

75. Danzer CS. The cardiothoracic ratio. Am J Med Sci. 1919;157:513-521. 


76. Feild BJ, Russell RO, Moraski RE, et al. Left ventricular size and function 
and heart size in the year following myocardial infarction. Circulation. 
1974;50(2):331-339. 

77. Comeau WJ, White PD. A critical analysis of standard methods of
estimating heart size from roentgen measurements. Am J Roentgenol Rad
Ther. 1942;47:665-677.

78. Milne ENC, Pistolesi M, Miniati M, Giuntini C. The radiologic
distinction of cardiogenic and noncardiogenic edema. AJR Am J Roentgenol.
1985;144(5):879-894.

79. Miniati M, Pistolesi M, Paoletti P, et al. Objective radiographic criteria to 
differentiate cardiac, renal, and injury lung edema. Invest Radiol. 
1988;23(6):433-440. 

80. Norgaard TJ, Gjorup T, Brems-Dalgaard E, Hartelius H, Grun B.
Interobserver variation in the detection of pulmonary venous hypertension
in chest radiographs. Eur J Radiol. 1990;11(3):203-206.

81. Simon M. The pulmonary vessels in incipient left ventricular
decompensation. Circulation. 1961;24:185-190.

82. Burko H, Carwell G, Newman E. Size, location, and gravitational
changes of normal upper lobe pulmonary veins. AJR Am J Roentgenol.
1971;111(4):687-689.

83. Steiner RM, Levin DC. Radiology of the heart. In: Braunwald E, ed. 
Heart Disease. Philadelphia, PA: WB Saunders Co; 1992:213. 

84. Perloff JK, Silverman ME. Physical Examination of the Heart and
Circulation [cassette tape]. Bethesda, MD: American College of Cardiology
Extended Learning; 1984.

85. Tavel ME. Heart Sounds and Murmurs [cassette tape]. Chicago, IL: Year
Book Medical Publishers Inc; 1973.

86. Tilkian AG, Conover MB. Understanding Heart Sounds and Murmurs 
[cassette tape/book]. 2nd ed. Philadelphia, PA: WB Saunders Co; 1984. 

87. Harvey WP, Canfield DC. Clinical Auscultation of the Cardiovascular
System [cassette tape]. Newton, NJ: Laennec Publishing Co; 1989.

88. Willis PW. Inspection and palpation of the precordium. In: Hurst JW,
Schlant RC, Rackley CE, Sonnenblick EH, Wenger NK, eds. The Heart,
Arteries, and Veins. New York, NY: McGraw-Hill Book Co; 1990:165.

89. Gillespie ND, McNeil G, Pringle T, Ogston S, Struthers AD, Gillespie SD. 
Cross sectional study of contribution of clinical assessment and simple 
cardiac investigations in patients admitted with acute dyspnea. BMJ. 
1997;314(7085):936-940. 





CHAPTER 


CLINICAL SCENARIOS 


Does This Dyspneic Patient in the Emergency Department Have
Congestive Heart Failure?

Charlie S. Wang, MD 
J. Mark FitzGerald, MB, DM 
Michael Schulzer, MD, PhD 
Edwin Mak, BASc 
Najib T. Ayas, MD, MPH


CASE 1 A 70-year-old woman with a history of a myocardial
infarction and heart failure presents to the emergency
department (ED) with a 2-day history of dyspnea at
rest, orthopnea, and paroxysmal nocturnal dyspnea. Physical
examination reveals an elevated jugular venous pressure,
a third heart sound (ventricular filling gallop),
bibasilar rales and wheezing, and bilateral lower extremity
edema. The chest radiograph reveals cardiomegaly. An
electrocardiogram (ECG) shows atrial fibrillation.

CASE 2 A 65-year-old previously healthy man with a 30
pack-year smoking history presents to the ED with a 3-week
history of dyspnea on exertion and at rest, associated
with productive cough and sputum. Physical examination 
reveals bilateral rales and wheezing. The chest radiograph 
reveals pulmonary venous congestion and a pattern of 
interstitial edema. An ECG shows lateral ST-segment 
depression. 

CASE 3 A 60-year-old man with a history of chronic 
obstructive pulmonary disease (COPD) and myocardial 
infarction presents to the ED with a 2-week history of 
worsening dyspnea on exertion and cough. Physical 
examination reveals an elevated jugular venous pressure, 
bilateral wheezing, and bilateral lower extremity edema. 
The chest radiograph shows normal results. An ECG 
shows Q waves inferiorly. 


WHY IS THIS QUESTION IMPORTANT? 


Heart failure is a major public health concern. A heart failure
epidemic affects more than 15 million people in North
America and Europe, and an additional 1.5 million new cases
are diagnosed every year. 1-5 It is the most costly cardiovascular
disorder in western countries, accounting for an estimated
total direct annual expenditure of more than $24
billion in the United States in 2001. 6,7 Failure to diagnose
heart failure increases mortality, delays hospital discharge,
and increases treatment costs. 8,9

Dyspnea, an uncomfortable sensation of breathing 10 or an
awareness of respiratory distress, 11 is the cause for more than
2.5 million clinician visits per year in the United States. 12 A
number of disorders cause dyspnea, including congestive
heart failure, COPD, asthma, deconditioning, metabolic
acidosis, anxiety, upper airway obstruction, and neuromuscular
weakness. Identifying patients with heart failure among the
other causes allows early institution of appropriate
symptomatic and evidence-based therapies.

It is not always possible (or feasible) to promptly evaluate 
every patient with dyspnea with tests of cardiac function 
(echocardiography, nuclear scans, or cardiac catheterization). 
This challenges physicians who must identify heart failure
according to medical history, physical examination, and
rapidly available investigations (eg, chest radiograph, ECG,
serum brain natriuretic peptide [BNP]). Therefore, the
purpose of this review was to identify the most useful symptoms,
signs, and tests in diagnosing the clinical syndrome of heart
failure in dyspneic patients presenting to the ED. By the
syndrome of heart failure, we mean an overall clinical diagnosis
of heart failure as the cause of dyspnea (irrespective of
etiology or systolic or diastolic dysfunction), using information
from many sources, including medical history, physical
examination, chest radiograph, ECG, serum chemistries, and
1 or more confirmatory tests of cardiac function.

Pathophysiology of Dyspnea in Heart Failure 

Multiple pathophysiologic mechanisms have been hypothesized
to modulate the sensation of dyspnea in patients with
symptomatic heart failure (Table 16-4).

Table 16-4 Physiological Categories and Mechanisms Causing
Dyspnea in Heart Failure(a)

| Category | Mechanisms(b) 10,13-16 |
|---|---|
| Increased respiratory drive | Increased LVEDP → pulmonary venous congestion → stimulation of pulmonary J receptors (transmitted by vagal afferents to brain) |
| Increased respiratory drive | Pulmonary venous congestion → ventilation/perfusion mismatch, shunt(c) → hypoxemia → stimulation of central and peripheral chemoreceptors |
| Increased work of breathing | Pulmonary venous congestion → reduced lung compliance → increased airways resistance → increased elastic and resistive work of breathing → mismatch between afferent information from upper airway, lower airway, chest wall mechanoreceptors, and efferent signals to respiratory muscles |
| Weakness of respiratory pump muscles | Activation of catabolic factors → myopathy (structural, biochemical, functional abnormalities of skeletal respiratory muscles) → reduced respiratory muscle efficiency and endurance → mismatch between afferent mechanoreceptors and efferent signals to respiratory muscles |
| Psychological | Anxiety, depression → altered central perception |

Abbreviation: LVEDP, left ventricular end diastolic pressure.

(a) Adapted from Murray, 13 Braunwald, 14,16 American Thoracic Society Dyspnea
consensus statement, 15 and Manning and Schwartzstein. 10
(b) Arrows denote one mechanism leading to another.
(c) Occurs when blood moves through the lung without coming into contact with
oxygenated air.

A previous Rational Clinical Examination article assessed
the usefulness of the clinical examination in predicting
decreased left ventricular ejection fraction (EF) or increased
filling pressure. 17 Our current review extends the previous
report by focusing on the prediction of the clinical syndrome
of heart failure in dyspneic patients. This clinical focus is
useful because not every patient with left ventricular
dysfunction or high filling pressures on objective cardiac testing will
be subjectively dyspneic; furthermore, patients with a
reduced EF may be dyspneic from causes other than heart
failure. 18-20 Therefore, the use of the syndrome of heart failure
takes into account a patient’s subjective sensation and
findings on routine investigations, in addition to objective
cardiac testing. One previous literature review has reported on
the use of the clinical examination for discriminating causes
of dyspnea; however, it was not restricted specifically to the
syndrome of heart failure, and summary measures of
sensitivity, specificity, and likelihood ratios (LRs) were not
reported. 21

We included serum BNP testing in this review because
recent evidence suggests that it is useful in diagnosing heart
failure. 22 BNP is a neurohormone that is secreted almost
exclusively from the ventricles in response to pressure and
volume overload and that produces natriuresis, diuresis, and
smooth muscle relaxation. 23 There is also emerging evidence
that BNP is useful in prognosticating cardiovascular
mortality in both acute and chronic heart failure. 22 Studies are
currently ongoing regarding the use of serial BNP levels as an
indicator of treatment response and for titrating therapy. 22

How to Elicit Symptoms and Signs 

Appropriate history taking and physical examination of the
cardiopulmonary system have been described in detail in
previous Rational Clinical Examination articles, 17,24-30 with
the exception of the Valsalva maneuver. The Valsalva
maneuver is performed by inflating and locking a blood pressure
cuff to 15 mm Hg above the resting supine systolic pressure
(Korotkoff sounds should not be audible), at which point the
patient performs a sustained Valsalva (exhalation against a
closed glottis) for at least 10 seconds. In a normal response,
systolic blood pressure immediately increases 30 to 40 mm
Hg above baseline for 1 to 3 seconds (phase 1, appearance of
Korotkoff sounds). As venous return decreases, systolic blood
pressure decreases sharply below baseline (phase 2,
disappearance of Korotkoff sounds). When the Valsalva is released,
there is a further decrease of systolic blood pressure below
baseline (phase 3, continued absence of Korotkoff sounds).
Between 3 and 15 seconds after release, systolic blood
pressure increases 15 mm Hg or more above the baseline level
(phase 4, reappearance of Korotkoff sounds). 21,31-34 Two
abnormal responses have been described in heart failure. In
the absent overshoot response, phases 1 to 3 are normal, but
Korotkoff sounds do not reappear in phase 4. In the square
wave response, phase 1 is normal, but Korotkoff sounds are
present in phases 2 and 3, followed by disappearance in phase
4. 21,31,32,34
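The three Valsalva response patterns described above can be summarized as a small lookup on whether Korotkoff sounds are audible in each phase. The sketch below is illustrative only: the names `KorotkoffPattern` and `classify_valsalva` and the "indeterminate" category are assumptions introduced here, not part of the source studies, and the code is not a validated clinical tool.

```python
from typing import NamedTuple


class KorotkoffPattern(NamedTuple):
    """Whether Korotkoff sounds are audible in each phase, with the cuff
    locked 15 mm Hg above the resting supine systolic pressure."""
    phase1: bool
    phase2: bool
    phase3: bool
    phase4: bool


def classify_valsalva(p: KorotkoffPattern) -> str:
    """Map a sound pattern to the responses described in the text.

    "indeterminate" is a hypothetical catch-all for patterns the text
    does not describe.
    """
    if not p.phase1:
        # All three described responses have a normal phase 1 (sounds appear).
        return "indeterminate"
    if not p.phase2 and not p.phase3:
        # Phases 1 to 3 normal: the response depends on the phase 4 overshoot.
        return "normal" if p.phase4 else "absent overshoot"
    if p.phase2 and p.phase3 and not p.phase4:
        # Sounds persist through the strain and disappear after release.
        return "square wave"
    return "indeterminate"
```

For example, `classify_valsalva(KorotkoffPattern(True, True, True, False))` corresponds to the square wave response.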

METHODS 

Search Strategy 

We conducted a computerized search of MEDLINE from January
1966 to July 2005 concerning the precision and diagnostic
accuracy of components of the clinical examination and simple
investigations in diagnosing patients with dyspnea. Our strategy
was deliberately broad to minimize the possibility of
overlooking relevant articles. Multiple searches were performed, the
first using a strategy similar to that developed for The Rational
Clinical Examination series. 35 This strategy combined 4
exploded Medical Subject Headings (physical examination,
medical history taking, professional competence, routine diagnostic
tests) with 8 keyword categories (“physical exam,” “medical
history taking,” “professional competence,” “sensitivity and
specificity,” “reproducibility of results,” “observer variation,” “decision
support techniques,” “Bayes theorem”) and 1 textword category
(“sensitivity” and “specificity”) and intersected the result with 1
exploded Medical Subject Heading (“dyspnea”). The search was limited to
studies published in English about humans. Further MEDLINE
searches were conducted combining the following Medical
Subject Headings, textword, and keyword searches: “brain natriuretic
peptide,” “natriuretic peptide,” “BNP,” “Valsalva,”
“hepatojugular,” “abdominojugular,” and “breathlessness.” These were
intersected with the exploded Medical Subject Heading “dyspnea” and
the textword “dyspnoea.”

The computerized search was supplemented with a manual
search of reference lists of retrieved studies, review
articles, and standard physical examination textbooks to identify
additional articles not captured through the computerized
search strategy.

Study Selection 

One author (C.S.W.) screened the titles and abstracts of the
computerized search to identify all potentially relevant
articles. All retrieved articles were independently reviewed by 2
authors (C.S.W. and N.T.A.) for eligibility, assessment of
methodologic quality, and data abstraction. Only studies that
evaluated the diagnostic accuracy of some element of the
medical history, physical examination, or readily available
diagnostic tests in adult patients with undifferentiated
dyspnea presenting to the ED, regardless of whether the patients
had known cardiac or pulmonary diseases, were included.
Data had to be presented so that 2 × 2 contingency tables
could be extracted. Because there currently is no widely
accepted criterion standard for diagnosing heart failure, and
because the focus of this review was a syndrome of heart
failure, we accepted as a reasonable reference standard a
diagnosis agreed on by a panel of physicians after evaluating for
appropriate symptoms and signs of heart failure and an
appropriate measure of cardiac dysfunction. 5

We included studies that evaluated common and rapidly
available tests (chest radiograph, ECG, and serum BNP)
because clinicians rely on these basic investigations in
conjunction with their medical history and physical examination
in bedside decision making. 22,36 There are currently multiple
BNP assays approved by the Food and Drug Administration
for clinical use. To date, the largest published randomized
clinical trials have been funded by industry and have
reported using the BNP assay of a single manufacturer.

An a priori decision was made to exclude studies that
investigated other cardiac neurohormones such as A-type
natriuretic peptide or other forms of BNP (eg, N-terminal
prohormone BNP). It was thought at the time of this review
that there would be insufficient published data on these
other neurohormones to draw significant conclusions. We
also excluded studies that (1) were review articles with no
original data; (2) had no clinical examination performed or
reported; (3) used only echocardiography, computed
tomography scans, or invasive hemodynamic monitoring as the
reference standard for heart failure without clinical correlation
because the results from these tests serve as part of the
reference standard for a clinical diagnosis; (4) were population
based; (5) enrolled patients younger than 18 years; and (6) did
not specifically include patients reporting dyspnea. We
resolved disagreements between reviewers on study selection,
assessment of quality, and abstraction of data by consensus.

Assessment of Study Quality 

Study quality was assigned according to the grading scheme
developed by Sackett et al 37 and previously used for this
series. 24 Level 1 studies were primary prospective studies of
the accuracy or precision of the clinical examination that
involved comparisons of clinical findings (symptom or sign)
with a reference standard of diagnosis among a large number
(sufficient to have narrow confidence limits on the resulting
sensitivity, specificity, or LRs) of consecutive or random
patients with dyspnea. For precision studies, this required 2
or more independent blinded raters of symptoms or signs in
a large number of patients. Level 2 studies were similar to
level 1 but with smaller numbers of patients. Level 3 studies
were comparisons of clinical findings with a reference
standard of diagnosis among nonconsecutive or nonrandom
patients with dyspnea. Studies of a retrospective nature were
included as level 3. Level 4 studies were comparisons of
clinical findings with a reference standard of diagnosis among
convenience samples of patients who obviously have the
target condition. Finally, level 5 studies were comparisons of
clinical findings with a reference standard of unknown or
uncertain validity among convenience samples of patients
and, perhaps, healthy patients.

Statistical Methods 

Two authors (C.S.W. and N.T.A.) independently extracted data
for analysis. Published raw data were used to construct 2 × 2
contingency tables for each clinical variable. Where data for the
same variable were available from 2 or more sources,
meta-analytic techniques were applied to combine results across
studies. When multiple articles from the same group were found,
the studies were carefully reviewed to ensure no data were
analyzed in duplicate. Summary positive and negative LRs and
95% confidence intervals (CIs) were calculated using
random-effects models based on the delta method. 38 We display only the
CIs of the LRs in the data tables because these values are most
useful to clinicians and include the sensitivity and specificity in
the calculation. The choice of random-effects measures
decreases the risk of CIs that are too optimistically narrow.
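The text does not give the pooling formulas, so the following is only a minimal sketch of one common way to obtain a random-effects summary LR: each study's log positive LR and its delta-method variance are computed from the 2 × 2 counts, and the studies are combined with a DerSimonian-Laird between-study variance. The estimator choice, the function names, and the assumption that every cell is nonzero are introduced here for illustration; they are not confirmed as the authors' exact method.

```python
import math


def log_lr_positive(tp, fn, fp, tn):
    """Return (log LR+, delta-method variance) for one 2 x 2 table.

    Assumes all four cells are nonzero (no continuity correction is applied).
    """
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    log_lr = math.log(sensitivity / false_positive_rate)
    variance = 1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn)
    return log_lr, variance


def pool_random_effects(estimates):
    """Pool (log estimate, variance) pairs with a DerSimonian-Laird model.

    Returns (summary, lower 95% limit, upper 95% limit) on the ratio scale.
    """
    y = [est for est, _ in estimates]
    v = [var for _, var in estimates]
    w = [1 / vi for vi in v]                      # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (vi + tau2) for vi in v]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se))


# Hypothetical counts (tp, fn, fp, tn) for three studies; not data from this review.
studies = [(80, 20, 10, 90), (45, 15, 12, 60), (30, 20, 5, 70)]
summary_lr, lower, upper = pool_random_effects(
    [log_lr_positive(*counts) for counts in studies]
)
```

Because the between-study variance is added to every study's variance, the resulting interval is at least as wide as the fixed-effect interval, which is the property the text relies on.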

Sensitivity is defined as the proportion of patients with 
heart failure who have a particular finding; specificity is the 
proportion of patients without heart failure who do not have 
the particular finding. The positive LR is the change in the 
odds of having heart failure when a particular finding is 
present, whereas the negative LR is the change in the odds of 
having heart failure when the particular finding is absent. 
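As a worked illustration of these definitions, the sketch below computes sensitivity, specificity, and both LRs from a single 2 × 2 table and then applies an LR to a pretest probability through the odds form of Bayes theorem. The function names and the example counts are hypothetical, and the code assumes no cell combination produces a zero denominator.

```python
def diagnostic_accuracy(tp, fn, fp, tn):
    """Sensitivity, specificity, and LRs from one 2 x 2 table.

    tp/fn: patients with heart failure with/without the finding.
    fp/tn: patients without heart failure with/without the finding.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical table: 80 of 100 patients with heart failure have the finding,
# and 10 of 100 patients without heart failure have it.
accuracy = diagnostic_accuracy(tp=80, fn=20, fp=10, tn=90)
probability_if_present = post_test_probability(0.5, accuracy["lr_positive"])
probability_if_absent = post_test_probability(0.5, accuracy["lr_negative"])
```

An LR above 1 raises the probability of heart failure and an LR below 1 lowers it; an LR of 1 leaves the pretest probability unchanged.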
RESULTS

Search Results 

A total of 815 citations were identified in our literature search.
Of these, 682 were excluded after review of their titles and
abstracts, with 133 studies remaining. These studies were
reviewed in detail and we identified a total of 22 studies that
evaluated the role of the clinical examination or basic routine
investigation (chest radiograph, ECG, serum BNP) in patients
with undifferentiated dyspnea and that also met our inclusion
criteria. 12,31,32,36,39-56

Study Characteristics 

Only studies of sufficient quality (levels 1-3) were considered
for the quantitative analysis. Of the 22 studies meeting
inclusion criteria, 18 were included in the meta-analysis (Table
16-5), 12,31,36,39-48,52-56 whereas the remaining 4 studies 32,49-51 were
level 4 or 5 and were not included in the evidence tables.


Table 16-5 Summary of Studies in Emergency Department Patients 





Source, y 

Study 

Quality 3 

Study Design 

Study Criteria 

Total 

Men, 

No. (%) 

Mean 
Age, y 

Incidence of 
Heart Failure, % 

Criterion Standard; 
Objective Measure 

Mueller et al, 56 

2005 

1 

Prospective 

Inclusion: ED with dyspnea. Exclu¬ 
sion: acute myocardial infarction, 
trauma 

251 (93) 

73 

55 

Retrospective review by 1 physi¬ 
cian; echocardiography 

Lainchbury et al, 42 
2003 

1 

Prospective 

Inclusion: ED with dyspnea, able to 
give blood within 8 h of arrival. 
Exclusion: n/a 

205 (49) 

70 

34 

Retrospective review by 2 indepen¬ 
dent cardiologists; echocardiog¬ 
raphy, RVG 

Logeart et al, 43 

2002 

1 

Prospective 

Inclusion: ED with acute severe 
dyspnea. Exclusion: acute myocar¬ 
dial infarction, chest injury, recent 
surgery, therapy instituted >2 h 
before arrival in ED, emergency 
echocardiography not feasible 

163(67) 

67 

71 

Retrospective review by 2 indepen¬ 
dent cardiologists and 1 pulmo¬ 
nologist; echocardiography, CC, 
RVG, PFT 

Knudsen et al, 44 
2004 

2 

Prospective 

Inclusion: ED with dyspnea. Exclu¬ 
sion: chest pain, dyspnea clearly 
not secondary to heart failure 

155(45) 

NA» 

48 

Retrospective review by 2 indepen¬ 
dent cardiologists; echocardiog¬ 
raphy CC, RVG, PF 

Bayes-Genis et 
al, 45 2004 

2 

Prospective 

Inclusion: ED with dyspnea, aged 40- 
88 y. Exclusion: NYHA classes 1 and II, 
dyspnea secondary to chest trauma 
or cardiac tamponade, acute coronary 
syndromes without dyspnea, severe 
renal insufficiency, liver cirrhosis 

89 (60) 

71 

83 

Retrospective review by 2 indepen¬ 
dent cardiologists; echocardiog¬ 
raphy, PF 

Villacorta et al, 46 
2002 

2 

Prospective 

Inclusion: ED with dyspnea. Exclusion: 
obvious diagnosis of dyspnea, acute 
coronary syndromes without dyspnea 

70 (47) 

72 

51 

Retrospective review by 1 cardiolo¬ 
gist; echocardiography 

Davis et al, 47 1994 

2 

Prospective 

Inclusion: ED with dyspnea requir¬ 
ing admission. Exclusion: obvious 
cause of dyspnea, severe renal fail¬ 
ure, acute chest pain 

52 (40) 

74 

61 

Retrospective review by committee 
of physicians and a radiologist; 
echocardiography, PF 

Marantz et al, 31 

1990 

2 

Prospective 

Inclusion: ED with dyspnea, aged 
> 40 y, English speaking, able to 
consent, presented during study 
hours. Exclusion: clinically unstable, 
non-English speaking, disoriented 
or unable to cooperate, refusal to 
consent, left against medical advice 

51 (39) 

64 

45 

Retrospective review by 1 physi¬ 
cian; echocardiography 

Alibay et al, 54 2005 

3 

Convenience 

sample 

Inclusion: ED with dyspnea. Exclu¬ 
sion: n/a 

160(48) 

80 

38 

Retrospective review by 2 indepen¬ 
dent cardiologists; echocardiog¬ 
raphy 

Ray et al, 55 2004 

3 

Convenience 

sample 

Inclusion: ED with dyspnea < 2 wk, 
aged > 65 y, respiratory rate 
> 25/min or Pao 2 < 70 mm Hg or 
Paco 2 >45 mm Hg or Spo 2 < 92%. 
Exclusion: none 

308 (49) 

80 

54 

Retrospective review by 2 indepen¬ 
dent experts; echocardiography, 
high-resolution computed tomo¬ 
graphic scan, PF 


Table 16-5 Summary of Studies in Emergency Department Patients (Continued)

Fields listed for each study: Source, y | Study Quality a | Study Design | Study Criteria | Total No. (Men, %) | Mean Age, y | Incidence of Heart Failure, % | Criterion Standard; Objective Measure

Springfield et al, 12 2004
Study quality: 3
Study design: Convenience sample
Study criteria: Inclusion: ED with dyspnea or respiratory rate > 20/min or PaO2 < 90 mm Hg on room air. Exclusion: pregnancy, aged < 18 y, trauma patients, unconscious or unable to speak, < 3 ft 11 in or > 7 ft 8 in tall, < 66 or > 341 lb
Total No. (men, %): 38 (42)
Mean age, y: 67
Incidence of heart failure, %: 32
Criterion standard; objective measure: Retrospective review by 1 physician; echocardiography

Morrison et al, 36 2002
Study quality: 3
Study design: Convenience sample
Study criteria: Inclusion: ED with dyspnea. Exclusion: dyspnea clearly not secondary to heart failure, unstable angina/myocardial infarction without dyspnea
Total No. (men, %): 321 (95)
Mean age, y: NA b
Incidence of heart failure, %: 42
Criterion standard; objective measure: Retrospective review by 2 independent cardiologists; echocardiography, CC, RVG, PFT

Maisel et al, 39 2002
Study quality: 3
Study design: Prospective
Study criteria: Inclusion: ED with dyspnea as prominent symptom. Exclusion: aged < 18 y, dyspnea clearly not secondary to heart failure, acute myocardial infarction, unstable angina without dyspnea, renal failure on dialysis or creatinine clearance < 0.25 mL/s
Total No. (men, %): 1586 (56)
Mean age, y: 64
Incidence of heart failure, %: 47
Criterion standard; objective measure: Retrospective review by 2 independent cardiologists; echocardiography, CC, RVG, PFT

McCullough et al, 40 2002
Study quality: 3
Study design: See Maisel et al 39
Study criteria: Subgroup of Maisel et al 39 with information recorded for ED physician assessment of probability of heart failure
Total No. (men, %): 1538 (56)
Mean age, y: 64
Incidence of heart failure, %: 47
Criterion standard; objective measure: See Maisel et al 39

Dao et al, 48 2001
Study quality: 3
Study design: Convenience sample
Study criteria: Inclusion: ED with dyspnea. Exclusion: dyspnea clearly not secondary to heart failure, acute coronary syndromes without dyspnea
Total No. (men, %): 250 (94)
Mean age, y: 63
Incidence of heart failure, %: 39
Criterion standard; objective measure: Retrospective review by 2 independent cardiologists; echocardiography, CC, RVG, PFT


Abbreviations: CC, cardiac catheterization; ED, emergency department; n/a, not applicable; NYHA, New York Heart Association (classification of heart disease); PFT, pulmonary
function test; RVG, radionuclide ventriculography; SpO2, peripheral oxygen saturation.

a Study quality was assigned according to the grading scheme developed by Sackett et al 37 and previously used for this series. 24 See also “Assessment of Study Quality” in the
“Methods” section for more details.

b NA denotes that the mean age was not published in the source article.


Precision of Clinical Examination and Investigations 

Precision refers to the degree of variation between observers
(interobserver variation) or within observers (intraobserver
variation) for a particular finding. No study has specifically
addressed the interobserver or intraobserver variability in the
recording of findings in dyspneic patients ultimately diagnosed
with the clinical syndrome of heart failure. However, analogous
work has been done in other diagnoses, including pulmonary
diseases and acute coronary syndromes, and in comparison with
echocardiography, nuclear imaging, and cardiac
catheterization. 24,25,29,30,57-63 In general, there is much
variability in the precision of clinical findings associated with
heart failure, reflecting the potentially subtle nature of findings
and variable examination skills of the clinician.

Accuracy of the Clinical Examination 

Thirteen studies examined the accuracy of the clinical examination
for predicting the presence of heart failure in dyspneic patients
assessed in the ED. The sensitivities, specificities, and
corresponding positive and negative LRs for the findings are shown
in Table 16-6.
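
The relationship between these quantities can be sketched in a few lines of code. The example below is an illustration only, not the authors' meta-analytic method: it applies the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity to the pooled values reported for initial clinical judgment in Table 16-6. The function name is hypothetical. Because the summary LRs in the tables were pooled across studies, they need not equal the ratio computed from pooled sensitivity and specificity (for the third heart sound, 0.13 / 0.01 gives 13, whereas the summary LR is 11).

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (positive LR, negative LR) for a dichotomous finding.

    Illustrative helper (hypothetical name); uses the standard definitions only.
    """
    if not 0 <= sensitivity <= 1:
        raise ValueError("sensitivity must be between 0 and 1")
    if not 0 < specificity < 1:
        # specificity of exactly 0 or 1 would require dividing by zero
        raise ValueError("specificity must be strictly between 0 and 1")
    positive_lr = sensitivity / (1 - specificity)
    negative_lr = (1 - sensitivity) / specificity
    return positive_lr, negative_lr


# Pooled values for initial clinical judgment, Table 16-6
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.61, specificity=0.86)
print(round(lr_pos, 1), round(lr_neg, 2))  # prints: 4.4 0.45
```

For this finding the computed values agree with the summary LRs in the table after rounding.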


Overall Clinical Gestalt 

The overall clinical gestalt of the initial treating ED physician
was associated with a high LR+ (4.4; 95% CI, 1.8-10) for a final
diagnosis of heart failure. When the emergency physician
assessed the dyspneic patient as unlikely to have heart failure, the
odds decreased by about half (LR, 0.45; 95% CI, 0.28-0.73).
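
How an LR changes the odds can be made concrete with a short sketch. It converts a pretest probability to odds, multiplies by the LR, and converts back to a probability. The pretest probability of 0.47 is an assumption chosen for illustration (it matches the incidence of heart failure reported for Maisel et al in Table 16-5); the function name is hypothetical.

```python
def post_test_probability(pretest_probability: float, lr: float) -> float:
    """Apply a likelihood ratio to a pretest probability via the odds form of Bayes' rule."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    if lr < 0:
        raise ValueError("likelihood ratio cannot be negative")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * lr
    return post_test_odds / (1 + post_test_odds)


# Assumed pretest probability (illustrative): 0.47
pretest = 0.47
print(round(post_test_probability(pretest, 4.4), 2))   # physician judges heart failure likely
print(round(post_test_probability(pretest, 0.45), 2))  # physician judges heart failure unlikely
```

Under this assumed pretest probability, an LR of 0.45 roughly halves the odds (from about 0.89 to about 0.40), which is what "decreased by about half" refers to; the probability itself falls by less than half.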

Historical Items 

The most useful historical features for confirming the presence
of heart failure were a history of congestive heart failure (LR,
5.8; 95% CI, 4.1-8.0), myocardial infarction (LR, 3.1; 95% CI,
2.0-4.9), or coronary artery disease (LR, 1.8; 95% CI, 1.1-2.8).
Likewise, patients without a history of heart failure (LR, 0.45;
95% CI, 0.38-0.53), coronary artery disease (LR, 0.68; 95% CI,
0.48-0.96), or myocardial infarction (LR, 0.69; 95% CI, 0.58-0.82)
were less likely to have their dyspnea explained by current heart
failure. The results of other historical findings in Table 16-6
had LR CIs that included 1.

Symptoms 

The presence of paroxysmal nocturnal dyspnea (LR, 2.6; 95% CI,
1.5-4.5), orthopnea (LR, 2.2; 95% CI, 1.2-3.9), or dyspnea on
exertion (LR, 1.3; 95% CI, 1.2-1.4) increased the likelihood of
heart failure. Likewise, the absence of dyspnea on exertion (LR,
0.48; 95% CI, 0.35-0.67), orthopnea (LR, 0.65; 95% CI, 0.45-0.92),
or paroxysmal nocturnal dyspnea (LR, 0.70; 95% CI, 0.54-0.91)
decreased the likelihood of heart failure. The results of other
findings in Table 16-6 had CIs that included 1.

Table 16-6 Summary of Diagnostic Accuracy of Findings on History and Physical Examination in Emergency Department Patients

Finding | Pooled Sensitivity | Pooled Specificity | Summary Positive LR (95% CI) a | Summary Negative LR (95% CI) a | References

Initial clinical judgment | 0.61 | 0.86 | 4.4 (1.8-10.0) | 0.45 (0.28-0.73) | 12,31,40,55

History
Heart failure | 0.60 | 0.90 | 5.8 (4.1-8.0) | 0.45 (0.38-0.53) | 36,41,43,45,48,53,56
Myocardial infarction | 0.40 | 0.87 | 3.1 (2.0-4.9) | 0.69 (0.58-0.82) | 41,43,45,48,53
Coronary artery disease | 0.52 | 0.70 | 1.8 (1.1-2.8) | 0.68 (0.48-0.96) | 36,44,53,56
Dyslipidemia | 0.23 | 0.87 | 1.7 (0.43-6.9) | 0.89 (0.69-1.1) | 45
Diabetes mellitus | 0.28 | 0.83 | 1.7 (1.0-2.7) | 0.86 (0.73-1.0) | 43,45,48,56
Hypertension | 0.60 | 0.56 | 1.4 (1.1-1.7) | 0.71 (0.55-0.93) | 36,41,43,45,48,53,56
Smoker | 0.62 | 0.27 | 0.84 (0.58-1.2) | 1.4 (0.58-3.6) | 45
Chronic obstructive pulmonary disease | 0.34 | 0.57 | 0.81 (0.60-1.1) | 1.1 (0.95-1.4) | 36,45,48,53

Symptoms
Paroxysmal nocturnal dyspnea | 0.41 | 0.84 | 2.6 (1.5-4.5) | 0.70 (0.54-0.91) | 36,45,48,53,56
Orthopnea | 0.50 | 0.77 | 2.2 (1.2-3.9) | 0.65 (0.45-0.92) | 36,41,43,45,48,53,56
Edema | 0.51 | 0.76 | 2.1 (0.92-5.0) | 0.64 (0.39-1.1) | 36,48,53
Dyspnea on exertion | 0.84 | 0.34 | 1.3 (1.2-1.4) | 0.48 (0.35-0.67) | 36,48
Fatigue and weight gain | 0.31 | 0.70 | 1.0 (0.74-1.4) | 0.99 (0.85-1.1) | 36
Cough | 0.36 | 0.61 | 0.93 (0.70-1.2) | 1.0 (0.87-1.3) | 36,45,48,53,56

Physical Examination
Third heart sound (ventricular filling gallop) | 0.13 | 0.99 | 11 (4.9-25) | 0.88 (0.83-0.94) | 36,41,43,45,48,53,56
Abdominojugular reflux | 0.24 | 0.96 | 6.4 (0.81-51) | 0.79 (0.62-1.0) | 31
Jugular venous distention | 0.39 | 0.92 | 5.1 (3.2-7.9) | 0.66 (0.57-0.77) | 36,41,43,45,48,53,56
Rales | 0.60 | 0.78 | 2.8 (1.9-4.1) | 0.51 (0.37-0.70) | 36,41,43,45,48,53,56
Any murmur | 0.27 | 0.90 | 2.6 (1.7-4.1) | 0.81 (0.73-0.90) | 36,44,48,53
Lower extremity edema | 0.50 | 0.78 | 2.3 (1.5-3.7) | 0.64 (0.47-0.87) | 41,43,45,53,56
Valsalva maneuver | 0.73 | 0.65 | 2.1 (1.0-4.2) | 0.41 (0.17-1.0) | 31
Systolic blood pressure < 100 mm Hg | 0.06 | 0.97 | 2.0 (0.60-6.6) | 0.97 (0.91-1.0) | 48
Fourth heart sound (atrial gallop) | 0.05 | 0.97 | 1.6 (0.47-5.5) | 0.98 (0.93-1.0) | 36,48,53
Systolic blood pressure > 50 mm Hg | 0.28 | 0.73 | 1.0 (0.69-1.6) | 0.99 (0.84-1.2) | 48
Wheezing | 0.22 | 0.58 | 0.52 (0.38-0.71) | 1.3 (1.1-1.7) | 36,44,45,48,53
Ascites | 0.01 | 0.97 | 0.33 (0.04-2.9) | 1.0 (0.99-1.1) | 48

Abbreviations: CI, confidence interval; LR, likelihood ratio.

a LRs are not independent of each other and should not be multiplied in series when multiple findings are considered.

Physical Examination 

The presence of a third heart sound (ventricular filling gallop)
increased the likelihood of heart failure the most (LR, 11; 95% CI,
4.9-25). The presence of several other findings had CIs that
excluded 1: jugular venous distention (LR, 5.1; 95% CI, 3.2-7.9),
pulmonary rales (LR, 2.8; 95% CI, 1.9-4.1), any cardiac murmur
(LR, 2.6; 95% CI, 1.7-4.1), and leg edema (LR, 2.3; 95% CI,
1.5-3.7). The presence of an abnormal abdominojugular reflux
response (LR, 6.4; 95% CI, 0.81-51) had a high LR, but its
evaluation in only 1 study of 51 patients led to broad CIs. 31 An
abnormal response to the Valsalva maneuver in the same study had
an LR of 2.1, but the lower limit of the 95% CI was 1.0. The
presence of the other findings in Table 16-6 did not appear useful
for assessing the likelihood of heart failure in dyspneic patients.

The absence of pulmonary rales (LR, 0.51; 95% CI, 0.37-0.70), leg
edema (LR, 0.64; 95% CI, 0.47-0.87), or jugular venous distention
(LR, 0.66; 95% CI, 0.57-0.77) were the most useful findings for
decreasing the likelihood of heart failure. Wheezing also decreased
the likelihood that a dyspneic patient had heart failure (LR, 0.52;
95% CI, 0.38-0.71). The absence of a third heart sound or a murmur
decreased the likelihood of heart failure, but the point estimates
of the LRs for these findings approached 1. The absence of the
other findings in Table 16-6 did not appear useful because the CIs
included 1. Diaphoresis as a sign of heart failure was of uncertain
validity, having been evaluated in only 2 studies that were each of
level 4 quality. 49,50

Accuracy of Chest Radiographs 

Seven studies examined the accuracy of various chest radiograph
findings in the ED setting (Table 16-7). The presence of any of
these findings (except for any edema) had high positive LRs with
CIs exceeding 1 and therefore increased the likelihood of heart
failure in dyspneic patients. The presence of pulmonary venous
congestion (distention of pulmonary veins and redistribution to the
apices) (n = 4 studies; summary LR, 12; 95% CI, 6.8-21) and
cardiomegaly (n = 6 studies; summary LR, 3.3; 95% CI, 2.4-4.7)
increased the likelihood of heart failure; these findings have
undergone more extensive evaluation, so the results may be more
reliable. The presence of interstitial edema also had a high LR
(n = 2 studies; summary LR, 12; 95% CI, 5.2-27). The presence of
pneumonia or hyperinflation decreased the likelihood of heart
failure, but each was assessed in only 1 study.

The most extensively evaluated chest radiograph findings
(pulmonary venous congestion and cardiomegaly) were also the
findings that, when absent, had an LR that was appreciably
different from 1. The absence of cardiomegaly was particularly
useful (LR, 0.33; 95% CI, 0.23-0.48), with narrower CIs than the
absence of pulmonary venous congestion (LR, 0.48; 95% CI,
0.28-0.83).

Accuracy of ECG 

Seven studies examined the accuracy of various ECG findings in the
ED setting (Table 16-7). The presence of atrial fibrillation in a
dyspneic patient was the most important finding (LR, 3.8; 95% CI,
1.7-8.8) and was evaluated in 5 studies. The presence of new
T-wave changes (LR, 3.0; 95% CI, 1.7-5.3) or any abnormal ECG
finding (LR, 2.2; 95% CI, 1.6-3.1) increased the likelihood of
heart failure, but these were evaluated in fewer studies. A
completely normal ECG result (LR, 0.64; 95% CI, 0.47-0.88)
decreased the likelihood of heart failure and was the only normal
finding that had a negative LR with a clinically meaningful
difference from 1.

Accuracy of BNP 

Eleven studies examined the operating characteristics of various
cutoffs of serum BNP in the ED setting (Table 16-8). Eight of
these reported pharmaceutical industry sponsorship, 2 did not
disclose funding sources, and only 1 study reported no
pharmaceutical relationship.

As the BNP cutoff increased, the positive LR generally increased.
Thus, the higher the value of BNP, the more suggestive it was of
heart failure. However, no BNP threshold indicated the presence of
heart failure with certainty. At any BNP threshold up to 250 pg/mL,
values lower than the threshold always made heart failure much less
likely (LR, 0.06-0.15) in comparison to those with values
> 250 pg/mL. However, the serial LRs show that, overall, as the BNP
increases, the likelihood of heart failure increases (Table 16-8).
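
The way footnote b of Table 16-8 reads the serial LRs can be sketched as a simple lookup: take the highest threshold that the measured BNP reaches and report the LR on that row. This is a minimal sketch under stated assumptions, not a validated clinical tool. The function and constant names are hypothetical, and treating a value exactly at a threshold as meeting it is an assumption, because the table does not say how boundary values are handled.

```python
# (threshold in pg/mL, summary serial LR), highest threshold first; values from Table 16-8
BNP_SERIAL_LRS = [
    (250, 4.6),
    (200, 3.7),
    (150, 3.1),
    (100, 2.7),
    (80, 3.3),
    (50, 1.7),
]
LR_BELOW_50 = 0.06  # row "<50" in Table 16-8


def bnp_serial_lr(bnp_pg_ml: float) -> float:
    """Return the serial LR for the highest threshold the BNP value reaches."""
    if bnp_pg_ml < 0:
        raise ValueError("BNP concentration cannot be negative")
    for threshold, lr in BNP_SERIAL_LRS:
        if bnp_pg_ml >= threshold:  # boundary handling is an assumption
            return lr
    return LR_BELOW_50


print(bnp_serial_lr(85))  # 3.3, the example given in footnote b
print(bnp_serial_lr(40))  # 0.06
```

The lookup also makes visible that the LRs do not rise strictly with each threshold (the row for 80 pg/mL is higher than the row for 100 pg/mL), which is why the text says the positive LR "generally" increased.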

Table 16-7 Summary of Diagnostic Accuracy of Findings on Chest Radiograph and Electrocardiogram in Emergency Department Patients

Finding | Pooled Sensitivity | Pooled Specificity | Summary Positive LR (95% CI) a | Summary Negative LR (95% CI) a | References

Chest Radiograph
Pulmonary venous congestion b | 0.54 | 0.96 | 12 (6.8-21) | 0.48 (0.28-0.83) | 36,41,45,48
Interstitial edema | 0.34 | 0.97 | 12 (5.2-27) | 0.68 (0.54-0.85) | 41,53
Alveolar edema | 0.06 | 0.99 | 6.0 (2.2-16) | 0.95 (0.93-0.97) | 41
Cardiomegaly | 0.74 | 0.78 | 3.3 (2.4-4.7) | 0.33 (0.23-0.48) | 36,41,43-45,48
Pleural effusion(s) | 0.26 | 0.92 | 3.2 (2.4-4.3) | 0.81 (0.77-0.85) | 36,41
Any edema | 0.70 | 0.77 | 3.1 (0.60-16) | 0.38 (0.11-1.3) | 43,44
Pneumonia | 0.04 | 0.92 | 0.50 (0.29-0.87) | 1.0 (1.0-1.1) | 41
Hyperinflation | 0.03 | 0.92 | 0.38 (0.20-0.69) | 1.1 (1.0-1.1) | 41

ECG
Atrial fibrillation | 0.26 | 0.93 | 3.8 (1.7-8.8) | 0.79 (0.65-0.96) | 36,43,44,48,56
New T-wave changes | 0.24 | 0.92 | 3.0 (1.7-5.3) | 0.83 (0.74-0.92) | 36
Any abnormal finding | 0.50 | 0.78 | 2.2 (1.6-3.1) | 0.64 (0.47-0.88) | 41,53
ST elevation | 0.05 | 0.97 | 1.8 (0.80-4.0) | 0.98 (0.94-1.0) | 36,48
ST depression | 0.11 | 0.94 | 1.7 (0.97-2.9) | 0.95 (0.90-1.0) | 36,48

Abbreviations: CI, confidence interval; ECG, electrocardiogram; LR, likelihood ratio.

a LRs are not independent of each other and should not be multiplied in series when multiple findings are considered.
b Pulmonary venous congestion, manifest as distention of pulmonary veins and redistribution to the apices.

Table 16-8 Summary Operating Characteristics of Serum Brain Natriuretic Peptide in Emergency Department Patients

Test | Summary Sensitivity | Summary Specificity | Summary Serial LR (95% CI) | References

Clinical judgment or BNP > 100 pg/mL a | 0.94 | 0.70 | 3.1 (2.8-3.5) | 40

BNP alone, pg/mL b
> 250 | 0.89 | 0.81 | 4.6 (2.6-8.0) | 36,43,55
> 200 | 0.92 | 0.75 | 3.7 (2.6-5.4) | 36,42,44,46,54,55
> 150 | 0.89 | 0.71 | 3.1 (2.1-4.5) | 39,43,44,48,54-56
> 100 | 0.93 | 0.66 | 2.7 (2.0-3.9) | 36,39,42-44,47,48,54-56
> 80 | 0.96 | 0.71 | 3.3 (1.8-6.3) | 39,43,47,48
> 50 | 0.97 | 0.44 | 1.7 (1.2-2.6) | 39,44,54
< 50 | 0.97 | 0.44 | 0.06 (0.03-0.12) | 39,44,54

Abbreviations: BNP, brain natriuretic peptide; CI, confidence interval; LR, likelihood ratio.

a Either an initial clinical probability of heart failure of at least 80% or BNP of at least 100 pg/mL was considered a positive result. A negative result was a clinical probability of
heart failure lower than 80% and BNP lower than 100 pg/mL.

b The values shown represent the summary values at each specified threshold. For example, if the BNP is 85 pg/mL, then the LR for heart failure is 3.3. However, if the BNP is
less than 50 pg/mL, the LR is 0.06.

BNP levels must be interpreted differently for patients with renal
insufficiency. According to an analysis of data from the Breathing
Not Properly Multinational Study, 39,64 no adjustment in the
100 pg/mL threshold appears necessary for patients with an
estimated glomerular filtration rate of 60 to 89 mL/min/1.73 m2,
with an area under the receiver operating characteristic curve of
0.90 (a measure of overall accuracy). The loss of accuracy with
worsening renal function can be minimized by using thresholds of
225 and 201 pg/mL, respectively, for patients with estimated
glomerular filtration rates of 15 to 29 and 30 to 59
mL/min/1.73 m2 (areas under receiver operating characteristic
curves of 0.86 and 0.81, respectively). The utility of BNP levels
in patients with advanced renal insufficiency (estimated glomerular
filtration rate < 15 mL/min/1.73 m2 or receiving dialysis) is
unclear as these patients were not included in that study.
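
The renal adjustments described above amount to a small decision table, which the following sketch encodes. It is illustrative only: the function name is hypothetical, returning None for an estimated glomerular filtration rate below 15 reflects that the text calls BNP utility unclear in that group, and applying the unadjusted 100 pg/mL threshold to rates of 90 or higher is an assumption, since the text explicitly addresses only the 60 to 89 range.

```python
from typing import Optional


def bnp_threshold_for_egfr(egfr_ml_min_1_73m2: float) -> Optional[float]:
    """Return the BNP threshold (pg/mL) suggested in the text for a given eGFR.

    Returns None when the text gives no threshold (eGFR < 15).
    """
    if egfr_ml_min_1_73m2 < 0:
        raise ValueError("eGFR cannot be negative")
    if egfr_ml_min_1_73m2 >= 60:
        # 60 to 89 is stated in the text; 90 and above is an assumption
        return 100.0
    if egfr_ml_min_1_73m2 >= 30:
        return 201.0  # eGFR 30 to 59
    if egfr_ml_min_1_73m2 >= 15:
        return 225.0  # eGFR 15 to 29
    return None  # advanced renal insufficiency: utility of BNP unclear


for egfr in (75, 45, 20, 10):
    print(egfr, bnp_threshold_for_egfr(egfr))
```

Returning None rather than a number for the lowest range keeps the caller from silently applying a threshold that the source study did not evaluate.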

Accuracy of Findings in Patients With 
History of Pulmonary Disease 

One study (Table 16-9) examined the accuracy of symptoms, 
signs, ECG, and serum BNP in diagnosing heart failure in 
dyspneic ED patients with a history of asthma or COPD. 52 
This study was a subgroup analysis of the Breathing Not 
Properly Multinational Study. 39 

Initial Clinical Gestalt 

A high initial clinical suspicion by the emergency physician
(>80% probability) was associated with a high likelihood for a
final diagnosis of heart failure (LR, 9.9; 95% CI, 5.3-18), whereas
an intermediate (21%-79%) or low (<20%) initial clinical suspicion
decreased the likelihood of heart failure (LR, 0.65; 95% CI,
0.55-0.77) but did not exclude it. In fact, 32% of patients in the
intermediate suspicion group and 9% of patients in the low clinical
suspicion group were ultimately diagnosed with heart failure.
Assigning a lower probability to the low suspicion group (eg, <5%)
would likely have reduced misclassification in that study.

Historical Items 

The presence of most historical findings in Table 16-9 increased
the likelihood of heart failure, with CIs excluding 1. A history of
atrial fibrillation (LR, 4.1; 95% CI, 2.5-6.6) or coronary bypass
surgery (LR, 2.8; 95% CI, 1.3-5.8) was the most useful finding
that increased the likelihood of heart failure. The absence of
relevant historical features did not result in clinically
meaningful LRs less than 1, other than perhaps the absence of
coronary artery disease (LR, 0.67; 95% CI, 0.54-0.84).


Symptoms 

Only the absence of orthopnea (LR, 0.68; 95% CI, 0.48-0.95) had an
LR that was appreciably different from 1. Thus, symptoms were not
particularly useful among dyspneic patients with lung disease in
determining who might also have heart failure.

Physical Examination 

The presence of a third heart sound had a high diagnostic value
for heart failure (LR, 57; 95% CI, 7.6-425). Other useful physical
examination findings, when present, included jugular venous
distention (LR, 4.3; 95% CI, 2.8-6.5), lower extremity edema (LR,
2.7; 95% CI, 2.2-3.5), pulmonary rales (LR, 2.6; 95% CI, 2.1-3.3),
or hepatic congestion (LR, 2.4; 95% CI, 1.2-4.7). The absence of
pulmonary rales (LR, 0.39; 95% CI, 0.28-0.55), lower extremity
edema (LR, 0.41; 95% CI, 0.30-0.57), or jugular venous distention
(LR, 0.65; 95% CI, 0.54-0.78) decreased the likelihood of heart
failure.

Chest Radiograph 

The presence of edema was the most useful radiographic finding for
increasing the likelihood of heart failure (LR, 11; 95% CI,
5.8-22). Other very useful findings were cardiomegaly (LR, 7.1;
95% CI, 4.5-11) or pleural effusion(s) (LR, 4.6; 95% CI, 2.6-8.0).
A normal chest radiograph result (LR, 0.11; 95% CI, 0.04-0.28),
absence of cardiomegaly (LR, 0.54; 95% CI, 0.44-0.67), or absence
of edema (LR, 0.68; 95% CI, 0.58-0.79) decreased the likelihood of
heart failure.

Electrocardiogram 

The presence of ECG findings of atrial fibrillation (LR, 6.0; 95%
CI, 3.4-10), ischemic ST-T-wave changes (LR, 4.6; 95% CI, 2.4-8.7),
or Q waves (LR, 3.1; 95% CI, 1.8-5.5) was helpful in suggesting a
diagnosis of heart failure in the dyspneic ED patient with a
history of pulmonary disease. No single ECG result had clinically
useful outcomes for decreasing the likelihood of heart failure.

Table 16-9 Diagnostic Accuracy of History, Physical Examination, and Tests of Cardiac Function in Emergency Department Patients With History of Asthma or Chronic Obstructive Pulmonary Disease a

Finding | Sensitivity | Specificity | Positive LR (95% CI) b | Negative LR (95% CI) b

Initial clinical judgment | 0.37 | 0.96 | 9.9 (5.3-18) | 0.65 (0.55-0.77)

History
Atrial fibrillation | 0.32 | 0.92 | 4.1 (2.5-6.6) | 0.74 (0.63-0.85)
Coronary artery bypass grafting | 0.13 | 0.95 | 2.8 (1.3-5.8) | 0.92 (0.84-0.99)
Myocardial infarction | 0.25 | 0.88 | 2.2 (1.4-3.5) | 0.84 (0.74-0.96)
Diabetes mellitus | 0.26 | 0.87 | 2.0 (1.3-3.2) | 0.85 (0.74-0.97)
Coronary artery disease | 0.49 | 0.75 | 2.0 (1.5-2.6) | 0.67 (0.54-0.84)
Angina | 0.21 | 0.88 | 1.7 (1.0-2.8) | 0.90 (0.80-1.0)
Hypertension | 0.54 | 0.55 | 1.2 (0.95-1.5) | 0.84 (0.65-1.1)

Symptoms
Orthopnea | 0.70 | 0.44 | 1.3 (1.1-1.5) | 0.68 (0.48-0.95)
Fatigue | 0.74 | 0.34 | 1.1 (0.96-1.3) | 0.79 (0.54-1.2)
Nocturnal cough | 0.49 | 0.47 | 0.93 (0.73-1.2) | 1.1 (0.85-1.4)

Physical Examination
Third heart sound (ventricular filling gallop) | 0.17 | 1.00 | 57 (7.6-425) | 0.83 (0.75-0.91)
Jugular venous distention | 0.41 | 0.90 | 4.3 (2.8-6.5) | 0.65 (0.54-0.78)
Lower extremity edema | 0.69 | 0.75 | 2.7 (2.2-3.5) | 0.41 (0.30-0.57)
Rales | 0.71 | 0.73 | 2.6 (2.1-3.3) | 0.39 (0.28-0.55)
Hepatic congestion | 0.14 | 0.94 | 2.4 (1.2-4.7) | 0.91 (0.84-1.0)
Enlarged heart | 0.03 | 0.98 | 1.6 (0.43-6.2) | 0.99 (0.95-1.0)
Wheezing | 0.42 | 0.50 | 0.85 (0.65-1.1) | 1.2 (0.94-1.4)

Chest Radiograph
Edema | 0.34 | 0.97 | 11 (5.8-22) | 0.68 (0.58-0.79)
Cardiomegaly | 0.49 | 0.93 | 7.1 (4.5-11) | 0.54 (0.44-0.67)
Pleural effusion(s) | 0.26 | 0.94 | 4.6 (2.6-8.0) | 0.78 (0.69-0.89)
Pneumonia | 0.08 | 0.92 | 1.0 (0.46-2.3) | 1.0 (0.93-1.1)
Hyperinflation | 0.08 | 0.85 | 0.53 (0.25-1.1) | 1.1 (1.0-1.2)
Normal | 0.05 | 0.57 | 0.11 (0.04-0.28) | 1.7 (1.5-1.8)

Electrocardiogram
Atrial fibrillation | 0.31 | 0.95 | 6.0 (3.4-10) | 0.73 (0.63-0.84)
Ischemic ST-T waves | 0.21 | 0.95 | 4.6 (2.4-8.7) | 0.83 (0.74-0.93)
Q waves | 0.22 | 0.93 | 3.1 (1.8-5.5) | 0.84 (0.75-0.94)

BNP > 100 pg/mL | 0.93 | 0.77 | 4.1 (3.3-5.0) | 0.09 (0.04-0.19)

Abbreviations: BNP, brain natriuretic peptide; CI, confidence interval; LR, likelihood ratio.

a Adapted from McCullough et al. 52

b Likelihood ratios are not independent of each other and should not be multiplied in series when multiple findings are considered.
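
The caution against multiplying LRs in series can be illustrated numerically. The sketch below deliberately does what the footnote warns against: it multiplies the positive LRs of four physical findings from Table 16-9 as though they were independent. The pretest probability of 0.20 and the function names are assumptions for illustration. The point is only that the naive product is very large, so if the findings are correlated (as signs of volume overload plausibly are), the resulting post-test probability overstates the certainty.

```python
import math

# Positive LRs copied from Table 16-9 (patients with asthma or COPD)
PRESENT_FINDINGS = {
    "third heart sound": 57.0,
    "jugular venous distention": 4.3,
    "lower extremity edema": 2.7,
    "rales": 2.6,
}


def naive_combined_lr(lrs) -> float:
    """Multiply LRs in series; valid only if the findings were independent."""
    return math.prod(lrs)


def post_test_probability(pretest_probability: float, lr: float) -> float:
    """Convert pretest probability to odds, apply the LR, and convert back."""
    odds = pretest_probability / (1 - pretest_probability) * lr
    return odds / (1 + odds)


naive_lr = naive_combined_lr(PRESENT_FINDINGS.values())
print(f"naive combined LR: {naive_lr:.0f}")
# Assumed pretest probability of 0.20 (hypothetical)
print(f"implied post-test probability: {post_test_probability(0.20, naive_lr):.3f}")
```

A combined LR in the thousands is far beyond anything the individual studies support, which is why the tables advise against this calculation.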


Brain Natriuretic Peptide 

BNP levels can increase in patients with chronic pulmonary
diseases because of right ventricular strain. Nevertheless, BNP
appears to still be useful in these patients. Studies have
demonstrated that BNP levels are significantly higher in patients
with a history of chronic lung disease but acute dyspnea from
heart failure compared with those with a history of heart failure
but acute dyspnea from lung disease. 36,65

Serum BNP for dyspneic patients with a history of asthma or COPD
was useful for identifying heart failure (BNP > 100 pg/mL: LR,
4.1; 95% CI, 3.3-5.0). It was more powerful, however, for
excluding heart failure when low (BNP < 100 pg/mL: LR, 0.09; 95%
CI, 0.04-0.19). Because this was only 1 study, the optimal cutoff
for BNP to diagnose or exclude clinical heart failure in dyspneic
patients with chronic lung diseases is unclear.

COMMENT 

It is both important and difficult to rapidly differentiate among
the common causes of dyspnea in ED patients. The syndrome of heart
failure requires appropriate symptoms, along with objective
measures of cardiac dysfunction. 5 Although sophisticated and
invasive tests such as Swan-Ganz catheterization can help to
distinguish between cardiac and pulmonary causes of dyspnea, they
are frequently unavailable in the acute setting, and thus, the
diagnosis of heart failure and the decision to institute therapy
on an emergency basis rest on the bedside clinical assessment
(chest radiograph, ECG, and recently, serum BNP). Relying purely
on echocardiography to diagnose clinical heart failure is also
problematic because it is often not easily accessible, requires
specialized training, 66 and may not always truly reflect the
current cause of dyspnea. 67 That is, not every patient presenting
with heart failure will have a diminished left ventricular EF;
patients with diastolic heart failure, for instance, may have
elevated filling pressures and dyspnea in the presence of normal
EF. The reverse is also true in that patients with a decreased
left ventricular EF may be dyspneic from noncardiac causes such as
COPD, and furthermore, the severity of impairment of EF does not
always correlate with subjective severity of dyspnea. 68

In this systematic review, many features on clinical examination,
chest radiograph, ECG, and serum BNP were useful in diagnosing
heart failure in adult ED patients presenting with dyspnea in whom
heart failure was suspected. Features listed in Box 16-1 were
assessed in more than 1 study and were useful when either present
or absent. Other findings may prove useful when evaluated further.

CLINICIAN’S OVERALL ASSESSMENT 

Our results are consistent with those of Marcus et al. 69 They
recently studied patients undergoing elective left-sided heart
catheterization, comparing the test characteristics of third and
fourth heart sounds with objective measures of left ventricular
dysfunction. Although the patient population and reference
standard for heart failure were different in our review compared
with theirs (eg, ventricular dysfunction vs a clinical diagnosis
of heart failure), both studies found that third and fourth heart
sounds had greater specificity than sensitivity and that a third
heart sound had a better specificity than a fourth heart sound for
the diagnosis of heart failure.

Box 16-1 Features Useful in Diagnosing Heart Failure in Adult
Emergency Department Patients With Dyspnea

HISTORICAL FEATURES

• Heart failure

• Myocardial infarction

• Coronary artery disease

SYMPTOMS

• Paroxysmal nocturnal dyspnea

• Orthopnea

• Dyspnea on exertion

PHYSICAL EXAMINATION

• Listening for a third heart sound (ventricular filling gallop)

• Jugular venous pressure assessment

• Auscultating for rales and wheezing

• Auscultating for a murmur

• Assessing the legs for edema

CHEST RADIOGRAPH

• Pulmonary venous congestion

• Interstitial edema

• Cardiomegaly

• Pleural effusion(s)

ELECTROCARDIOGRAM FINDINGS

• Atrial fibrillation

• An abnormal result

BRAIN NATRIURETIC PEPTIDE

• Most useful when < 100 pg/mL for decreasing the likelihood of
heart failure

We did not find any studies examining combinations of historical
and physical examination findings in making a diagnosis of heart
failure. However, our analysis suggests that the initial clinical
gestalt of the physician according to available information
(history, physical examination, chest radiograph, ECG) is
valuable. Because the overall clinical gestalt had LRs that
approximate some of the individual findings, along with a lack of
consistent multivariate models, we do not know whether all the
symptoms and signs are independently useful. When clinicians are
not confident in their clinical gestalt, they should preferentially
rely on the results of the few findings that have LR estimates
most different from 1.

A high initial clinical suspicion alone (LR, 4.4; 95% CI, 1.8-10)
(Table 16-6) had a greater positive LR than a composite of high
clinical suspicion, BNP level greater than or equal to 100 pg/mL,
or both, which had a combined positive LR of 3.1 (95% CI, 2.8-3.5)
(Table 16-8). This suggests that BNP may not contribute much more
in patients for whom the initial clinical suspicion of heart
failure was already high. However, in patients for whom the
initial clinical suspicion of heart failure was not high, BNP at a
threshold value of 100 pg/mL was useful, especially for excluding
heart failure in this group of patients. To apply these results
correctly, it is necessary that clinicians first quantify and
acknowledge their clinical suspicion (eg, formulate a pretest
probability). If the physician waits until the BNP results are
available before establishing clinical suspicion, these tests are
no longer independent and the clinical suspicion becomes biased by
the BNP. The results of our BNP analysis add support to recent
European guidelines for diagnosing heart failure, which state that
BNP may be a clinically useful test to rule out heart failure
because of its high negative predictive values. 5 Clinicians
should be aware that factors other than heart failure can affect
serum BNP levels (Box 16-2). Algorithms for the use of the BNP
test have been proposed 70 but not extensively validated.

Limitations 

The results of our meta-analysis should be interpreted in the
context of study limitations. One limitation of this review is the
reference standard for heart failure (adjudication by a panel of
physicians). Given the subjectivity and potential bias of such a
standard, many of the studies had disagreement (up to 10%) among
the adjudicators of whether heart failure was the contributing
cause of dyspnea. However, in the absence of a true criterion
standard for this clinical syndrome, the reference standard,
although imperfect, is likely the best available and consistent
with the clinical focus of this review. Another limitation that
arises from using a clinical reference standard is that the final
diagnosis of heart failure may not have been made independently of
the individual findings of interest. That is, the panel of
physicians may have used some of the clinical findings in deciding
whether patients ultimately had heart failure as the cause of
their dyspnea. As such, this may overestimate sensitivities and
specificities. Although this is a valid concern, we believe the
effects on each finding would be small because the final diagnosis
relied on a combination of information from many diverse sources,
including any or all of the following: medical history, physical
examination, routine laboratory tests, chest radiograph, ECG,
heart failure scores, objective measures of cardiac function (eg,
echocardiography, radionuclide ventriculography, radionuclide
angiography, and left ventriculography at cardiac
catheterization), pulmonary function tests, response to treatment,
hospitalization course, and follow-up records.

Our data are derived from studies on patients presenting to the ED
with dyspnea. Therefore, these results may not generalize to
inpatients, outpatients in clinic settings who may have more
chronic dyspnea, or patients without dyspnea. The 18 studies
included in this meta-analysis represent diverse and heterogeneous
populations with various comorbidities. The majority of the
studies excluded patients with acute coronary syndromes and in
whom an obvious cause of dyspnea (eg, pneumothorax, trauma) was
present. All the studies of BNP excluded patients in whom dyspnea
was clearly not secondary to heart failure. Therefore, the
usefulness of BNP from our analysis can be applied only to
patients in whom the diagnosis of heart failure is a
consideration. In patients in whom the suspicion of heart failure
is low (after taking a careful history and performing the physical
examination, chest radiograph, and ECG), a BNP level is unlikely
to affect diagnosis or management (eg, an obvious pulmonary
etiology of dyspnea).

Other limitations include the inherent subjectivity of clinical
findings on medical history, physical examination, chest
radiograph, and ECG. It is impossible to confirm the accuracy of
individual findings presented in each study, and no formal
definitions were given. For example, we do not have standardized
information on the technique used for each chest radiograph
performed (anteroposterior, posteroanterior, portable).

The Bottom Line 

The features evaluated in more than 1 study with the highest LRs
(LR > 3.5) for diagnosing heart failure were the following: the
overall clinical judgment, history of heart failure, a third heart
sound, jugular venous distention, radiographic pulmonary venous
congestion or interstitial edema, and electrocardiographic atrial
fibrillation.
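
The selection rule stated above (evaluated in more than 1 study, positive LR greater than 3.5) can be expressed as a filter. The sketch below is illustrative: it uses a hand-copied subset of rows from Tables 16-6 and 16-7 rather than the full tables, records only whether each finding was evaluated in more than 1 study, and uses hypothetical names throughout.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    name: str
    positive_lr: float
    multiple_studies: bool  # True if the tables cite more than 1 study


# Subset of rows from Tables 16-6 and 16-7 (not the complete tables)
FINDINGS = [
    Finding("Initial clinical judgment", 4.4, True),
    Finding("History of heart failure", 5.8, True),
    Finding("History of myocardial infarction", 3.1, True),
    Finding("Third heart sound", 11.0, True),
    Finding("Abdominojugular reflux", 6.4, False),
    Finding("Jugular venous distention", 5.1, True),
    Finding("Rales", 2.8, True),
    Finding("Pulmonary venous congestion (radiograph)", 12.0, True),
    Finding("Interstitial edema (radiograph)", 12.0, True),
    Finding("Alveolar edema (radiograph)", 6.0, False),
    Finding("Cardiomegaly (radiograph)", 3.3, True),
    Finding("Atrial fibrillation (ECG)", 3.8, True),
]


def strongest_positive_findings(findings, min_lr: float = 3.5) -> list[str]:
    """Keep findings studied more than once whose positive LR exceeds min_lr."""
    return [f.name for f in findings if f.multiple_studies and f.positive_lr > min_lr]


for name in strongest_positive_findings(FINDINGS):
    print(name)
```

Abdominojugular reflux and alveolar edema have high LRs but drop out because each was evaluated in a single study, which matches the list in the text.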

The features evaluated in more than 1 study with the lowest LRs
(LR < 0.60) for diagnosing heart failure were the following: the
overall clinical judgment, no history of heart failure, no dyspnea
on exertion, the absence of rales, and the absence of radiographic
pulmonary venous congestion or cardiomegaly. The single finding
that decreased the likelihood of heart failure the most was a BNP
of less than 100 pg/mL (for patients with an estimated glomerular
filtration rate of 15-60 mL/min/1.73 m2, a threshold of 201 pg/mL
can be used). However, the clinician must always remember to first
quantify and acknowledge his or her clinical suspicion according
to the clinical examination before interpreting the BNP result.

Box 16-2 Factors That Can Affect BNP Levels a

FACTORS (OTHER THAN HEART FAILURE) THAT CAUSE ELEVATED BNP LEVELS

• Advanced age

• Renal failure

• Acute coronary syndromes

• Lung disease with cor pulmonale

• Acute large pulmonary embolism

• High-output cardiac states

FACTORS THAT DECREASE BNP IN THE SETTING OF HEART FAILURE

• Acute pulmonary edema

• Stable New York Heart Association class I patients with low EF

• Acute mitral regurgitation

• Mitral stenosis

• Atrial myxoma

a Adapted from Maisel. 70

In the subgroup of ED patients with a history of asthma or COPD,
the features that strongly suggested a diagnosis of heart failure
were the overall clinical assessment, a third heart sound,
radiographic edema or cardiomegaly, and electrocardiographic
atrial fibrillation. The features that suggested the diagnosis was
not heart failure were a normal chest radiograph result and a low
serum BNP level (<100 pg/mL). However, these results are from a
subgroup analysis in 1 study and require confirmation.

Although the findings of this study are useful when assessing
dyspneic patients suspected of having heart failure, no individual
feature is sufficiently powerful in isolation to rule heart
failure in or out. Therefore, an overall clinical impression based
on all available information is best. If the appropriate constellation
of findings with high LRs for heart failure is present, that
may be sufficient to warrant empirical treatment without further
urgent investigations. Conversely, if the clinical suspicion of
heart failure is low (eg, pulmonary disease is more likely), the
physician should investigate and treat other causes of dyspnea.


CLINICAL SCENARIOS—RESOLUTION 


CASE 1 The patient has many features that raise the suspicion
of heart failure, such as previous myocardial infarction
(LR, 3.1), previous heart failure (LR, 5.8), orthopnea (LR, 2.2),
paroxysmal nocturnal dyspnea (LR, 2.6), elevated jugular
venous pressure (LR, 5.1), a third heart sound (LR, 11.0),
rales (LR, 2.8), extremity edema (LR, 2.3), cardiomegaly (LR, 3.3),
and atrial fibrillation (LR, 3.8) and only the single feature
of wheezing (LR, 0.52) that decreases the suspicion
slightly. The overall constellation of symptoms and signs is
so suggestive of heart failure that additional testing is not
needed to make the diagnosis.

CASE 2 Both heart failure and obstructive airways disease
are considerations. The symptoms of dyspnea on exertion
and cough are not helpful in making a diagnosis of heart failure
because their LRs are close to 1. Rales (LR, 2.8) and ECG
showing ST depression (LR, 1.7) both increase the likelihood
of heart failure, but more important, the findings of pulmonary
venous congestion and interstitial edema are both associated
with large LRs (>10) that significantly increase the
suspicion for heart failure. Wheezing reduces the likelihood
somewhat (LR, 0.52). According to the information available,
the patient likely has acute heart failure and should be
treated without waiting for further tests. An ECG should be
ordered nonurgently. Furthermore, the patient may also be
having a superimposed COPD or asthma exacerbation. The
physician should consider ordering pulmonary function tests
to confirm a diagnosis of obstructive airways disease.

CASE 3 There are some features that increase the likelihood 
of heart failure, such as history of myocardial infarction 
(LR, 2.2), elevated jugular venous pressure (LR, 4.3), lower 
extremity edema (LR, 2.7), and Q waves (LR, 3.1), whereas 
other features decrease the likelihood (wheezing, LR, 0.85; 
and normal chest radiograph, LR, 0.11). According to these
LRs, there are insufficient data to make or rule out a diagnosis 
of heart failure. In this case, a BNP level could be helpful. If it 
were less than 100 pg/mL, heart failure would be unlikely 
(LR, 0.09). If it were elevated, the probability of heart failure would be
higher but the result would not be diagnostic. More urgent echocardiogram and
pulmonary function studies would be appropriate next steps. 
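
The arithmetic behind these case resolutions can be made explicit. The sketch below is illustrative only: it converts a pretest probability to odds, multiplies the odds by an LR, and converts the result back to a probability. The function name `posttest_probability` and the 50% pretest probability are assumptions chosen for illustration; they are not values reported for Case 3.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability (odds form of Bayes' theorem)."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical pretest probability of 50% (an assumption, not a value from the case).
# LR 0.09 is the value quoted in the text for a BNP of less than 100 pg/mL.
probability_after_low_bnp = posttest_probability(0.50, 0.09)
print(f"Probability of heart failure after a low BNP: {probability_after_low_bnp:.1%}")
```

With these assumed inputs, a BNP below 100 pg/mL lowers the probability of heart failure from 50% to roughly 8%.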

Prepared by Robert G. Badgett, MD,
and Catherine R. Lucey, MD 

Reviewed by Najib Ayas, MD, and Sheri Keitz, MD 


CLINICAL SCENARIO 


A 68-year-old patient with a history of smoking, dyslipidemia,
and claudication presents for a follow-up appointment.
He has no cardiac symptoms. Although he has no history of a
myocardial infarction, he has anterior Q waves on an
electrocardiogram (ECG). You believe him to be at high risk for
asymptomatic left ventricular dysfunction and that a change
in his medications might decrease the risk for symptomatic
heart failure. Does he have a reduced ejection fraction?

UPDATED SUMMARY ON LEFT 
VENTRICULAR DYSFUNCTION 

Original Reviews 

Badgett RG, Lucey CR, Mulrow CD. Can the clinical examination
diagnose left-sided heart failure in adults? JAMA.
1997;277(21):1712-1719.

UPDATED REVIEW 

Wang CS, FitzGerald JM, Schulzer M, Mak E, Ayas NT. Does
this dyspneic patient in the emergency department have congestive
heart failure? JAMA. 2005;294(15):1944-1956.

UPDATED LITERATURE SEARCH 

Our literature search used the search strategy for The Rational
Clinical Examination articles in OVID MEDLINE, combined
with the terms “exp congestive heart failure/” or “heart
failure.tw.,” or “exp ventricular dysfunction/” or “ventricular
dysfunction.tw.,” limited to original human and English-language
diagnostic articles published from January 1997 to November
2004. Though the original publication included studies of the
clinical evaluation of elevated left ventricular diastolic pressure
(as measured by the pulmonary capillary wedge pressure), the
update focused only on identifying patients with a low ejection
fraction (EF) (systolic dysfunction) or distinguishing patients
with systolic dysfunction from those with diastolic dysfunction.
We included studies that evaluated outpatients or inpatients
while excluding studies of emergency department
patients with acute dyspnea. The patient with dyspnea who
presents to the emergency department is reviewed in the recent
Rational Clinical Examination article noted above.

We reviewed 1005 citations and retrieved 15 promising citations
to see whether they had sensitivity, specificity, likelihood
ratio (LR) data, or a multivariable analysis of the clinical examination
for systolic or diastolic dysfunction compared with a reference
standard test. Of 9 articles with appropriate data, only 5
reported new original data and were of high enough quality for inclusion.

NEW FINDINGS 

• In all settings, new studies confirm that symptoms, signs, 
and risk factors for left ventricular systolic dysfunction 
should be interpreted with electrocardiogram (ECG) and 
chest radiograph results. 

• After a myocardial infarction, the clinical evaluation cannot
rule out systolic dysfunction and all of these patients should
have their EF measured. The presence of a third heart sound,
rales, pulmonary venous congestion on chest radiograph, or
an anterior Q wave identifies patients most likely to have an
EF of less than 40%.

• For nonemergency outpatients, the absence of clinical findings
(orthopnea, paroxysmal nocturnal dyspnea, rales, third
heart sound, jugular venous distention), chest radiograph
findings, and ECG abnormalities makes the diagnosis of systolic
dysfunction unlikely. When these normal results are
combined with a brain natriuretic peptide (BNP) level lower
than 37 pg/mL, systolic dysfunction is even less likely.

• Because of verification bias in existing studies, the presence 
of increasing numbers of symptoms and signs may be better 
than previously thought for identifying patients with an 
increasing likelihood of systolic dysfunction. The addition of 
a BNP assay to increasing numbers of abnormal symptoms 
and signs does not add much to the clinical evaluation. 

Details of the Update 

Since our initial review in 1997, the goals have changed for
the clinical evaluation of patients with heart failure. Recent
US and European guidelines emphasize the use of symptoms
alone to titrate many medicines for heart failure. However,
initiation of optimal pharmacologic management to minimize
morbidity and extend life is predicated on identifying
patients with a low EF.

Diagnosis of Systolic Dysfunction

In our 1997 review, we found that the clinical evaluation had a
positive likelihood ratio (LR+) of 2.5 and a negative likelihood
ratio (LR-) of 0.2 for detecting decreased EF. Three new studies
highlight the clinical considerations and methodologic difficulties
when studying the role of the clinical examination.
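
For readers who want the definitions behind these summary figures, an LR+ is sensitivity divided by (1 - specificity), and an LR- is (1 - sensitivity) divided by specificity. The sketch below applies those standard definitions. The sensitivity and specificity values are hypothetical; they were chosen only because they give LRs close to the 1997 summary estimates, and they are not taken from the review.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from a test's sensitivity and specificity."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical operating point (assumed values, not reported in the review).
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.87, specificity=0.65)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

An examination with roughly 87% sensitivity and 65% specificity would therefore behave much like the summary estimates of LR+ 2.5 and LR- 0.2.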

An important methodologic problem presents itself with the
selection of patients and how the clinical examination is
reported. Most studies accept that individual physical examination
findings (eg, rales, a third heart sound, or jugular
venous distention) are used best in combination. Thus, studies
typically evaluate the overall clinical diagnosis of heart failure,
allowing the clinician to consider all the patient’s symptoms
and signs, along with ECG and chest radiograph abnormalities.
Alternatively, studies may evaluate explicit clinical criteria
(from a list of findings) or the performance of a multivariable
model derived from the clinical data. Another problem is the
varying choice in the cut point for combining findings; some
approaches seek to maximize accuracy, and other approaches
optimize the LR- to ensure that few patients with left ventricular
dysfunction escape detection.

Postmyocardial Infarction 

One new study evaluated patients after a myocardial infarction. 1
The study had distinctly different results when 2 US centers,
where the clinical findings were recorded on admission to
a cardiac care unit and then twice daily (incidence of left ventricular
systolic dysfunction, 39%), were compared with a
Scottish center, where the clinical assessment was done just
once in the morning after admission (incidence of systolic
dysfunction, 60%). A clinical diagnosis of systolic dysfunction
after myocardial infarction should be based on twice-daily
examinations because the evaluation made the morning after
admission to the hospital had no diagnostic value (LR+, 0.86;
LR-, 1.0). On the other hand, twice-daily assessments using
predefined criteria for heart failure had an LR+ of 3.1 (95%
confidence interval [CI], 1.7-5.8) and an LR- of 0.62 (95% CI,
0.46-0.83). The clinical criteria were pulmonary rales one-third
up the lung fields in the absence of chronic lung disease,
a third heart sound, or radiographic evidence of pulmonary
venous congestion. Anterior Q waves do not occur in most
infarctions, but their presence is important (LR for anterior Q
waves, 5.0; 95% CI, 2.2-11). The inability to “rule out” left ventricular
systolic dysfunction by clinical diagnosis alone supports
the role of routine objective assessment of the EF after
myocardial infarction. However, when an examination that
does not support heart failure is combined with the absence of
anterior Q waves, no previous myocardial infarction, and a
peak creatine kinase level less than 1000 U/L in the absence
of thrombolytic therapy, the LR for systolic dysfunction
decreases to 0.11 (95% CI, 0.04-0.29).
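
To see what these LRs mean for an individual patient, the sketch below applies them to a pretest probability. It assumes, for illustration only, that the 39% incidence at the US centers can stand in for the pretest probability; the function and variable names are hypothetical.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert probability to odds, apply the LR, and convert back to probability."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Assumption: the US-center incidence (39%) is used as the pretest probability.
PRETEST = 0.39

# LRs quoted in the text for twice-daily assessments after myocardial infarction.
SCENARIOS = {
    "clinical diagnosis present (LR+ 3.1)": 3.1,
    "clinical diagnosis absent (LR- 0.62)": 0.62,
    "negative examination plus low-risk features (LR 0.11)": 0.11,
}

for label, lr in SCENARIOS.items():
    print(f"{label}: {posttest_probability(PRETEST, lr):.0%}")
```

Under these assumptions, a negative clinical diagnosis alone still leaves a probability of roughly 28%, which is consistent with the point that the examination by itself cannot rule out systolic dysfunction; the combined low-risk profile brings the estimate down to about 7%.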

Patients Referred for Echocardiograms 

Two studies evaluated consecutive patients referred to an
echocardiography laboratory solely for the determination of the
systolic EF. One study 2 assessed the clinical diagnosis in inpatients,
whereas the other study assessed referred outpatients. 3 The
patients in these studies represent those who might be referred
by general internists, but they do not present the entire spectrum
because the study population excludes the patients for
whom the physician used the clinical findings and medical history
to determine that an echocardiogram was not needed.
This has 2 important implications for clinicians. First, the
results of these studies provide an incomplete picture of how
patients should be prospectively identified for referral to
echocardiography. However, the results could be used to help decide
the likelihood of an abnormal result once the physician refers
the patient for an objective assessment of systolic function. For
example, an echocardiographer might use the information to
identify patients for whom the likelihood of systolic dysfunction
is so low that an echocardiogram could be deferred.

Second, these studies are affected by verification bias in
which not all patients who were considered as possibly having
systolic dysfunction were evaluated. Typically, verification
bias leads to an underestimation of the LR+ (ie, the
clinical findings are actually much better at detecting affected
patients than the investigators report) while overestimating
the efficiency of the LR- (ie, the clinical findings are not as
good at identifying the patients with a normal EF result).
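
The direction of this bias can be illustrated with a small numerical sketch. Every number below is hypothetical: a population of 1000 patients with an examination of 80% sensitivity and 70% specificity, in which 90% of patients with positive findings but only 30% of patients with negative findings are referred for echocardiography. The sketch also assumes that referral depends only on the clinical findings, not on the true EF.

```python
def likelihood_ratios(tp: float, fp: float, fn: float, tn: float) -> tuple[float, float]:
    """Return (LR+, LR-) from the four cells of a 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity / (1.0 - specificity), (1.0 - sensitivity) / specificity


# Hypothetical full population (200 with and 800 without systolic dysfunction).
full_population = dict(tp=160, fp=240, fn=40, tn=560)

# Hypothetical referral (verification) rates by examination result.
REFERRED_IF_POSITIVE = 0.9
REFERRED_IF_NEGATIVE = 0.3

# Only referred patients receive the reference standard and enter the study.
verified_only = dict(
    tp=full_population["tp"] * REFERRED_IF_POSITIVE,
    fp=full_population["fp"] * REFERRED_IF_POSITIVE,
    fn=full_population["fn"] * REFERRED_IF_NEGATIVE,
    tn=full_population["tn"] * REFERRED_IF_NEGATIVE,
)

true_lr_pos, true_lr_neg = likelihood_ratios(**full_population)
observed_lr_pos, observed_lr_neg = likelihood_ratios(**verified_only)

print(f"True     LR+ {true_lr_pos:.2f}, LR- {true_lr_neg:.2f}")
print(f"Observed LR+ {observed_lr_pos:.2f}, LR- {observed_lr_neg:.2f}")
```

In this example the observed LR+ is smaller than the true LR+ (the findings look less useful for detecting disease than they are), and the observed LR- is also smaller than the true LR- (the findings look better at excluding disease than they are), which is the pattern described above.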

The first study was conducted on inpatients who had a 41%
prevalence of systolic dysfunction. The overall clinical diagnosis
for inpatients was based on a combination of clinical findings,
the ECG, and the chest radiograph. Using all these features, a
clinical diagnosis of heart failure had an LR of 2.0 (95% CI,
1.6-2.5) for systolic dysfunction vs an LR of 0.41 (95% CI,
0.30-0.56) when the clinician believed the patient did not have
heart failure. Although the physical examination findings of
rales, a third heart sound, and jugular venous distention were
useful, a multivariate model showed that none of them had
independent utility when the patient’s sex, chest radiograph,
and ECG were considered. Once clinicians have decided to refer
a patient for measurement of the systolic EF, they should be
aware that those with a normal ECG result are unlikely to have
an EF lower than 45% (LR, 0.03; 95% CI, 0.01-0.10).

A second study evaluated consecutive outpatients referred
specifically for assessment of systolic function. The prevalence
of systolic dysfunction was 11%. The study combined symptomatic
patients with asymptomatic patients who had risk factors.
According to the characteristics of the referred patients,
the investigators established a clinical score that required the
presence of at least 1 abnormality: a history of myocardial
infarction, previous diagnosis of heart failure, orthopnea or
paroxysmal nocturnal dyspnea, Q-wave or intraventricular
conduction delay, or a chest radiograph abnormality (cardiomegaly,
pulmonary venous hypertension, or edema). The
presence of at least 1 abnormality (LR+, 2.5; 95% CI, 2.2-2.9)
worked as well as the clinical diagnosis for identifying hospitalized
patients with an EF of less than 45%. Patients with
none of the findings were unlikely to have a low EF (LR, 0.09;
95% CI, 0.03-0.28). The authors also assessed the effect of a
normal BNP at a cut point of 37 pg/mL (the mean value in
healthy persons). At this threshold, the BNP performed similarly
to the clinical score and was not independently useful. A
BNP of 37 pg/mL or higher combined with an abnormal clinical
score increases the likelihood of a low EF (LR, 3.9; 95% CI,
3.0-5.0). The only important utility of the BNP at this cut
point, when combined with the clinical score, was in identifying
patients unlikely to have an abnormal EF result. A normal
clinical score and a normal BNP level decrease the likelihood
of systolic dysfunction, with an LR of 0.04 (95% CI, 0.01-0.30).
Patients with a normal score still had a decreased likelihood
of a low EF even when the BNP level was elevated (LR,
0.23; 95% CI, 0.06-0.92). The investigators did not assess the
utility of the BNP at various cut points, but a higher BNP
threshold would improve the LR+ at the expense of the LR-,
making “normality” more difficult to identify.
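
The clinical score and its combination with the BNP can be written out as a small lookup. This is a minimal sketch rather than the investigators' own implementation: the class, field, and function names are hypothetical, and the LRs are the 4 combined values quoted in this section, with "elevated" taken to mean a BNP of 37 pg/mL or higher.

```python
from dataclasses import astuple, dataclass


@dataclass
class OutpatientFindings:
    """The 5 findings that make up the clinical score (field names are hypothetical)."""
    prior_myocardial_infarction: bool
    prior_heart_failure_diagnosis: bool
    orthopnea_or_paroxysmal_nocturnal_dyspnea: bool
    q_wave_or_intraventricular_conduction_delay: bool
    abnormal_chest_radiograph: bool


# Keys are (score of 1 or more, BNP of 37 pg/mL or higher).
COMBINED_LR = {
    (True, True): 3.9,
    (True, False): 1.1,
    (False, True): 0.23,
    (False, False): 0.04,
}

BNP_CUT_POINT_PG_ML = 37


def clinical_score(findings: OutpatientFindings) -> int:
    """Count how many of the 5 findings are present."""
    return sum(astuple(findings))


def combined_likelihood_ratio(findings: OutpatientFindings, bnp_pg_ml: float) -> float:
    """Look up the LR for the combination of clinical score and BNP result."""
    score_abnormal = clinical_score(findings) >= 1
    bnp_elevated = bnp_pg_ml >= BNP_CUT_POINT_PG_ML
    return COMBINED_LR[(score_abnormal, bnp_elevated)]


no_findings = OutpatientFindings(False, False, False, False, False)
print(combined_likelihood_ratio(no_findings, bnp_pg_ml=20))
```

A patient with none of the 5 findings and a BNP of 20 pg/mL falls in the lowest category (LR, 0.04), whereas the same patient with a BNP above the cut point would be assigned an LR of 0.23.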

Distinguishing Diastolic From Systolic Dysfunction 
Among Patients With Known Heart Failure 

Two new studies support our original recommendation that
the presence of hypertension has some utility in distinguishing
heart failure patients with systolic dysfunction from those with
a normal EF result (diastolic dysfunction). However, none of
the individual findings either increases the likelihood of diastolic
dysfunction more than 2-fold or decreases the likelihood
of systolic dysfunction by more than one-half. Female sex and
hypertension (systolic blood pressure >160 mm Hg) make
diastolic dysfunction more likely, whereas tachycardia (heart
rate >100/min) and left atrial abnormality on the ECG make
systolic dysfunction more likely. 4 Unfortunately, multivariate
assessment with a large number of candidate variables creates
a complicated regression model (18 variables) that does not
appreciably improve on these few individual findings when
validated independently. 5


IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

The results for the overall clinical evaluation for systolic dysfunction
reported in this update should supplant the summary estimates
given in Table 16-6 of our original review. Although we
showed summary measures for the performance of the clinical
examination in the postinfarction patient, the reported data
came from studies with various enrollment methods, timing of
the EF assessment, and variable thresholds accepted as indicating
systolic dysfunction. The data presented in this update confirm
the utility of a clinical diagnosis of heart failure but highlight the
inability of the clinical evaluation to confirm that the postinfarction
patient has a normal EF result. For patients electively
referred for echocardiography, temporal changes in management
of the heart failure patient coupled with an increasing awareness
of the limits of clinical diagnosis have likely changed the spectrum
of patients undergoing determination of systolic function.
Although the results of this update do not yield dramatically different
LRs, the results are nonetheless more contemporary and
applicable to the current care of heart failure patients.

CHANGES IN THE REFERENCE STANDARD 

No changes in measurement of the EF or left ventricular filling
pressure have been advocated. However, 2 recent studies suggest
the EF does not identify early ventricular dysfunction and that
changes in the reference standard may be needed. First, in an
analysis of the Framingham subjects without heart failure,
echocardiographic evidence of increased left ventricular volume,
independent of the fractional shortening, predicted subsequent
clinical episodes of heart failure. 6 Likewise, the BNP was
found to improve on the EF and clinical signs of heart failure in
predicting mortality among patients with coronary disease. 7

RESULTS OF LITERATURE REVIEW 

See Tables 16-10 and 16-11.

EVIDENCE FROM GUIDELINES 

The American College of Cardiology advocates that clinicians
consider heart failure a syndrome that progresses
from an asymptomatic state among patients with risk factors
to symptoms of heart failure. Patients with symptoms
should receive a physical examination, chest radiography,
and electrocardiography and have ongoing clinical assessment
for volume status (level of evidence: C). However, the
echocardiogram is the single most useful test and is required
for identifying patients with systolic dysfunction. 8

The European Society of Cardiology suggests a slightly different
approach, advocating the addition of BNP testing to
medical history, physical examination, ECG, and the chest
radiograph. 9 When any of these results are abnormal, echocardiography
is recommended. Although they acknowledged
that heart failure is unlikely with a completely normal ECG
result, the Society also notes the poor relationship between
symptoms, signs, and the actual EF that makes the echocardiogram
a necessary test.


Table 16-10 Diagnosing Left Ventricular Dysfunction

Finding                          Setting (EF)              LR+ (95% CI)       LR- (95% CI)

Clinical diagnosisᵃ              After MI (EF < 40%)       3.1 (1.7-5.8)      0.62 (0.46-0.83)
Clinical diagnosis               Inpatients (EF < 45%)     2.0 (1.6-2.5)      0.41 (0.30-0.56)
Clinical score ≥ 1ᵇ              Outpatients (EF < 45%)    2.5 (2.2-2.9)      0.09 (0.03-0.28)

Clinical score and BNP           Outpatients (EF < 45%)    LR (95% CI)
  Score ≥ 1 + BNP ≥ 37 pg/mL                               3.9 (3.0-5.0)
  Score ≥ 1 + normal BNP                                   1.1 (0.61-2.0)
  Score < 1 + BNP ≥ 37 pg/mL                               0.23 (0.06-0.92)
  Score < 1 + normal BNP                                   0.04 (0.01-0.30)

Abbreviations: BNP, brain natriuretic peptide; CI, confidence interval; EF, ejection fraction;
LR+, positive likelihood ratio; LR-, negative likelihood ratio; MI, myocardial infarction.

ᵃTwice-daily examinations after an MI. Clinical diagnosis “positive” required radiographic
pulmonary venous congestion with edema, rales one-third up the lung fields in
the absence of chronic pulmonary disease, or a third heart sound.

ᵇScore of 1 or more when any of the following 5 findings are present: a history of myocardial
infarction, previous diagnosis of heart failure, orthopnea or paroxysmal nocturnal
dyspnea, Q-wave or intraventricular conduction delay, or a chest radiograph
abnormality (cardiomegaly, pulmonary venous hypertension, or edema).


Table 16-11 Distinguishing Diastolic Dysfunction From Systolic Dysfunction

Finding                                LR for Diastolic      LR for Systolic
                                       Dysfunction           Dysfunction (EF < 45%)

Favors Normal Systolic Function
  Female sex                           1.6 (1.2-2.2)         0.62 (0.46-0.84)
  Systolic blood pressure > 160 mm Hg  1.8 (1.3-2.6)         0.55 (0.39-0.78)

Favors Systolic Dysfunction
  Heart rate > 100/min                 0.43 (0.28-0.65)      2.3 (1.5-3.5)
  Left atrial ECG abnormality          0.42 (0.26-0.63)      2.4 (1.6-3.6)

Abbreviations: ECG, electrocardiogram; EF, ejection fraction; LR, likelihood ratio.


CLINICAL SCENARIO—RESOLUTION 


The patient has no symptoms of left ventricular dysfunction,
but he does have increased risk of cardiovascular disease,
as evidenced by his claudicatory symptoms. The
presence of anterior Q waves on the ECG suggests he may
have had a silent myocardial infarction, putting him at
risk for the evolution of heart failure. Anterior Q waves
have a sufficiently high LR for identifying patients with a
low EF that an echocardiogram should be obtained.

LEFT VENTRICULAR DYSFUNCTION—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

A broad range of prior probabilities (10%-40%) is
required for clinical decisions, with outpatients who have
suggestive symptoms at the lower end of the range and
inpatients toward the upper end. The patient with dyspnea
who presents to the emergency department without
an obvious cause for dyspnea has about a 50% probability
of left ventricular dysfunction (range, 34%-83%).

POPULATION FOR WHOM LEFT VENTRICULAR 
DYSFUNCTION SHOULD BE CONSIDERED 

• Any patient with compatible symptoms, especially 
orthopnea and paroxysmal nocturnal dyspnea 

• Coronary artery disease, especially patients who have 
had a myocardial infarction 

• Hypertension 

• Diabetes mellitus 

• Patient receiving cardiotoxic medications 

• Family history of cardiomyopathy 

DETECTING THE LIKELIHOOD OF 
LEFT VENTRICULAR DYSFUNCTION 

Patients with symptoms of heart failure and those with
risk factors should be examined for pulmonary rales, jugular
venous distention, a third heart sound, and peripheral
edema and should have an ECG and chest radiograph
(see Table 16-12).

REFERENCE STANDARD TESTS 

An objective measure of systolic dysfunction, typically 
echocardiography but including nuclear cardiology or 
cardiac catheterization. 


Table 16-12 Likelihood Ratios for Diagnosis of Left Ventricular Dysfunction

Medical Inpatients, Including               LR+ or Range        LR- or Range
Postmyocardial Infarction                   (95% CI)            (95% CI)

  Clinical diagnosisᵃ                       2.0-3.1             0.41-0.62
  ECG abnormalᵇ                             2.8 (2.3-3.4)       0.03 (0.01-0.10)

Outpatients                                 LR (95% CI)

  Clinical scoreᶜ with a BNP ≥ 37 pg/mL
    Score ≥ 1 + elevated BNP                3.9 (3.0-5.0)
    Score ≥ 1 + normal BNP                  1.1 (0.61-2.0)
    Score < 1 + elevated BNP                0.23 (0.06-0.92)
    Score < 1 + normal BNP                  0.04 (0.01-0.30)

The Breathless Emergency Patient            LR+ (95% CI)        LR- (95% CI)

  Patient History
    Heart failure                           5.8 (4.1-8.0)       0.45 (0.38-0.53)
    Myocardial infarction                   3.1 (2.0-4.9)       0.69 (0.58-0.82)

  Physical Examination
    Third heart sound                       11 (4.9-25)         0.88 (0.83-0.94)
    Abdominojugular reflux                  6.4 (0.81-51)       0.79 (0.62-1.0)
    Jugular venous distention               5.1 (3.2-7.9)       0.66 (0.57-0.77)
    Rales                                   2.8 (1.9-4.1)       0.51 (0.37-0.70)

  Chest Radiograph
    Pulmonary venous congestion             12 (6.8-21)         0.48 (0.28-0.83)
    Interstitial edema                      12 (5.2-27)         0.68 (0.54-0.85)
    Alveolar edema                          6.0 (2.2-16)        0.95 (0.93-0.97)
    Cardiomegaly                            3.3 (2.4-4.7)       0.33 (0.23-0.48)
    Pleural effusion(s)                     3.2 (2.4-4.3)       0.81 (0.77-0.85)

  Electrocardiogram
    Atrial fibrillation                     3.8 (1.7-8.8)       0.79 (0.65-0.96)
    New T-wave changes                      3.0 (1.7-5.3)       0.83 (0.74-0.92)
    Any abnormal finding                    2.2 (1.6-3.1)       0.64 (0.47-0.88)

  B-Type Natriuretic Peptide, pg/mLᵈ        LR (95% CI)
    ≥250                                    4.6 (2.6-8.0)
    ≥100                                    2.7 (2.0-3.9)
    ≥50                                     1.7 (1.2-2.6)
    <50                                     0.06 (0.03-0.12)

  Overall Clinical Impression               LR+ (95% CI)        LR- (95% CI)
    Initial clinical judgment that the
    patient is in heart failure             4.4 (1.8-10)        0.45 (0.28-0.73)

Abbreviations: BNP, brain natriuretic peptide; CI, confidence interval; ECG, electrocardiogram;
LR+, positive likelihood ratio; LR-, negative likelihood ratio.

ᵃClinical symptoms, physical examination, ECG, and chest radiograph.

ᵇVentricular hypertrophy, ST-segment or T-wave changes, left bundle-branch block, or a
paced rhythm. All other findings were considered “normal,” for which the LR- applies.

ᶜScore of 1 or higher when any of the following 5 findings are present: a history of myocardial
infarction, previous diagnosis of heart failure, orthopnea or paroxysmal nocturnal
dyspnea, Q-wave or intraventricular conduction delay, or a chest radiograph abnormality
(cardiomegaly, pulmonary venous hypertension, or edema).

ᵈThe likelihood ratios (LRs) represent serial LRs, where the LRs are associated with a series
of ordered BNP thresholds rather than just a single BNP threshold. In this table, a patient
with a BNP of 110 pg/mL would have an LR of 2.7, whereas a value of 33 pg/mL confers
an LR of 0.06.
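
The serial BNP thresholds in Table 16-12 can be read as an ordered lookup: the LR that applies is the one attached to the highest threshold the measured value reaches. The sketch below shows that reading; the function and constant names are hypothetical, and the thresholds are interpreted as "at or above," which is an assumption about the table's boundaries.

```python
# (threshold in pg/mL, LR) pairs from Table 16-12, ordered from highest to lowest.
BNP_SERIAL_LRS = [
    (250, 4.6),
    (100, 2.7),
    (50, 1.7),
]

# LR that applies when the BNP is below the lowest threshold (less than 50 pg/mL).
LR_BELOW_LOWEST_THRESHOLD = 0.06


def bnp_likelihood_ratio(bnp_pg_ml: float) -> float:
    """Return the serial LR for a measured BNP value."""
    for threshold, likelihood_ratio in BNP_SERIAL_LRS:
        if bnp_pg_ml >= threshold:
            return likelihood_ratio
    return LR_BELOW_LOWEST_THRESHOLD


for value in (110, 33):
    print(f"BNP {value} pg/mL -> LR {bnp_likelihood_ratio(value)}")
```

This reproduces the 2 examples given in the table footnote: a BNP of 110 pg/mL maps to an LR of 2.7, and a BNP of 33 pg/mL maps to an LR of 0.06.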

EVIDENCE TO SUPPORT THE UPDATE:

Congestive Heart Failure 



TITLE Diagnosing Left Ventricular Dysfunction After 
Myocardial Infarction: The Dundee Algorithm. 

AUTHORS Darbar D, Gillespie N, Choy AM, et al.

CITATION Q J Med. 1997;90(11):677-683.

QUESTION How well can a clinical algorithm identify 
patients, after an acute myocardial infarction, with a left 
ventricular ejection fraction of 40% or less? 

DESIGN Prospective data collection, but no information 
on whether the patients were consecutively enrolled. 

SETTING A university hospital and its Veterans Affairs
affiliate medical center (Nashville, Tennessee) and a Scottish
university hospital (Dundee).

PATIENTS Patients were evaluated while hospitalized 
for acute myocardial infarction. Exclusion criteria 
included patients who had a history of congestive heart 
failure, were already taking angiotensin-converting 
enzyme inhibitors at admission, or underwent primary 
angioplasty within 24 hours of their current infarction. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The diagnostic algorithm required at least 1 of the following:
(1) clinical findings of heart failure, (2) new infarction with
anterior Q waves, or (3) previous infarction, with the current
infarction exhibiting a peak creatine kinase (CK) level of
more than 1000 U/L without thrombolysis.

The heart failure clinical diagnosis used the criteria from
a previous randomized controlled trial and required any of
the following: (1) pulmonary venous congestion with
edema on at least 1 chest radiograph, (2) rales extending at
least one-third up the lung fields in the absence of chronic
pulmonary disease, or (3) a third heart sound with persistent
tachycardia. 1
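
The 2 sets of criteria above amount to simple "any of" rules, which the following sketch writes out. It is an illustration under stated assumptions, not the study's own code: the function and parameter names are hypothetical, and criterion 3 of the algorithm is read as requiring a previous infarction together with a peak CK above 1000 U/L in a patient who did not receive thrombolysis.

```python
def clinical_heart_failure(
    radiographic_congestion_with_edema: bool,
    rales_one_third_without_chronic_lung_disease: bool,
    third_heart_sound_with_persistent_tachycardia: bool,
) -> bool:
    """Clinical diagnosis of heart failure: any 1 of the 3 criteria is sufficient."""
    return (
        radiographic_congestion_with_edema
        or rales_one_third_without_chronic_lung_disease
        or third_heart_sound_with_persistent_tachycardia
    )


def dundee_algorithm_positive(
    clinical_heart_failure: bool,
    new_anterior_q_waves: bool,
    previous_infarction: bool,
    peak_ck_u_per_l: float,
    thrombolysis_given: bool,
) -> bool:
    """Algorithm result: positive when at least 1 of the 3 components is present."""
    # Assumed reading of criterion 3: all three conditions must hold together.
    ck_criterion = (
        previous_infarction and peak_ck_u_per_l > 1000 and not thrombolysis_given
    )
    return clinical_heart_failure or new_anterior_q_waves or ck_criterion


# Hypothetical patient: first, nonanterior infarction with no clinical heart failure.
result = dundee_algorithm_positive(
    clinical_heart_failure=clinical_heart_failure(False, False, False),
    new_anterior_q_waves=False,
    previous_infarction=False,
    peak_ck_u_per_l=800,
    thrombolysis_given=True,
)
print("Algorithm positive:", result)
```

A negative result requires that all 3 components be absent, which is why the negative LR of the combined algorithm is lower than that of the clinical diagnosis alone.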

In the US centers, the clinical diagnosis could have been 
established at admission of the patient or on twice-daily 
rounds with a cardiology fellow or attending physician. The 
Scottish center used the clinical assessment by a consultant 
cardiologist on the morning after admission. 


An investigator interpreted the electrocardiograms (ECGs) 
independent of the clinical diagnosis. However, the clinicians 
establishing the clinical diagnosis had access to the ECG 
result and the cardiac enzyme levels. 

MAIN OUTCOME MEASURES 

The UK site used transthoracic echocardiography only. The
US site used transthoracic echocardiography, contrast ventriculography,
or radionuclide ventriculography. Left ventricular
systolic dysfunction was defined by an ejection fraction
(EF) of 40% or less.

MAIN RESULTS 

A total of 46 (39%) US patients had left ventricular systolic
dysfunction vs 56 (60%) Scottish patients (see Table 16-13).


Table 16-13 Likelihood Ratios for the Overall Algorithm and Its Components

Finding (No. With the
Finding Present)                           LR+ (95% CI)        LR- (95% CI)

Clinical Diagnosis
  US centers (33)                          3.1 (1.7-5.8)       0.62 (0.46-0.83)
  UK center (25)                           0.86 (0.44-1.7)     1.0 (0.82-1.4)

Anterior Q Wave
  US center (13)                           8.6 (2.0-37)        0.78 (0.66-0.92)
  UK center (27)                           3.9 (1.5-10)        0.66 (0.52-0.84)
  Summary                                  5.0 (2.2-11)        0.74 (0.64-0.85)

CK > 1000 U/L
  US center (52)                           0.72 (0.51-1.0)     1.5 (1.0-2.1)
  UK centerᵇ (52)                          0.73 (0.51-1.0)     1.5 (0.91-2.5)
  Summary                                  0.72 (0.57-0.92)    1.5 (1.1-2.0)

Diagnostic “Algorithm”ᵃ Result Positive
  US center (58)                           4.1 (2.6-6.4)       0.11 (0.04-0.29)
  UK centerᵇ (57)                          2.8 (1.7-4.7)       0.25 (0.14-0.46)

Abbreviations: CI, confidence interval; CK, creatine kinase; LR+, positive likelihood
ratio; LR-, negative likelihood ratio.

ᵃAny of the following present: (1) clinical findings of heart failure, (2) new infarction
has anterior Q waves, or (3) previous infarction and the current infarction exhibited a
peak CK of more than 1000 U/L without thrombolysis.

ᵇData corrected from that originally reported in this study.

CONCLUSIONS

LEVEL OF EVIDENCE Level 3 (uncertainty regarding consecutive
enrollment).

STRENGTHS The diagnostic criteria were evaluated in 2
sites and highlighted the potential variability in performance
of the clinical examination. However, an important feature
was the explicit clinical criteria for a heart failure diagnosis
that had been used in previous clinical trials.

LIMITATIONS Although the electrocardiograph and CK criteria
were collected independently of the clinical evaluation,
the clinicians had access to the ECG result and CK enzyme
levels. There was a difference in prevalence of left ventricular
systolic dysfunction, but more important, there was a distinct
difference in the timing and frequency of the clinical
evaluations. The US clinicians had more opportunities to
detect heart failure than the Scottish physicians.

Although it seems most likely that the difference in the
clinical diagnosis could be attributed to the timing and frequency
of clinical assessments, it is not possible to rule out a
difference in patients (eg, more anterior Q-wave infarctions
in Scotland) or an actual difference in examiners’ skills. It
does seem clear that the CK criterion cannot be used to identify
patients with left ventricular systolic dysfunction. An
anterior Q wave might be useful, but the confidence intervals
are broad.

The proposed “algorithm” is actually a heart failure diagnosis
that required the presence of one of 3 findings. In the
US center, the combination of findings was more efficient
than the clinical diagnosis for identifying patients less likely
to have heart failure. In the Scottish center, the criteria added
to the clinical diagnosis resulted in a much more accurate
finding. If these data are reliable, then we could conclude that
patients undergoing twice-daily clinical assessments (with
chest radiographs as part of the criteria) without evidence of
heart failure after their first nonanterior Q-wave myocardial
infarction are much less likely to have a low EF. The nature of
the study design requires that this conclusion undergo validation,
but even with validation the clinical performance
would have to be much better to forgo echocardiography
when the clinician wants to know whether the EF is 40% or
lower.


REFERENCE FOR THE EVIDENCE

1. Acute Infarction Ramipril Efficacy (AIRE) Study Investigators. Effect of
ramipril on mortality and morbidity of survivors of acute myocardial infarction
with clinical evidence of heart failure. Lancet. 1993;342(8875):821-828.

Reviewed by Robert G. Badgett, MD,
and Catherine R. Lucey, MD


TITLE Usefulness of Clinical Information to Distinguish
Patients With Normal From Those With Low Ejection
Fractions in Heart Failure.

AUTHORS Philbin EF, Hunsberger S, Garg R, et al; for
the Digitalis Investigation Group.

CITATION Am J Cardiol. 2002;89(10):1218-1221.

QUESTION Can the clinical evaluation diagnose patients
with diastolic dysfunction among patients with
chronic heart failure?

DESIGN Prospective with random allocation to derivation
and validation groups.

SETTING A total of 302 clinical centers in the United
States and Canada participating in the Digitalis Investigation
Group Trial.

PATIENTS A total of 7534 patients with stable symptoms
caused by ischemic, hypertensive, idiopathic, or
alcohol-related chronic heart failure who were in sinus
rhythm. Ten percent of patients had missing values for
any predictor variables and were excluded from multivariate
analyses. Patients were randomly assigned to either a
derivation group or validation group.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD

Candidate findings were from the medical history, physical
examination, and radiographic evaluation. Ejection fraction
(EF) was measured with angiography, radionuclide ventriculography,
or 2-dimensional echocardiography.

MAIN OUTCOME MEASURES

A multivariate model was used to calculate the predicted EF
in a derivation set of patients (n = 3768). The model was
then applied to a separate validation set of patients (n =
3766) to estimate the accuracy of the prediction and the abil¬ 
ity to detect patients with a normal EF (>45%) vs a low EF 
(<45%). 

MAIN RESULTS 

Eighteen findings were independently significant, including 
the previously identified findings of female sex, older age, 
and a smaller cardiothoracic ratio on chest radiograph. 
The findings also included symptom functional class, 
rales, jugular venous distention, peripheral edema, a third 
heart sound, heart rate, and blood pressure, along with several 
historical features and medication use. Despite the large 
sample size and large number of variables, the model 
predicted an EF that was within 5% of the actual result in only 
45% of patients. 

The investigators evaluated the ability of the model to classify 
patients correctly at various EF thresholds. Identification of 
patients with systolic dysfunction improved when the cut point 
for the predictive model was 30%. In other words, when the 
model predicted an EF of less than 30%, the likelihood ratio 
(LR) that the patient had systolic dysfunction (<45%) was 3.9 
(95% confidence interval [CI], 2.9-5.3). Although patients 
with an estimated EF of 30% or higher were less likely to have 
systolic dysfunction, the condition was not ruled out (LR, 
0.65; 95% CI, 0.62-0.69). 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Large study that analyzed many variables to 
create a predictive model. 

LIMITATIONS Electrocardiographic findings were not stud¬ 
ied. Sensitivity and specificity of individual findings were not 
reported. 

Some of the findings that entered the final model were 
those advocated when assessing the patient with heart failure. 
However, a complicated model with a large number of find¬ 
ings was too inaccurate for clinical prediction of the actual 
EF. In addition, the model was insufficient for sorting out 
patients with systolic vs diastolic dysfunction. 

Reviewed by Robert G. Badgett, MD, 
and Catherine R. Lucey, MD 


TITLE Efficient Utilization of Echocardiography for the 
Assessment of Left Ventricular Systolic Function. 

AUTHORS Talreja D, Gruver C, Sklenar J, Dent J, Kaul S. 

CITATION Am Heart J. 2000;139(3):394-398. 

QUESTION Can a combination of clinical findings pre¬ 
dict ejection fraction (EF)? 

DESIGN Consecutive patients referred for echocardiog¬ 
raphy. Data were recorded prospectively. 

SETTING Inpatient echocardiography laboratory. 

PATIENTS A total of 330 inpatients referred to echocar¬ 
diography specifically for evaluation of left ventricular 
systolic function. Thirty patients who did not have a 
required electrocardiogram (ECG) were excluded. The 
majority (91.5%) of patients had not had an anterior 
myocardial infarction. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The physical examination data and clinician’s diagnosis of 
heart failure were collected during a chart review. The chest 
radiograph information was abstracted from the radiologist’s 
report, with a focus on the presence of cardiomegaly or vascular 
congestion. The ECG was interpreted independently of 
the clinical data. A positive ECG result had to contain a Q 
wave in 2 or more contiguous leads, poor R-wave progression, 
left ventricular hypertrophy, ST-segment or T-wave 
changes, left bundle-branch block, or a paced rhythm. 

An echocardiogram, done by an independent observer 
without access to the clinical data, determined whether there 
was left ventricular systolic dysfunction that was defined as 
an EF of less than 0.45. 

MAIN OUTCOME MEASURES 

Sensitivity and specificity for the overall clinical examina¬ 
tion, chest radiograph, and ECG findings. 

MAIN RESULTS 

One hundred twenty-four (41%) patients had left ventricular 
systolic dysfunction. 

The clinical findings, ECG, and radiograph were used in 
the decision to refer for echocardiography. However, no indi¬ 
vidual finding occurred in more than 50% of patients (rales, 
third heart sound, jugular venous distention, peripheral 
edema, or a positive overall clinical diagnosis). See Tables 16-14 
and 16-15. 

The patient’s sex (male) was the only other finding that 
was significant in the logistic model. 

Systolic dysfunction score = -127 + 130 (male patient) + 80 
(cardiomegaly) + 190 (left bundle-branch block) - 340 
(“normal” ECG result) 


Table 16-14 Likelihood Ratio of Overall Clinical Impression Among 
Patients Referred for Echocardiography 

Finding                        LR+ (95% CI)      LR- (95% CI) 

Overall clinical impression    2.0 (1.6-2.5)     0.41 (0.30-0.56) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 

Table 16-15 Useful Predictors for a Low Ejection Fraction Identified in 
a Multivariable Model 

Finding                        LR+ (95% CI)      LR- (95% CI) 

Radiographic cardiomegaly      1.9 (1.5-2.6)     0.64 (0.52-0.79) 
Left bundle-branch block       6.7 (2.3-19)      0.87 (0.80-0.94) 
ECG result abnormal (a)        2.8 (2.3-3.4)     0.03 (0.01-0.10) 

Abbreviations: CI, confidence interval; ECG, electrocardiogram; LR+, positive 
likelihood ratio; LR-, negative likelihood ratio. 

(a) Ventricular hypertrophy, ST-segment or T-wave changes, left bundle-branch block, 
or a paced rhythm was considered abnormal, so that the LR+ applies to these 
patients. Absence of all of these findings was considered as a “normal” result so 
that the LR- applies to these patients. 

TITLE Utility of History, Physical Examination, ECG, and 
Chest Radiograph for Differentiating Normal From 
Decreased Systolic Function in Patients With Heart Failure. 

AUTHORS Thomas JT, Kelly RF, Thomas SJ, et al. 

CITATION Am J Med. 2002;112(6):437-445. 

QUESTION Can clinical findings differentiate normal 
vs decreased systolic left ventricular function in patients 
with heart failure? 

DESIGN Consecutive patients, without primary valvular 
disease, admitted with the primary diagnosis of congestive 
heart failure. 

SETTING Cook County Hospital, Chicago, Illinois. 

PATIENTS A total of 225 patients, of whom 46% had 
diastolic dysfunction. An additional 43 were excluded 
because their ejection fraction (EF) was not assessed dur¬ 
ing echocardiography. 


(If finding is present, substitute 1; if absent, substitute 0) 
Estimated probability = exp(score/100) / [1 + exp(score/100)] 

CONCLUSION 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Consecutive patients referred for an echocar¬ 
diogram to measure the EF. 

LIMITATIONS Verification bias exists in that the clinical 
findings (none of which was independently important) were 
used to select patients for echocardiography. This tends to 
overestimate sensitivity and underestimate specificity. 
Because the individual clinical findings were not collected in 
a structured way, we did not have confidence in their reliabil¬ 
ity and therefore did not include them in the evidence table. 
The systolic dysfunction score (calculated from the logistic 
model) cannot be used to identify patients who should be 
appropriately referred for echocardiography because it was 
derived from a group of patients whose physicians had 
already decided to determine the EF. Despite the data collec¬ 
tion method, it is striking that rales, third heart sounds, jug¬ 
ular venous distention, and peripheral edema had no 
independent utility among patients referred to the echocar¬ 
diography laboratory. 

These data are most useful to the echocardiographer, who 
might choose to forgo testing for patients with a low probability 
of a reduced EF. For example, a man with no cardiomegaly 
and none of the abnormal ECG findings who is 
referred for EF determination has only a 3% probability of an 
EF lower than 0.45 (a woman has a probability of only 1%). 
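
The 3% and 1% figures follow from the systolic dysfunction score and the estimated probability formula reported with the main results. The Python sketch below reproduces that arithmetic; the function and argument names are illustrative assumptions and do not come from the study.

```python
import math


def systolic_dysfunction_probability(male, cardiomegaly, lbbb, normal_ecg):
    """Estimated probability of an EF < 0.45 from the published score.

    Each argument is True if the finding is present, False if absent.
    Names are illustrative; only the coefficients come from the text.
    """
    score = (-127
             + 130 * int(male)
             + 80 * int(cardiomegaly)
             + 190 * int(lbbb)
             - 340 * int(normal_ecg))
    # Estimated probability = exp(score/100) / [1 + exp(score/100)]
    odds = math.exp(score / 100)
    return odds / (1 + odds)


# Man with no cardiomegaly and a "normal" ECG result
print(round(systolic_dysfunction_probability(True, False, False, True), 2))   # 0.03
# Woman with the same findings
print(round(systolic_dysfunction_probability(False, False, False, True), 2))  # 0.01
```

Because a “normal” ECG result was defined as the absence of all abnormal ECG findings, the left bundle-branch block and normal ECG terms would not both be set for the same patient.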

Given the selection bias in how patients were identified for 
echocardiography, the likelihood ratio (LR) for the clinical 
diagnosis of heart failure (LR, 2.0) could actually be much 
higher. This phenomenon might be even more striking for 
the individual physical examination findings if patients with 
abnormal results were preferentially referred. When clini¬ 
cians do not clinically diagnose heart failure, but refer the 
patient for EF determination, the LR is not likely to be as low 
as the LR of 0.41 found in this study. 

Reviewed by Catherine R. Lucey, MD, 
and Robert Badgett, MD 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Clinical findings were extracted by a chart review of the 
attending physician’s notes from hospital day 1 or 2. Some 
attending physicians may have been aware of a patient’s 
echocardiographic results, making the symptoms and sub¬ 
jective physical examination unsuitable for review. However, 
the vital signs were not biased by awareness of the systolic 
function. The electrocardiogram (ECG) and chest radio¬ 
graphs were collected and interpreted without awareness of 
the echocardiogram. 

Left ventricular systolic function was determined by echo¬ 
cardiography performed by experienced cardiologists 
blinded to clinical findings. Normal systolic function was 
defined as an EF of more than 45% as assessed by visual 
inspection. 

MAIN OUTCOME MEASURES 

Sensitivity and specificity from univariate analysis and 
adjusted odds ratios from multivariate analysis. 


MAIN RESULTS 

One hundred four patients had normal systolic function and 
121 had systolic dysfunction (EF < 45%). A multivariate analysis 
of 34 findings identified only a few significant variables 
from the clinical examination, none of which were physical 
examination findings other than vital signs. See Table 16-16. 

The chest radiograph findings (cardiomegaly, cephalization, 
pulmonary edema, pleural effusion) did not distinguish 
between individuals with normal systolic function and those with 
systolic dysfunction. 

TITLE Clinical Criteria and Biochemical Markers for the 
Detection of Systolic Dysfunction. 

AUTHORS Yamamoto K, Burnett JC, Bermudez EA, 
Jougasaki M, Bailey KR, Redfield MM. 

CITATION J Card Fail. 2000;6(3):194-200. 

QUESTION Does a clinical score with items from the 
medical history, electrocardiogram (ECG), and chest 
radiograph predict left ventricular systolic dysfunction 
better when added to the information from a brain natri¬ 
uretic peptide (BNP) assay? 

DESIGN Prospective, consecutive patients referred for 
echocardiography. The echocardiographers were blinded 
to the study questions. 

SETTING University hospital echocardiography labo¬ 
ratory. 

PATIENTS Four hundred sixty-six consecutive outpa¬ 
tients referred for echocardiography who were classified 
further as either having symptoms of heart failure or risk 
factors for systolic dysfunction. Patients with known sys¬ 
tolic dysfunction, or those referred for characterization of 
a murmur in the absence of any cardiac symptoms, were 
excluded. 


Table 16-16 Useful Predictors for Systolic Dysfunction Identified in a 
Multivariable Model 

                                         LR for Normal Systolic   LR for Systolic Dysfunction, 
Finding                                  Function (95% CI)        EF < 45% (95% CI) 

Favors Normal Systolic Function 
  Female sex                             1.6 (1.2-2.2)            0.62 (0.46-0.84) 
  Systolic blood pressure > 160 mm Hg    1.8 (1.3-2.6)            0.55 (0.39-0.78) 

Favors Systolic Dysfunction 
  Heart rate > 100/min                   0.43 (0.28-0.65)         2.3 (1.5-3.5) 
  Left atrial ECG abnormality            0.42 (0.26-0.63)         2.4 (1.6-3.6) 

Abbreviations: CI, confidence interval; ECG, electrocardiogram; EF, ejection fraction; 
LR, likelihood ratio. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Large series of consecutive patients admitted 
with heart failure. 

LIMITATIONS Many of the symptoms and some of the 
physical findings could have been affected by expectation 
bias, created when the physician was aware of the diagnosis. 
Despite expectation bias, none of the clinical findings (jugu¬ 
lar venous distention, rales, third and fourth heart sounds) 
or the physician’s interpretation of the patient’s symptoms 
were independently useful for distinguishing heart failure 
patients with normal systolic function from those with sys¬ 
tolic dysfunction. 

These results support the clinical need for an objective 
measure of systolic function to distinguish diastolic dysfunc¬ 
tion (a normal or elevated EF) from systolic dysfunction. The 
few variables with independent significance for identifying 
diastolic dysfunction did not have likelihood ratios so differ¬ 
ent from 1 that they would obviate the need for echocardiog¬ 
raphy in the patient with heart failure. Hypertension 
approximately doubles the likelihood of a normal EF, 
whereas tachycardia and left atrial ECG abnormalities 
approximately double the likelihood of systolic dysfunction. 

Reviewed by Catherine R. Lucey, MD, 
and Robert Badgett, MD 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

A score assigned to each patient referred for echocardiography 
was based on 5 possible risk factors (maximum score = 5): 
(1) history of myocardial infarction, (2) previous diagnosis of 
congestive heart failure, (3) current orthopnea or paroxysmal 
nocturnal dyspnea, (4) presence of pathologic Q waves or an 
intraventricular conduction defect on ECG, or (5) cardiomegaly, 
pulmonary venous hypertension, or interstitial edema on 
chest radiograph. An abnormal score was 1 or higher. 
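
The scoring rule is a simple count of the 5 risk factors. It can be sketched in Python as follows; the function and argument names are illustrative assumptions, and the example patient is hypothetical.

```python
def clinical_score(prior_mi, prior_heart_failure, orthopnea_or_pnd,
                   ecg_q_wave_or_ivcd, abnormal_chest_radiograph):
    """Count how many of the 5 risk factors are present (0 to 5).

    Each argument is True if that risk factor is present.
    """
    findings = (prior_mi, prior_heart_failure, orthopnea_or_pnd,
                ecg_q_wave_or_ivcd, abnormal_chest_radiograph)
    return sum(1 for finding in findings if finding)


def is_abnormal(score):
    """An abnormal score was defined as 1 or higher."""
    return score >= 1


# Hypothetical patient: prior myocardial infarction and orthopnea only
score = clinical_score(True, False, True, False, False)
print(score, is_abnormal(score))  # 2 True
```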

The ECG and chest radiograph results were obtained from 
the clinical record. 

A BNP level higher than 37 pg/mL was defined prospec¬ 
tively as abnormal. 

An investigator who was blinded to the BNP and echocardiographic 
results reviewed the clinical record and extracted 
the clinical findings. 

Left ventricular systolic dysfunction was defined by an 
ejection fraction (EF) of less than 45%. The echocardio¬ 
graphers were unaware of the study questions. 


MAIN OUTCOME MEASURES 

Sensitivity and specificity of the clinical score and BNP alone, 
vs in combination. Likelihood ratios were calculated from 
data provided in the article. 
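
For a dichotomous test, likelihood ratios follow from sensitivity and specificity by the standard definitions: LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below shows that conversion in Python. The function name is an assumption, and the input values are hypothetical, chosen only for illustration; they are not the study's data.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical inputs for illustration only (not values from the study)
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.60)
print(round(lr_pos, 2), round(lr_neg, 2))  # 2.25 0.17
```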

MAIN RESULTS 

Of 466 patients, 201 had an abnormal score of 1 or higher. 
The prevalence of heart failure symptoms was 33%, but only 
11% of all patients had an EF less than 45%. See Tables 16-17 
and 16-18. 


Table 16-17 Likelihood Ratios for Clinical Score and Brain Natriuretic 
Peptide 

Test                         LR+ (95% CI)      LR- (95% CI) 

Clinical score ≥ 1 (a)       2.5 (2.2-2.9)     0.09 (0.03-0.28) 
BNP assay (> 37 pg/mL)       2.2 (1.8-2.6)     0.34 (0.19-0.54) 

Abbreviations: BNP, brain natriuretic peptide; CI, confidence interval; LR+, positive 
likelihood ratio; LR-, negative likelihood ratio. 

(a) Clinical score was the number of the following present: a history of myocardial 
infarction, previous diagnosis of heart failure, orthopnea or paroxysmal nocturnal 
dyspnea, Q-wave or intraventricular conduction delay on an electrocardiogram, or a 
chest radiograph with cardiomegaly, pulmonary venous hypertension, or edema. 


Table 16-18 Serial Likelihood Ratios for the Combination of Clinical 
Score and Brain Natriuretic Peptide 

Clinical Score (a)    BNP    LR (95% CI) 

+                     +      3.9 (3.0-5.0) 
+                     -      1.1 (0.61-2.0) 
-                     +      0.23 (0.06-0.92) 
-                     -      0.04 (0.01-0.30) 

Abbreviations: BNP, brain natriuretic peptide; CI, confidence interval; LR, likelihood ratio. 

(a) Clinical score was positive for a history of myocardial infarction, previous diagnosis 
of heart failure, orthopnea or paroxysmal nocturnal dyspnea, Q-wave or intraventricular 
conduction delay on an electrocardiogram, or a chest radiograph with cardiomegaly, 
pulmonary venous hypertension, or edema. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS These consecutively enrolled patients had symptoms 
suggesting heart failure or else were asymptomatic but had 
risk factors for a low EF. By including asymptomatic but higher-risk 
patients, the investigators assembled a population that 
reflects the patients whom an outpatient physician would 
appropriately evaluate for left ventricular systolic dysfunction. 

LIMITATIONS The clinical data were determined from a 
chart review, but the person who abstracted the data was 
blinded appropriately to the BNP and echocardiograph 
results. Relatively few patients had systolic dysfunction, 
despite their heart failure symptoms or risk factors. As in 
other studies conducted in echocardiography laboratories, 
the investigators evaluated patients through the referral filter 
imposed by the clinician. Thus, patients with obvious systolic 
dysfunction or those who were obviously healthy were not 
enrolled. Although this is a pragmatic approach, the results 
might differ for physicians who have different clinical thresh¬ 
olds for referring patients for echocardiograms. 

The clinical score, by itself, is far superior to the BNP primarily 
because the clinical score is so efficient at identifying 
patients with an EF of 45% or higher. Remarkably, the clinical 
score had no physical examination data (eg, rales, a third 
heart sound, peripheral edema) and only 1 symptom (orthopnea 
or paroxysmal nocturnal dyspnea). An abnormal clinical 
score with a normal BNP result (likelihood ratio, ≈1) adds 
no information to the pretest likelihood. 

When patients have a normal clinical score, the probability 
of a low EF is greatly reduced. At a prior probability of 40% 
for a low EF (much higher than the prevalence in this study), 
a normal clinical score decreases the probability to approxi¬ 
mately 6%. For clinicians who are inclined to measure the EF 
at that probability level, a BNP of 37 pg/mL or lower would 
decrease the probability to 2.5% (clinicians should confirm 
their laboratory’s BNP value for healthy patients). Patients 
who start with a prior probability of less than 40% would 
have a diminishingly low likelihood of systolic dysfunction 
with a normal clinical score and normal BNP result. 
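
The probabilities quoted in this paragraph come from converting the prior probability to odds, multiplying by the likelihood ratio, and converting back to a probability. The Python sketch below walks through that calculation with the likelihood ratios from Tables 16-17 and 16-18; the function name is an illustrative assumption.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Prior probability of 40%; LR 0.09 for a normal clinical score
print(round(post_test_probability(0.40, 0.09), 3))  # 0.057, approximately 6%
# Same prior; serial LR 0.04 for a normal score plus a normal BNP result
print(round(post_test_probability(0.40, 0.04), 3))  # 0.026, approximately 2.5%
```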

Just over one-fourth of patients with a normal score (27%) 
had an abnormal BNP result, but most of these patients 
proved to have a normal EF result (70/72). The utility of a 
BNP in patients with a normal clinical score depends entirely 
on the pretest probability and the physician’s and patient’s 
need for certainty. 

Reviewed by Robert G. Badgett, MD, 
and Catherine Lucey, MD 

CLINICAL SCENARIOS 


CASE 1 A 65-year-old man experienced a witnessed ven¬ 
tricular fibrillation cardiac arrest at home 24 hours ago. A 
neighbor had performed cardiopulmonary resuscitation 
for 5 minutes until the paramedics arrived and performed 
successful defibrillation. His electrocardiogram revealed a 
large anterior myocardial infarction, for which he under¬ 
went urgent coronary angioplasty. Although still unre¬ 
sponsive, he withdraws from a painful stimulus and his 
pupillary and corneal reflexes are present. The family asks 
you about his chance of meaningful recovery. 

CASE 2 A 26-year-old woman presented to the emergency 
department with severe pleuritic chest pain and dyspnea. 
While waiting for a computed tomographic scan in the 
radiology department, she had an asystolic cardiac arrest. 
The resuscitation lasted 20 minutes, after which she was 
found to have reactive pupils. Three days later, the family is 
considering withdrawing care because she is still comatose. 
On examination, her pupils are now unreactive and she has 
no motor response or brainstem reflexes. The nurse reports 
that the patient had myoclonus 12 hours ago. 


WHY IS THE CLINICAL 
EXAMINATION IMPORTANT? 


With the development of closed-chest cardiac massage in 
1960 and the creation of intensive care units shortly thereaf¬ 
ter, it became possible to survive cardiac arrest. Half a cen¬ 
tury later, cardiovascular disease is the leading cause of death 
in North America and Europe, accounting for approximately 
half of all deaths in the United States. At least 225000 people 
die annually in the United States from cardiovascular disease 
before they reach a hospital. Twice as many will have cardiac 
arrest and attempted resuscitation during hospitalization. 
Survival rates for prehospital cardiac arrest range from 2% to 
33%, and reported inpatient survival rates range between 0% 
and 29%. 1,2 Most survivors of cardiac arrest (≈80%) are 
comatose after resuscitation. After trauma and drug over¬ 
dose, cardiac arrest is now the third most common cause of 
coma. 3,4 With increasing public education in basic life sup¬ 
port and with the use of automated defibrillators in public 
places, such as in airports and shopping malls, postcardiac 
arrest coma has become a common and important clinical 
syndrome. 

With the increased success of resuscitation from cardiac 
arrest comes a multitude of medical, ethical, and economic 
questions. Once spontaneous circulation has been restored, 
recovery is far from certain. Possible outcomes range from 
complete neurologic recovery to death to the persistent vegetative 
state. In admitted patients who survive the initial cardiac 
arrest, rates of meaningful neurologic recovery range 
from 10% to 30%. 5 This uncertainty furthers the emotional 
distress of a grieving and anxious family. Accordingly, it is 
important for families and physicians to have an understand¬ 
ing of a patient’s chance of meaningful recovery. 

Unfortunately, the result of the gold standard test for prog¬ 
nosis in this population can be determined only when the 
true outcome of each patient is known, rather than at presen¬ 
tation. Recent interest has developed in the potential role of 
neurophysiologic testing. 6-8 A recent systematic review found 
somatosensory-evoked potentials useful in predicting “wak¬ 
ening” of comatose patients. 7 Other research suggests that 
elevated serum levels of neuron-specific enolase may predict 
poor outcome in comatose survivors of cardiac arrest. 8 
Although these results are promising, it will take some time 
before the precise operating characteristics of these tests are 
fully understood and before the technology is widely avail¬ 
able in clinical practice. 

The physical examination has the potential to be extremely 
useful in this common clinical scenario because of its univer¬ 
sal availability and ease of performance. From a compassion¬ 
ate standpoint, the clinical evaluation yields the first 
information that is relayed to family members desperate for 
information. Thus, it is crucial for physicians to understand 
the precision and accuracy of the clinical examination in 
determining prognosis in hypoxic-ischemic coma. 

Pathophysiology 

Unlike traumatic or focal ischemic causes of coma, cardiac 
arrest presents a global ischemic insult to the brain. The 
extent of cerebral damage is largely influenced by the duration 
of interrupted cerebral blood flow. Accordingly, minimizing 
both the arrest (no-flow) time and cardiopulmonary 
resuscitation (low-flow) time is critical. With the return of 
spontaneous circulation comes a transient period of cerebral 
hyperemia, which is followed by vasospasm and protracted 
global and multifocal hypoperfusion. Cerebral oxygen stores 
and consciousness are lost within 20 seconds of the onset of 
cardiac arrest, whereas glucose and adenosine triphosphate 
stores are lost by 5 minutes. A cascade of complex chemical 
derangements ensues, which leads to neuronal death and culminates 
in the postcardiac arrest coma. 9 


Table 17-1 Glasgow Coma Scale (a) 

Response                               Score 

Best Motor Response 
  Obeying commands                     6 
  Localizing to pain                   5 
  Withdrawing to pain                  4 
  Abnormal flexion (decorticate)       3 
  Extensor response (decerebrate)      2 
  None                                 1 

Best Verbal Response (b) 
  Oriented                             5 
  Confused conversation                4 
  Inappropriate words                  3 
  Incomprehensible sounds              2 
  None                                 1 

Eye Opening 
  Spontaneous                          4 
  To speech                            3 
  To pain                              2 
  None                                 1 

(a) The score for the scale is summed across the 3 components and ranges from 3 to 
15. A lower score indicates more severe neurologic deficits. Original Glasgow Coma 
Scale in Teasdale and Jennett. 11 

(b) Intubated patients cannot be given a score for the verbal component, so their total 
scores accordingly range from 2 to 10. 

How to Examine a Comatose Patient 

Glasgow Coma Scale 

Before 1974, the clinical assessment of coma relied on quali¬ 
tative, descriptive terminology and the presence or absence 
of brainstem reflexes. Plum and Posner 10 described the widely 
used definition of coma as “a state of unarouseable unre¬ 
sponsiveness.” In 1974, Teasdale and Jennett 11 published the 
first description of the Glasgow Coma Scale (GCS; Table 17-1), 
which has since been used worldwide as a means of classify¬ 
ing coma. Although originally described for traumatic coma, 
it is equally applicable to the assessment of nontraumatic 
coma. This ordinal scale is calculated from the sum of 3 com¬ 
ponents: motor response, verbal response, and eye opening. 
In assessment of the motor response, it is important to apply 
central pain because spinal reflexes may occur with periph¬ 
eral stimulation and do not represent a true motor response. 
A painful stimulus may be applied to the supraorbital region 
(deep pinching of the skin) or the sternum (firm twisting 
pressure applied with the examiner’s knuckles). The mini¬ 
mum GCS score is 3 and maximum is 15. 
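
The summation described above, including the reduced range for intubated patients noted under Table 17-1, can be sketched in Python. The function name, its argument names, and the use of `None` for an unscorable verbal component are illustrative assumptions, and the example patient is hypothetical.

```python
def glasgow_coma_scale(motor, eye, verbal=None):
    """Sum the GCS components.

    motor: 1-6, eye: 1-4, verbal: 1-5, or None when the verbal
    component cannot be scored (for example, an intubated patient).
    """
    if not 1 <= motor <= 6:
        raise ValueError("motor response must be between 1 and 6")
    if not 1 <= eye <= 4:
        raise ValueError("eye opening must be between 1 and 4")
    if verbal is None:
        return motor + eye  # total ranges from 2 to 10
    if not 1 <= verbal <= 5:
        raise ValueError("verbal response must be between 1 and 5")
    return motor + verbal + eye  # total ranges from 3 to 15


# Hypothetical patient: withdraws to pain, no verbal response, no eye opening
print(glasgow_coma_scale(motor=4, eye=1, verbal=1))  # 6
```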

Physical Examination Maneuvers 

In addition to the GCS, various brainstem reflexes are used in 
the physical examination of comatose patients. 10-12 The pupillary 
reflex involves cranial nerves II and III. Shining a penlight into 
one eye and then the other tests the patient’s pupillary light 
response; the examiner observes the direct and consensual 
response (constriction of the opposite eye). The corneal reflex 
involves cranial nerves V and VII. Touching the cornea with a 
piece of cotton or tissue should cause both eyes to blink. The gag 
and cough reflexes test cranial nerves IX and X. To elicit a gag, 
apply a tongue depressor to the posterior pharynx. The soft pal¬ 
ate should rise symmetrically. In patients who are intubated, 
assess the cough (or carinal) reflex by applying deep suction 
through the endotracheal tube to the carina. The suction will 
produce a gasp, followed by several rapid coughs. 

Vestibular signs are also commonly examined in the comatose 
patient. The oculocephalic (or “doll’s eye”) reflex involves observ¬ 
ing the patient’s eyes during passive rotation of the skull. In a 
comatose patient with intact midbrain and vestibular reflexes, 
the eyes will move in a direction opposite to that in which the 
head is moved. If this reflex is lost, the globes will remain fixed 
within the head and the eyes will continue to stare in whatever 
direction the head is pointed. This reflex should not be tested in 
cases of suspected cervical trauma. Cold water caloric testing 
(oculovestibular reflex) also tests the vestibular and oculomotor 
systems. To perform the test, first examine the tympanic membrane 
to ensure there is no perforation or impacted cerumen. 
With the head 30 degrees above the horizontal, irrigate the 
auditory canal with up to 120 mL of ice-cold water. In the unconscious 
patient with intact brainstem function, there will be slow 
tonic deviation of the eyes toward the irrigated ear. 

It is also important to observe the presence of seizures or 
myoclonus when examining the comatose patient because some 
clinicians believe they may be useful in prognosis of comatose 
survivors of cardiac arrest. Seizures may be generalized or focal. 
Myoclonus refers to isolated sudden muscular contractions and 
may be either focal or generalized contractions of axial and limb 
musculature. In patients with seizures, the physical examination 
should be repeated after the postictal period. 

Finally, mechanically ventilated patients are frequently 
sedated or paralyzed. Accordingly, when a detailed neuro¬ 
logic examination is performed, it is crucial that these medi¬ 
cations be at least temporarily discontinued. 

Outcomes of Interest 

The neurologic outcome of comatose patients is most often 
described with the Cerebral Performance Categories (CPCs) 
1-5, as shown in Box 17-1. 13 

METHODS 

Search Strategy and Quality Review 

We conducted a computerized bibliographic search of MED¬ 
LINE and EMBASE for 1966-2003 to determine the precision 
and accuracy of components of the clinical examination in 
prognosis of hypoxic-ischemic coma. Search terms included 
“coma,” “cardiac arrest,” “prognosis,” “physical examination,” 
“sensitivity and specificity,” and “observer variation.” The search 
was conducted by using a previously published search strategy 
for The Rational Clinical Examination series. 14 We checked the 
reference lists of all review articles and primary studies for addi¬ 
tional articles that were not identified on the computerized 
search. Standard physical examination textbooks and personal 
communications with the authors of primary studies provided 
additional citations. Finally, we manually reviewed published 
abstracts from the annual scientific meetings of the American 
Neurological Association, the American Academy of Neurology, 
the Society of Critical Care Medicine, and the European Society 
for Intensive Care Medicine for 1997-2003. 

One of the authors (C.M.B.) initially screened the titles and 
abstracts of the search results and classified them as primary 
studies, review articles, or not relevant. Because we were inter¬ 
ested in both the precision and accuracy of the clinical examina¬ 
tion in postcardiac arrest coma, we included primary studies of 
each type. A preliminary review of the literature revealed few 
precision studies, so the inclusion criteria for this type of study 
were broadened. Precision studies were included if they assessed 
the interobserver agreement in the neurologic examination of 
comatose adult patients. We included both traumatic and non- 
traumatic forms of coma. 


Primary studies of accuracy were independently reviewed by 2 
of us (C.M.B. and R.H.B.) and included if they assessed the accu¬ 
racy of the clinical examination in prognosis of hypoxic- 
ischemic coma in patients older than 10 years. Other criteria for 
study selection were the presentation of outcome data for indi¬ 
vidual clinical variables measured at discrete intervals. Selected 
studies also presented neurologic outcome data as defined by the 
CPCs or in such a manner that an equivalent CPC score could be 
determined (Box 17-1). Studies were excluded if they involved 
patients with coma from other medical conditions or trauma. 

According to our findings in a preliminary literature search, 
we realized there were 2 types of accuracy studies in the litera¬ 
ture. The majority of studies dichotomized patient outcome as 
good or poor. Unfortunately, there is not a uniform definition of 
what constitutes a good vs a poor outcome. Most studies com¬ 
bined outcome data for severe neurologic disability, vegetative 
state, and death (ie, CPC 3-5) as a poor outcome and normal or 
moderate disability (ie, CPC 1-2) as a good outcome. However, 
there were 6 studies that included severe neurologic disability 
(ie, CPC 3) as a good outcome, 4 of which included fewer than 
65 patients. 15-20 We included studies from which combined outcome 
data for severe neurologic disability, vegetative state, and 
death (ie, CPC 3-5) could be extracted. We did this because 
most primary studies presented outcome data in this fashion 
and because we could not combine studies that had differing 
definitions of good vs poor outcomes. Furthermore, we thought 


Box 17-1 Glasgow-Pittsburgh Cerebral Performance Categories a 

1. GOOD CEREBRAL PERFORMANCE 

Conscious. Alert, able to work and lead a normal life. May 
have minor psychological or neurologic deficits (mild dys¬ 
phasia, nonincapacitating hemiparesis, or minor cranial 
nerve abnormalities). 

2. MODERATE CEREBRAL DISABILITY 

Conscious. Sufficient cerebral function for part-time work in 
sheltered environment or independent activities of daily life 
(dressing, traveling by public transportation, and preparing 
food). May have hemiplegia, seizures, ataxia, dysarthria, dys¬ 
phasia, or permanent memory or mental changes. 

3. SEVERE CEREBRAL DISABILITY 

Conscious. Dependent on others for daily support because 
of impaired brain function (in an institution or at home 
with exceptional family effort). At least limited cognition. 
Includes a wide range of cerebral abnormalities, from 
ambulatory with severe memory disturbance or dementia 
precluding independent existence to paralytic and able to 
communicate only with eyes, as in the locked-in syndrome. 

4. COMA, VEGETATIVE STATE 

Not conscious. Unaware of surroundings, no cognition. No 
verbal or psychological interactions with environment. 

5. DEATH 

Certified brain dead or dead by traditional criteria. 

a Adapted from Cummings et al. 13 
it was reasonable to assume that most clinicians, patients, and 
families would not consider severe neurologic disability (defined 
as CPC 3) a desirable outcome. 

The methodologic quality of each primary study was assessed 
in duplicate with modified criteria previously developed for The 
Rational Clinical Examination series (see Table 1-7). 21 Because 
this study assessed prognosis and not diagnosis, investigators 
were considered blinded if the study was prospective and clinical 
variables were assessed before a patient’s outcome was known. 
Level 1 studies were prospective studies with 100 or more consec¬ 
utive unselected patients. Level 2 studies were similar but involved 
fewer than 100 patients. Level 3 studies were retrospective chart 
reviews, and level 4 studies included selected (ie, nonconsecutive) 
patients. 
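
The quality levels described above can be written as a small decision rule. The sketch below is illustrative only: the function name and parameters are hypothetical, and the order in which the criteria are checked (selection first, then study design, then sample size) is an assumption, because the text does not say how a study meeting more than one criterion was graded.

```python
def evidence_level(prospective, consecutive_unselected, n_patients):
    """Assign a quality level (1-4) from the criteria described in the text.

    Assumed precedence (not stated in the source): selected patients -> level 4,
    then retrospective design -> level 3, then sample size separates 1 from 2.
    """
    if not consecutive_unselected:
        return 4  # selected (nonconsecutive) patients
    if not prospective:
        return 3  # retrospective chart review
    # Prospective, consecutive, unselected: 100 or more patients is level 1.
    return 1 if n_patients >= 100 else 2


# Hypothetical study descriptions, for illustration only.
print(evidence_level(prospective=True, consecutive_unselected=True, n_patients=150))
print(evidence_level(prospective=True, consecutive_unselected=True, n_patients=60))
```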

Statistical Methods 

Two authors (C.M.B. and R.H.B.) independently extracted data 
for analysis; we resolved disagreement by consensus. When data 
were missing or unclear, we contacted the primary investigators 
requesting further information. Published raw data were used to 
calculate positive and negative likelihood ratios (LRs) for specific 
clinical variables. To create 2x2 evidence tables, we dichotomized 
CPC 1 and 2 as good outcome and CPC 3 through 5 as poor out¬ 
come. Sensitivity was defined as the proportion of patients with a 
poor neurologic outcome who had a particular physical finding; 
specificity was the proportion of patients who had a good neuro¬ 
logic outcome and did not have the particular finding. 
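
As an illustration of these definitions, the following sketch derives sensitivity, specificity, and both LRs from a single 2x2 table. The function name and the counts are hypothetical and are not taken from any study in this review; "finding present" means the abnormal sign (eg, an absent reflex) was observed.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Return (sensitivity, specificity, LR+, LR-) from a 2x2 table.

    tp: poor outcome, abnormal finding present
    fp: good outcome, abnormal finding present
    fn: poor outcome, abnormal finding absent
    tn: good outcome, abnormal finding absent
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ is undefined when specificity is exactly 1 (no false positives).
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return sensitivity, specificity, lr_positive, lr_negative


# Hypothetical counts, for illustration only.
sens, spec, lr_pos, lr_neg = diagnostic_summary(tp=40, fp=2, fn=60, tn=28)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
      f"LR+={lr_pos:.1f} LR-={lr_neg:.2f}")
```

A table with no false positives gives an undefined positive LR under this simple formula, which is one reason small studies can yield very wide confidence intervals.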

When 3 or more studies examined the same clinical vari¬ 
able at the same time after cardiac arrest, we calculated sum¬ 
mary LRs and 95% confidence intervals (CIs) using bayesian 


random-effects meta-analyses. We also present the strongest 
LRs for individual clinical variables at various times after car¬ 
diac arrest. LRs were modeled using a method described by 
Warn et al 22 for relative risks, also using the prior distributions 
used therein. Posttest probabilities were computed from the 
estimated pretest probability and LRs. 23 All analyses were done 
using the WinBUGS software package (Version 1.4, 2003; 
MRC Biostatistics Unit, Cambridge, England). 24 

Likelihood Ratios 

LRs are a method of converting pretest information (ie, proba¬ 
bility, or more precisely, odds) into posttest information. 25 The 
pretest information is the probability of a poor outcome among 
all comatose survivors of cardiac arrest. The results of the clini¬ 
cal examination, reflected in the LRs for the findings, are com¬ 
bined with the pretest information to estimate the posttest 
probability of a poor outcome. For clinicians, the easiest way to 
interpret LRs is to keep in mind that when an abnormal clinical 
finding is present in a comatose survivor (eg, absent pupillary 
response), the likelihood of a poor outcome increases and the 
LR will be greater than 1. Similarly, if the finding does not indi¬ 
cate a poor prognosis (eg, present pupillary response), an LR of 
less than 1 will occur. 
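
The conversion described here works on odds rather than probabilities. A minimal sketch, assuming the standard odds form of Bayes' theorem and using a hypothetical function name, is shown with two figures reported later in this chapter: a pretest probability of 77% and an LR of 9.2.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability using an LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Pretest probability 77% and LR 9.2, both taken from this chapter.
print(round(posttest_probability(0.77, 9.2), 2))  # 0.97
```

An LR of 1 leaves the probability unchanged, which is why findings with LRs near 1 add little prognostic information.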

RESULTS 

Search Results and Quality of the Evidence 

Our search yielded 5 studies of precision that met our inclusion 
criteria (Table 17-2). 26-30 Two other studies of precision 


Table 17-2 Interobserver Agreement of Clinical Examination for Coma 

Source, y                     No. of Observers  Observers’ Level of Experience  Variable Assessed          Agreement, κ Statistic
Braakman et al, 26 1977       12                Neurosurgeons and residents     GCS motor                  0.72
                              20                Neurosurgical nurses            GCS motor                  0.75
Teasdale et al, 27 1978       7                 Neurosurgeons                   GCS eye                    DR = 14%
                                                                                GCS verbal                 DR = 5.4%
                                                                                GCS motor                  DR = 11%
                                                                                Pupil response             DR = 4.3%
van den Berge et al, 28 1979  6                 Neurosurgeons                   Oculocephalic response     0.49
                                                                                Spontaneous eye movement   0.46
                                                                                Pupil response             0.65
Minderhoud et al, 29 1982     4                 Physicians                      GCS eye                    0.62
                                                                                GCS verbal                 0.59
                                                                                GCS motor                  0.68
                                                                                Pupil response             0.79
                                                                                Oculocephalic response     0.74
Born et al, 30 1987           6                 Neurosurgeons                   GCS motor                  0.65
                                                                                Brainstem score            0.69
                                                                                Pupil response             0.70
                              6                 Other physicians                GCS motor                  0.36
                                                                                Brainstem reflexes         0.42

Abbreviations: DR, reported disagreement rate; GCS, Glasgow Coma Scale. 
Table 17-3 Studies on the Accuracy of the Clinical Examination in Prognosis of Hypoxic-Ischemic Coma a 

                                Level of                                               Site of   Mean      No. of    Neurologic Outcomes c   Outcome
Source, y                       Evidence  Study Population                             Arrest    Age, y b  Patients  Good        Poor        Assessment
Berek et al, 33 1997            2         Postcardiac arrest coma                      PH        68        42        13          29          At discharge
Chen et al, 34 1996             4         Patients in hypoxic-ischemic coma at 24 h    PH or IH  58        34        7           27          6 mo
Earnest et al, 35 1979          1         Postcardiac arrest coma                      PH        62        100       30          70          At discharge
Edgren et al, 36 1987           4         Postcardiac arrest coma                      PH or IH  71        32        11          21          6 mo d
Edgren et al, 37 1994 e         1         Postcardiac arrest coma                      PH or IH  58        262       89          173         12 mo d
Krumholz et al, 38 1988         1         Patients in postcardiac arrest coma at 24 h  PH or IH  67        114       21          93          At discharge
Levy et al, 39 1985             1         Hypoxic-ischemic coma                        PH or IH  61        210       26          184         12 mo d
Madl et al, 40 2000             1         Patients in postcardiac arrest coma at 24 h  PH or IH  57        209       49          160         6 mo d
Madl et al, 41 1993             2         Postcardiac arrest coma                      PH or IH  58        66        17          49          At discharge d
Sasser, 42 1999 e               1         Patients in postcardiac arrest coma at 12 h  PH or IH  63        937       230         707         6 mo
Snyder et al, 43-45 1980-1981   2         Postcardiac arrest coma                      PH        64        63        25          38          6 mo d
Wijdicks et al, 46 1994         3         Postcardiac arrest coma                      PH        63        107       15          92          6 mo

Abbreviations: IH, in-hospital cardiac arrest; PH, prehospital cardiac arrest. 

a The 14 sources represent 11 studies. 
b When the mean age was not provided, the median age of the study population is listed. 
c Good neurologic outcome refers to cerebral performance categories (CPCs) 1 and 2. Poor outcome includes CPCs 3 through 5. See Box 17-1 for a definition of CPCs. 
d Outcome refers to best ever CPC in specified period. 
e This article includes patients from the first Brain Resuscitation Clinical Trial (BRCT), also included in Sasser's 42 dissertation, which involves all 3 BRCTs. 


were excluded because neither rates of agreement (κ) nor raw 
data were presented. 31,32 Fourteen accuracy articles describing 
11 studies met our inclusion criteria (Table 17-3). 33-46 We had 
100% agreement on the inclusion of studies for the systematic 
review. Reasons for excluding relevant studies included studies 
that did not present neurologic outcomes as CPC 1 and 2 as a 
good outcome and CPC 3-5 as a poor outcome, 15-20 studies in 
which patients were not comatose, 47-52 studies that included 
only patients in a persistent vegetative state, 53,54 studies that 
included other forms of medical coma, 55,56 and studies that 
presented the same data set. 3,57 One study was a systematic 
review of clinical and neurophysiologic variables. 6 

We reached 100% agreement on the methodologic quality 
scores. Of the 11 accuracy studies, 5 were classified as level 1, 
3 as level 2, 1 as level 3, and 2 as level 4. The studies and 
methodologic quality scores are summarized in Table 17-3. 

Precision of the Clinical Examination of Coma 

Five studies have reported the precision of the examination 
of comatose patients (Table 17-2). Heterogeneity in study 
methodology, patient population, and variables assessed pre¬ 
cluded a quantitative synthesis of results; thus, these studies 
were reviewed qualitatively. As presented in Table 17-2, inter¬ 
observer agreement was moderate to substantial in each of 
the studies. Three studies found no difference in interob¬ 
server agreement among experienced nurses, residents, and 
physicians. 26-28 One study did find precision to be diminished 
in groups of less experienced examiners. 30 No study exam¬ 
ined only patients with nontraumatic causes of coma. In 
summary, there was reasonable consistency among studies, 
and the precision of the clinical examination of coma 


(including components of the GCS and brainstem reflexes) 
has been found to be moderate to substantial. 
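
For readers unfamiliar with the κ statistic, the sketch below computes Cohen's κ for 2 observers from a square agreement table. It is an illustration under stated assumptions: the counts are hypothetical, the function name is invented here, and the cited studies involved more than 2 observers and may have used other variants of κ. The labels "moderate" and "substantial" are commonly attached to κ values of 0.41-0.60 and 0.61-0.80, respectively; the source does not state which scale it applied.

```python
def cohen_kappa(table):
    """Cohen's kappa for 2 observers.

    table[i][j] is the number of patients placed in category i by
    observer A and in category j by observer B.
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / total
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance alone, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of "reflex present" vs "reflex absent" in 50 patients.
ratings = [[20, 5],
           [10, 15]]
print(round(cohen_kappa(ratings), 2))  # 0.4
```

In this example the observers agree on 70% of patients, but half of that agreement is expected by chance, so κ is only 0.4.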

Accuracy of the Clinical Examination of Coma 

Fourteen articles involving 11 studies of the accuracy of the 
clinical examination were included (Table 17-3). These stud¬ 
ies provided a sample size of 1914 comatose survivors of car¬ 
diac arrest. The proportion of individuals dying or having a 
poor neurologic outcome was calculated by pooling the out¬ 
come data from the 11 studies and was used as an estimate of 
the pretest probability of poor outcome (Table 17-3). The 
random-effects estimate of poor outcome was 77% (95% CI, 
72%-80%). This value represents an estimate of the pretest 
probability of death or a poor outcome for the entire popula¬ 
tion of comatose survivors of cardiac arrest, and it is com¬ 
bined with the LRs for various clinical findings to revise the 
estimated probability of a poor clinical outcome. 
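
As a rough check on this figure, the crude (unweighted) proportion of poor outcomes can be recomputed from the counts in Table 17-3. This sketch is not the bayesian random-effects model used in the review; it simply sums patients. It also assumes that the report by Edgren et al 37 is left out because its patients are included in Sasser's 42 data set, an assumption that reproduces the stated sample size of 1914.

```python
# (good, poor) outcome counts per study, copied from Table 17-3.
# Edgren et al 1994 (89 good, 173 poor) is omitted on the assumption that
# its patients are already counted within Sasser 1999.
studies = {
    "Berek 1997": (13, 29),
    "Chen 1996": (7, 27),
    "Earnest 1979": (30, 70),
    "Edgren 1987": (11, 21),
    "Krumholz 1988": (21, 93),
    "Levy 1985": (26, 184),
    "Madl 2000": (49, 160),
    "Madl 1993": (17, 49),
    "Sasser 1999": (230, 707),
    "Snyder 1980-1981": (25, 38),
    "Wijdicks 1994": (15, 92),
}

total = sum(good + poor for good, poor in studies.values())
poor_total = sum(poor for _, poor in studies.values())
print(f"{poor_total}/{total} = {poor_total / total:.3f}")  # 1470/1914 = 0.768
```

The crude proportion is close to the reported 77%; a random-effects model additionally weights studies and allows for between-study variation, so the two need not match exactly.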

Motor Response and Brainstem Reflexes 

Six studies examined the association between motor and 
brainstem function and the recovery of comatose survivors of 
cardiac arrest. Data for specific clinical findings were pooled if 
they were assessed in at least 3 studies. Table 17-4 shows poten¬ 
tially useful clinical findings from individual studies. Summary 
measures for pooled variables are shown in Table 17-5. 

In 1987, Edgren et al 36 reported motor and brainstem func¬ 
tion in 32 comatose patients at 24 and 48 hours after cardiac 
arrest. Patients were weaned from intensive care at 72 hours 
if they did not respond to pain and had no evidence of brain¬ 
stem reflexes. Chen et al 34 examined similar clinical variables 
Table 17-4 Useful Clinical Findings in the Prognosis of Postcardiac Arrest Coma Organized by Time After Onset of Coma (Not Pooled) a 

                                                        LR of Poor Neurologic Outcome (95% CI) b
Clinical Finding                    Study               Positive              Negative

At Onset of Coma
Absent pupillary reflex             Earnest et al 35    7.2 (1.9-28.0)        0.5 (0.4-0.6)
Absent motor response               Levy et al 39       3.5 (1.4-8.6)         0.6 (0.4-0.7)
Absent corneal reflex               Levy et al 39       3.2 (1.1-9.5)         0.7 (0.6-0.8)
Absent oculocephalic reflex         Earnest et al 35    2.5 (1.3-4.8)         0.4 (0.3-0.6)
Absent spontaneous eye movement     Levy et al 39       2.2 (1.3-4.0)         0.4 (0.3-0.6)
ICS <4                              Berek et al 33      2.2 (1.1-4.5)         0.2 (0.1-0.6)
GCS <5                              Madl et al 40       1.4 (1.1-1.6)         0.3 (0.2-0.5)
Absent verbal effort                Levy et al 39       1.2 (0.9-1.6)         0.1 (0.0-0.7)

At 12 h
Absent cough reflex                 Sasser 42           13.4 (4.4-40.3)       0.3 (0.2-0.4)
Absent corneal reflex               Sasser 42           9.1 (3.9-21.1)        0.3 (0.2-0.4)
Absent gag reflex                   Sasser 42           8.7 (4.0-18.9)        0.4 (0.4-0.5)
Absent pupillary reflex             Sasser 42           4.0 (2.5-6.6)         0.5 (0.5-0.6)
GCS <5                              Sasser 42           3.5 (2.4-5.2)         0.4 (0.3-0.4)
Absent motor response               Sasser 42           3.2 (2.2-4.6)         0.4 (0.3-0.5)
Absent withdrawal to pain           Sasser 42           2.4 (1.9-3.1)         0.2 (0.1-0.2)
Absent verbal effort                Sasser 42           1.6 (1.4-1.9)         0.1 (0.0-0.1)

At 24 h
Absent cough reflex                 Sasser 42           84.6 (5.3-1342.0)     0.4 (0.3-0.5)
Absent gag reflex                   Sasser 42           24.9 (6.3-98.3)       0.5 (0.4-0.5)
GCS <5                              Sasser 42           8.8 (5.1-15.1)        0.4 (0.3-0.4)
Absent eye opening to pain          Sasser 42           5.9 (3.9-9.0)         0.3 (0.3-0.4)
Absent spontaneous eye movement     Levy et al 39       3.5 (1.4-8.8)         0.5 (0.4-0.7)
Absent eye opening to pain          Levy et al 39       3.0 (1.5-6.2)         0.4 (0.3-0.5)
Absent oculocephalic reflex         Sasser 42           2.9 (1.8-4.6)         0.5 (0.5-0.6)
Absent spontaneous eye movement     Sasser 42           2.7 (2.1-3.4)         0.3 (0.2-0.3)
Absent verbal effort                Sasser 42           2.4 (2.0-2.9)         0.1 (0.0-0.1)

At 48 h
GCS <6                              Madl et al 41       2.8 (1.3-5.9)         0.3 (0.1-0.5)
GCS <10                             Madl et al 41       1.3 (1.0-1.7)         0.0 (0.0-0.7)

At 72 h
Absent withdrawal to pain           Levy et al 39       36.5 (2.3-569.9)      0.3 (0.2-0.4)
Absent spontaneous eye movement     Levy et al 39       11.5 (1.7-79.0)       0.6 (0.5-0.7)
Absent verbal effort                Levy et al 39       7.4 (2.0-28.0)        0.3 (0.2-0.5)
Absent eye opening to pain          Levy et al 39       6.9 (1.8-27.0)        0.5 (0.4-0.6)

At 7 d
Absent withdrawal to pain           Levy et al 39       29.7 (1.9-466.0)      0.4 (0.3-0.6)
Absent verbal effort                Levy et al 39       14.1 (2.0-97.7)       0.4 (0.2-0.6)

Abbreviations: CI, confidence interval; GCS, Glasgow Coma Scale; ICS, Innsbruck Coma Scale 33 ; LR, likelihood ratio. 

a Clinical findings that have a positive LR greater than 2 and lower CI boundary greater than 1 are presented with the corresponding negative LR. 

b The positive LR indicates that the abnormal clinical finding shown in the left column was present. The negative LR indicates that the patient had a normal result for the clinical 
finding; thus, the negative LR in the first row is the value associated with the presence of normal pupillary reflexes. 


in a study of 34 comatose patients. As in the study by Edgren 
et al, 36 patients with absent brainstem reflexes at 24 hours 
were excluded from this study. 

The Brain Resuscitation Clinical Trials (BRCTs) were a 
series of 3 large prospective, randomized, multicenter studies 


of pharmacologic interventions in cardiac arrest. In BRCT I 58 
(1979-1984), 262 comatose survivors of cardiac arrest were 
assessed for the use of a barbiturate (thiopental). In BRCT 
II 59 (1984-1989), 516 comatose patients were randomly 
assigned to receive placebo or a calcium-channel blocker 
(lidoflazine) after cardiac arrest. In BRCT III 60 (1989-1992), 
2915 patients were randomly assigned to receive standard- or 
high-dose epinephrine during cardiac arrest. All 3 BRCT 
studies reported negative results; there was no difference 
found in survival or neurologic outcome among treatment 
groups. Two articles described the association between clini¬ 
cal neurologic signs and outcome in the BRCT study popu¬ 
lation. In 1994, Edgren et al 37 reported the neurologic 
examination and outcomes of the 109 individuals in BRCT I 
who had survived to 72 hours. In an analysis of all 3 BRCT 
studies, Sasser 42 assessed the prognostic utility of motor 
response and brainstem reflexes at 12 and 24 hours after car¬ 
diac arrest. As in all studies of cardiac arrest, there was a high 
degree of early mortality. Accordingly, only 1450 patients of 
the original 3693 studied in all 3 BRCTs survived to 12 hours. 
Of this group, 506 patients were sedated or anesthetized at 
the neurologic examination and therefore were not included 
in Sasser’s 42 review. Of the remaining 944 patients, outcome 
data were available for 937. This is the largest population of 
comatose survivors of cardiac arrest reported to date. 

Summary measures for clinical variables that were assessed 
in at least 3 studies are presented in Table 17-5. Five pooled 
variables were found to have a 95% CI lying entirely above 1. 
The clinical signs at 24 hours with the highest LRs were 
absent corneal reflexes (LR, 13; 95% CI, 2.0-69), absent 
pupillary reflexes (LR, 10; 95% CI, 1.8-49), absent motor 
response (LR, 4.9; 95% CI, 1.6-13), and absent withdrawal to 
pain (LR, 4.7; 95% CI, 2.2-9.8). At 72 hours after cardiac 
arrest, absent motor response was found to accurately predict 
death or poor neurologic outcome (LR, 9.2; 95% CI, 2.1-49). 
No clinical findings were found to accurately predict 
good neurologic outcome (ie, no useful negative LRs). 

Coma Scales 

Four studies assessed composite coma scores as prognostic 
indicators in postcardiac arrest coma. Madl et al 41 reported 2 
studies that assessed the role of the GCS in predicting neuro¬ 
logic recovery. In 1993, this group reported on a series of 66 
comatose patients who survived cardiac arrest. 41 The GCS at 48 
hours was compared with survival and functional recovery. A 
second study of 209 patients measured GCS on admission to 
the intensive care unit after cardiac arrest. 40 In the BRCT 
reports, GCS scores at 12, 24, and 72 hours were compared 
with neurologic recovery. 37,42 In 1997, Berek et al 33 examined 
the utility of the Innsbruck Coma Scale (ICS) in 42 comatose 
patients who survived prehospital arrest. The ICS includes an 
assessment of the GCS components in addition to various 
brainstem reflexes. A score from 0 to 23 is assigned. A lower 
score indicates more severe neurologic deficits. 

Although the composite coma scores did predict poor neu¬ 
rologic outcome, they were not as predictive as the individual 
motor and brainstem reflex components. This is demon¬ 
strated in Table 17-4. 

Seizures 

Four studies have examined whether seizures in the post¬ 
arrest period accurately predict outcome. In 1988, Krumholz 
et al 38 described 114 comatose survivors of cardiac arrest. 


Table 17-5 Pooled Clinical Signs in the Prognosis of Postcardiac Arrest Coma 

                                    LR of Poor Neurologic Outcome (95% CI)
Source                              Positive            Negative

At Time of Coma Onset a
  Absent withdrawal to pain
    Summary LR                      1.7 (0.7-4.2)       0.4 (0.1-1.1)
    Earnest et al 35                3.7 (1.6-8.2)       0.4 (0.3-0.6)
    Levy et al 39                   1.4 (1.0-1.9)       0.4 (0.2-0.7)
    Snyder et al 43                 1.4 (0.9-2.1)       0.5 (0.2-1.2)

At 24 h
  Absent withdrawal to pain
    Summary LR                      4.7 (2.2-9.8)       0.2 (0.1-0.6)
    Edgren et al 36                 3.9 (1.1-14)        0.4 (0.2-0.8)
    Levy et al 39                   6.8 (2.3-20)        0.2 (0.2-0.3)
    Sasser 42                       5.1 (3.6-7.3)       0.2 (0.1-0.2)
    Snyder et al 43                 6.5 (1.0-42)        0.3 (0.1-0.7)
  Absent pupil response
    Summary LR                      10 (1.8-49)         0.8 (0.4-1.4)
    Chen et al 34                   0.9 (0.0-19)        1.0 (0.8-1.2)
    Edgren et al 36                 5.6 (0.3-95)        0.8 (0.6-1.1)
    Levy et al 39                   11 (0.7-170)        0.8 (0.7-0.9)
    Sasser 42                       39 (5.6-277)        0.6 (0.6-0.7)
  Absent motor response
    Summary LR                      4.9 (1.6-13)        0.6 (0.3-1.3)
    Chen et al 34                   3.7 (0.2-59)        0.8 (0.6-1.1)
    Levy et al 39                   5.5 (1.4-21)        0.6 (0.5-0.8)
    Sasser 42                       7.6 (4.6-13)        0.4 (0.3-0.4)
    Snyder et al 43                 3.5 (0.5-24)        0.7 (0.5-1.1)
  Absent corneal reflex
    Summary LR                      13 (2.0-69)         0.6 (0.2-1.9)
    Edgren et al 36                 1.8 (0.2-15.4)      0.9 (0.7-1.2)
    Levy et al 39                   15 (0.9-233)        0.7 (0.7-0.8)
    Sasser 42                       91 (5.7-1443)       0.4 (0.4-0.5)

At 72 h
  Absent pupil response
    Summary LR                      3.4 (0.5-24)        0.9 (0.4-2.1)
    Chen et al 34                   0.9 (0.0-19)        1.0 (0.8-1.2)
    Edgren et al 37                 5.3 (0.3-84)        0.8 (0.7-1.0)
    Levy et al 39                   5.8 (0.4-94)        0.9 (0.8-1.0)
  Absent motor response
    Summary LR                      9.2 (2.1-49)        0.7 (0.3-1.3)
    Chen et al 34                   2.0 (0.1-35)        0.9 (0.7-1.2)
    Edgren et al 37                 13 (0.8-193)        0.6 (0.5-0.7)
    Levy et al 39                   16 (1.1-261)        0.7 (0.6-0.8)
    Snyder et al 43                 3.0 (0.2-39)        0.6 (0.3-1.1)

Seizure or myoclonus b
    Summary LR                      1.4 (0.5-3.9)       0.8 (0.3-2.1)
    Krumholz et al 38               1.7 (0.8-3.4)       0.7 (0.5-1.0)
    Levy et al 39                   1.1 (0.5-2.3)       1.0 (0.8-1.2)
    Snyder et al 44                 1.7 (0.7-4.2)       0.8 (0.6-1.1)

Abbreviations: CI, confidence interval; LR, likelihood ratio. 

a Times reflect number of hours since cardiac arrest. 

b These figures refer to the presence of seizures or myoclonus at any time after cardiac arrest. 
Nearly half (44%) of the patients had some seizure activity. 
In a study conducted by Snyder et al 44 on 63 patients, 19 
(30%) had seizures or myoclonus. In 1994, Wijdicks et al 46 
described the prevalence of myoclonus status in a group of 
107 patients. Forty (37%) of 107 patients had myoclonus sta¬ 
tus within 24 hours. In the study conducted by Levy et al 39 on 
210 patients, 53 (25%) had seizure or myoclonic activity. 
Most clinicians infer that seizures portend a poor prognosis 
in comatose survivors of cardiac arrest. However, none of the 
individual studies or the summary measures established that 
seizures accurately predict outcome (Table 17-5). 


CLINICAL SCENARIOS—RESOLUTIONS 


In both cases, an estimate of the pretest probability (derived 
from our overall study population) of poor neurologic out¬ 
come is 77%. This figure will vary according to comorbid 
disease, duration of cardiopulmonary resuscitation, and 
other clinical variables. The 65-year-old man who with¬ 
draws to pain and has intact brainstem reflexes 24 hours 
after cardiac arrest has none of the clinical findings associ¬ 
ated with poor neurologic outcome. In discussing this with 
the family, it is important to explain that although there are 
no signs suggestive of poor outcome, the physical examina¬ 
tion is much less useful in predicting good outcome. Con¬ 
sequently, his probability of poor neurologic outcome 
remains unchanged (ie, 77%). 

In the second case, the young woman has no brainstem 
reflexes or response to painful stimuli at 3 days. Unfortu¬ 
nately, these findings suggest an extremely poor chance of 
meaningful neurologic recovery. The most powerful of these 
indicators elevates her posttest probability of poor neuro¬ 
logic outcome to 97%. Although the existing literature does 
not examine the combined effects of different physical 
examination predictors, because she has multiple poor 
prognostic findings her prognosis may be even worse. You 
should recognize that the observation of reactive pupils 
immediately after cardiac arrest and the presence of myoclo¬ 
nus are not useful in determining her neurologic prognosis. 


THE BOTTOM LINE 

In this systematic review we found that the precision of the 
neurologic examination in comatose patients is moderate to 
substantial. According to our results, we suggest that in 
patients who lack pupillary and corneal reflexes at 24 hours 
and have no motor response at 72 hours, the chance of 
meaningful neurologic recovery is small. This meta-analysis 
includes almost 2000 patients and is the largest such review 
to date. In addition to providing other information, it cor¬ 
roborates the findings of the oft-quoted study by Levy et al, 39 
in which none of the 210 patients who had any of these 3 
clinical findings ever regained an independent lifestyle. 

In our study population, the random-effects estimate of 
poor outcome was 77% (95% CI, 72%-80%). The highest LR 
increases the pretest probability of 77% to a posttest probability 
of 97% (95% CI, 87%-100%). Immediately after cardiac 
arrest, no clinical signs accurately predict the patient’s 
outcome. Finally, no clinical findings were found to have LRs 
that strongly predicted good neurologic outcome. 

The results of our meta-analysis should be interpreted in the 
context of study limitations. To calculate LRs from 2x2 tables, 
there must be a delineation between what constitutes a good vs 
a poor neurologic outcome. We chose to define poor outcome 
as death, vegetative state, or severe neurologic impairment 
(precluding independent living). We made this decision 
because that is where most primary studies dichotomize out¬ 
come. Furthermore, we believe most patients, families, and 
physicians would not consider severe neurologic impairment 
a desirable outcome. However, in applying the results of this 
study to individual patients, physicians must realize that some 
families and patients may have different perceptions of what 
constitutes an acceptable neurologic outcome. It was not the 
purpose of this study to provide an ethical framework for 
treatment decisions in the management of comatose survivors 
of cardiac arrest; rather, we attempted to summarize the exist¬ 
ing literature to provide guidance to clinicians and families 
about prognostic probabilities. 

Any study of prognosis in the critically ill is potentially 
influenced by the tendency for poor prognoses to be self- 
fulfilling. It is difficult to determine whether poor neurologic 
outcomes are caused by decisions to withdraw or withhold 
therapy according to a perceived poor neurologic prognosis. 
This has the potential to artificially elevate positive LRs. 
Although there is no empirical evidence that this occurred in 
our study population, this clinical reality does remain a limi¬ 
tation of the existing literature. 

It would be potentially useful to assess whether combina¬ 
tions of neurologic findings could improve the accuracy of 
prognosis in comatose survivors of cardiac arrest. Unfortu¬ 
nately, we were unable to perform this analysis because the 
available literature does not provide these data. In 3 studies, 
combinations of findings were assessed. In the analysis of 262 
patients by Edgren et al, 37 no combination of findings was 
found to be more predictive than the individual variables. 
Sasser, 42 who performed a detailed analysis of combined neu¬ 
rologic findings and demographic, comorbidity, and cardio¬ 
pulmonary resuscitation variables, did not find any 
additional predictive value of the algorithm (sensitivity, 59%; 
specificity, 93%). Only Levy et al 39 found practical and useful 
algorithms that combined various neurologic findings. These 
are clearly presented in their article. 

Finally, the 11 studies included in this meta-analysis repre¬ 
sent a diverse and heterogeneous population with various 
comorbidities. For example, it is unclear what effect individ¬ 
ual medications or hypothermic cooling may have on the 
bedside clinical examination. Consequently, the applicability 
of our results to individual patients must be made with cau¬ 
tion and as part of the larger clinical picture. We do not sug¬ 
gest a direct extension of our results to the decision to 
proceed with or withdraw from medical care. Rather, we 
present information that we hope will allow the decision to 
be made on a more rational basis. 

In summary, simple physical examination maneuvers 
strongly predict death or poor neurologic outcome in comatose 
survivors of cardiac arrest. Although decisions to proceed 
with care or withdraw care may take place at later times 
for a variety of reasons, the most useful signs occur after at 
least 24 hours and in the case of motor response at 72 hours 
postcardiac arrest. The existing literature does not allow for 
an earlier prognosis to be made on the basis of the clinical 
examination alone. 
CLINICAL SCENARIO 


A 26-year-old woman presented to the emergency 
department with severe pleuritic chest pain and dysp¬ 
nea. While waiting for a computed tomographic scan in 
the radiology department, she had an asystolic cardiac 
arrest. The resuscitation lasted 20 minutes, after which 
she had reactive pupils. You have been asked to see her 3 
days later for prognosis because the family is consider¬ 
ing withdrawing care. On examination, her pupils are 
now unreactive, and she has no motor response or 
brainstem reflexes. The nurse tells you she had myoclo¬ 
nus 12 hours ago. 


ORIGINAL REVIEW 

Booth CM, Boone RH, Tomlinson G, Detsky AS. Is this 
patient dead, vegetative, or severely neurologically impaired? 
assessing outcome for comatose survivors of cardiac arrest. 
JAMA. 2004;291(7):870-879. 


UPDATED LITERATURE SEARCH 

We reviewed 46 citations identified, using the same search 
strategy used in the original article. From 2003 to January 
2006, we found no additional articles on the accuracy of 
physical examination for predicting outcome of comatose 
survivors of cardiac arrest. 


CLINICAL SCENARIO—RESOLUTION 


Three days after resuscitation, she has no pupillary, motor, or 
brainstem response. Myoclonus has been observed. These are 
poor prognostic signs, with the lack of motor response conferring 
a likelihood ratio of 9.2 for a poor outcome. 

See next page for the “Make the Diagnosis” section. 


REFERENCE FOR THE UPDATE 

1. Cummings RO, Chamberlain DA, Abramson NS, et al. Recommended 
guidelines for uniform reporting of data from out-of-hospital cardiac 
arrest: the Utstein style. Circulation. 1991;84(2):960-975. 
OUTCOME FOR COMATOSE SURVIVORS OF CARDIAC ARREST—MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

A poor neurologic outcome (severe neurologic disability, veg¬ 
etative state, or death) occurs in 77% of victims after a non- 
traumatic cardiac arrest. 

ASSESSING THE LIKELIHOOD OF A POOR OUTCOME 

See Table 17-6. 


Table 17-6 Likelihood Ratios of Signs That Predict Poor Prognosis Change Over Time 

Finding                                                     LR+ (95% CI), Finding Absent    LR- (95% CI), Finding Present

Examination at 24 h
  Corneal reflex                                            13 (2.0-69)                     0.6 (0.2-1.9)
  Pupillary response                                        10 (1.8-49)                     0.8 (0.4-1.4)
  Any motor response to pain                                4.9 (1.6-13)                    0.6 (0.3-1.3)
  Withdrawal to pain                                        4.7 (2.2-9.8)                   0.2 (0.1-0.6)

Examination at 72 h
  Any motor response to pain                                9.2 (2.1-49)                    0.7 (0.3-1.3)
  Pupillary response                                        3.4 (0.5-24)                    0.9 (0.4-2.1)

Seizure or myoclonus at any time after the cardiac arrest   1.4 (0.5-3.9)                   0.8 (0.3-2.1)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 


• Individual examination findings work better than 
scales in predicting the likelihood of death or a poor 
neurologic outcome. 

• The examination results at 24 hours and then at 72 
hours are more important than the findings immediately 
after resuscitation. 

• The presence of normal findings does not guarantee a 
good outcome. 

• Seizures at 72 hours have minimal effect on predicting 
the outcome. 
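
The prior probability and the likelihood ratios in Table 17-6 combine through odds rather than probabilities. As a rough illustration only (the function name is hypothetical, and the point estimates ignore the wide confidence intervals in the table), the arithmetic can be sketched in Python:

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability with a likelihood ratio.

    Illustrative helper, not part of the original text.
    """
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Prior probability of a poor neurologic outcome quoted in the text.
prior = 0.77

# No motor response to pain at 72 h (LR 9.2 in Table 17-6).
no_motor_response_72h = posttest_probability(prior, 9.2)

# Withdrawal to pain present at 24 h (LR 0.2 in Table 17-6).
withdrawal_present_24h = posttest_probability(prior, 0.2)

print(f"No motor response at 72 h: {no_motor_response_72h:.0%}")
print(f"Withdrawal to pain at 24 h: {withdrawal_present_24h:.0%}")
```

With these point estimates, an absent motor response at 72 hours raises the probability of a poor outcome from 77% to roughly 97%, whereas withdrawal to pain at 24 hours lowers it only to roughly 40%, which is consistent with the statement that normal findings do not guarantee a good outcome.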

REFERENCE STANDARD TESTS 

Although death can be defined with traditional biological 
criteria, there are also cultural and legal definitions of 
death. A comatose or vegetative state is characterized by a 
patient who is unaware of his or her surroundings and who 
has no cognition of, or verbal or psychological interaction 
with, the environment. No existing test performed soon after 
cardiac arrest serves as a reference standard for predicting 
the clinical outcome. When decisions about coma or vegetative 
states are required, clinicians must often resort to panels of 
experts to agree on the patient’s condition. Other categories 
of outcomes are described in the Glasgow-Pittsburgh 
Cerebral Performance Categories. 1 
Deep vein thrombosis (DVT) affects approximately 2 
million US individuals per year 1 and is the third most 
common cardiovascular disease, behind acute coronary 
syndromes and stroke. 2 Venous thromboembolism repre¬ 
sents a single disease entity, with 2 patterns of clinical pre¬ 
sentation: DVT and pulmonary embolism (PE). The 
approach to patients who present with suspected DVT is 
problematic for several reasons. If left untreated, affected 
patients can experience fatal PE. The clinical diagnosis of 
DVT is unreliable when used in isolation without objec¬ 
tive testing. 3,4 About three-quarters of the patients who 
present with suspected DVT have nonthrombotic causes 
of leg pain. 5,6 Finally, although anticoagulant therapy is 
highly effective in preventing the extension, embolization, 
and recurrence of DVT, it is associated with an increased 
risk of major bleeding (approximately 5%) and other 
potentially serious consequences such as heparin-induced 
thrombocytopenia (approximately 1%). 7 Therefore, when 
possible, anticoagulation should be restricted to those 
with confirmed DVT. For all of these reasons, it is impor¬ 
tant to diagnose DVT accurately. This will allow adminis¬ 
tration of appropriate therapy for patients with documented 
DVT; for patients without DVT, it will prevent unneces¬ 
sary exposure of patients to the hazards of anticoagulant 
therapy and prevent many from being falsely labeled as 
having venous thromboembolic disease. 

The low specificity of clinical symptoms and signs 
means that most symptomatic patients will not have DVT. 
Of those symptomatic patients with confirmed DVT at 
presentation, which represents about one-fourth of 
patients who are investigated, 6,8 approximately 80% have 
proximal DVT (popliteal or more proximal veins) and 
20% have DVT that is limited to the calf. 9 The clinical sig¬ 
nificance of proximal DVT is different from that of calf 
vein thrombosis because proximal vein thrombosis is 
associated with a higher incidence of PE. Pulmonary 
embolism is detected in approximately 50% of patients 
with documented proximal DVT. 10 Therefore, proximal 
DVT should be identified and anticoagulant treatment 
should be initiated immediately in affected patients. The 
initiation of appropriate treatment reduces the risk of 
developing recurrent DVT to about 5% and reduces the 
incidence of fatal PE to less than 1%. 1,11 On the other 
hand, calf vein thrombosis rarely causes PE unless it first 
extends into the proximal veins. Proximal extension of calf 
DVT occurs in approximately 30% of cases, with propagation 
occurring within 1 to 2 weeks of initial presentation. 6 


Copyright © 2009 by the American Medical Association. 






CHAPTER 18 The Rational Clinical Examination 


CLINICAL SCENARIO 


A 55-year-old woman is referred to you with suspected DVT. 
She complains of pain, swelling, warmth, and redness of her 
right calf. She denies injury to the leg or previous DVT. She 
has been receiving intravenous combination chemotherapy 
for ovarian carcinoma that was diagnosed 6 months earlier. 
Extensive pelvic lymph node involvement, especially on the 
right side, was present at diagnosis, and you consider the pos¬ 
sibility that her leg symptoms are due to extrinsic compres¬ 
sion of the right iliac vein. However, no lymph nodes are 
palpable and a recent pelvic ultrasonographic examination 
showed a reduction in the previously demonstrated adenopa¬ 
thy. On physical examination, you find pitting edema, ery¬ 
thema, increased warmth of the right calf (diameter 3.5 cm 
greater than that of the left calf), and tenderness with palpa¬ 
tion of the popliteal vein. You apply a clinical prediction rule 6 
and conclude that the probability of proximal DVT is high. 

METHODS 

Search Strategy 

We conducted a MEDLINE search to retrieve all relevant articles 
pertaining to the clinical assessment of patients with suspected 
DVT. MEDLINE was searched from 1966 to April 1997 using 
Medical Subject Headings, EXP (explode) “thrombosis” (tw 
[textword]) and (EXP “physical examination” or EXP “diagnos¬ 
tic tests” or EXP “sensitivity and specificity”) and EXP “phle¬ 
bography.” This was limited to human and English-language 
studies. One hundred fifteen articles were retrieved (available on 
request from the senior author); 68 articles that dealt with the 
diagnosis of DVT were selected for complete review. The bibli¬ 
ographies of the retrieved articles were examined for additional 
relevant articles. Only 5 studies provided information on the 
relationship between clinical findings and venographic confirmation 
of DVT. 3,4,6,12,13 These studies were graded according to 
their methodologic quality with a standard scoring system. 14 
(See Table 1-7 for a summary of Evidence Grades and Levels.) 

Principles of Diagnosis of DVT 

The diagnostic assessment of patients with suspected DVT 
has evolved from reliance on clinical symptoms and signs 
alone to proof of DVT from objective diagnostic tests. 


Table 18-1 Odds Ratios of Risk Factors for Deep Vein Thrombosis a 

Risk Factors                        OR (95% CI) 

Male sex                            1.7 (1.4-2.0) 
Age > 60 y                          1.6 (1.3-1.9) 
Cancer                              2.4 (1.9-2.8) 
Heart failure                       1.8 (1.3-2.3) 
Systemic lupus erythematosus        4.4 (3.1-5.5) 
Lower limb arteriopathy             1.9 (1.3-2.5) 

Abbreviations: CI, confidence interval; OR, odds ratio. 

a Data are from Cogo et al. 18 


RESULTS 

Clinical Assessment 

Throughout the past 30 years, the clinical assessment in 
patients with suspected DVT has been refined and now 
includes a careful review of risk factors, symptoms, and 
physical signs. 5,15-17 Risk factors for DVT include immobility, 
paralysis, recent surgery or trauma, malignancy, cancer 
chemotherapy, advancing age (ie, >60 years), family history 
of venous thromboembolism, pregnancy, and estrogen 
use. 18 In a recent prospective cohort study, 426 consecutive 
outpatients referred by general practitioners to a tertiary- 
care thrombosis unit were assessed for DVT risk factors, 
and in approximately half of the patients with confirmed 
DVT, a major risk factor (immobility, trauma, or recent 
surgery) was present. 18 The odds ratios (ORs) for other risk 
factors independently associated with the presence of DVT, 
including male sex, age greater than 60 years, cancer, heart 
failure, systemic lupus erythematosus, and lower limb arte- 
riopathy, are presented in Table 18-1. Commonly reported 
symptoms in patients with suspected DVT include leg pain, 
swelling, and other signs, such as pitting edema, warmth, 
dilated superficial veins, and erythema. 3 5 Unfortunately, 
these findings are neither sensitive nor specific for DVT and 
may be caused by other disease processes, 5,15 such as leg 
trauma, cellulitis, obstructive lymphadenopathy, superficial 
venous thrombosis, postphlebitic syndrome, or Baker 
cysts. 6,19 The ORs for these factors range from 1.6 to 4.3. 18 
Furthermore, DVT can coexist with each of these processes. 
For example, the finding of a Baker cyst on an ultrasono¬ 
graphic examination does not rule out the presence of 
DVT. 19 

Traditionally, the routine physical examination in patients 
with suspected DVT included a careful inspection of the 
leg, measurement of the leg circumference, and elicitation 
of Homans sign, 20 which refers to the development of pain 
in the calf or popliteal region on forceful and abrupt 
dorsiflexion of the ankle while the knee is flexed. Early studies 
evaluating the properties of individual physical signs such 
as these to diagnose DVT showed that they were inaccurate. 3,4 
In a study by O’Donnell et al, 3 102 patients with suspected 
DVT who presented to the outpatient departments 
of 2 tertiary-care hospitals underwent a clinical assessment 
and venography. A combination of clinical signs and symptoms 
that included tenderness, swelling, redness, and the 
assessment of Homans sign could not adequately differentiate 
patients with or without DVT. The sensitivity of the 
clinical examination in this study was 88% (95% confidence 
interval [CI], 77%-97%) and the specificity was only 
30% (95% CI, 18%-40%). Haeger 4 conducted a prospective 
study of 72 outpatients in a thrombosis clinic who were 
examined by 1 or 2 experienced surgeons and who then 
underwent venography. No differences in the presenting 
symptoms or physical signs were identified between those 
with or without venographically confirmed DVT. The sensitivity 
of the clinical examination in this study was 66% 
(95% CI, 50%-82%) and the specificity only 53% (95% CI, 
38%-69%). In a study by Molloy et al, 12 100 patients with a 
clinical diagnosis of DVT who were referred to the radiology 
department of a general hospital were studied; the sensitivity 
of the clinical examination was 60% (95% CI, 45%-75%) 
and the specificity was 72% (95% CI, 60%-83%). 
Overall, these symptoms and signs occur with similar frequency 
in symptomatic patients with and without DVT 
(Table 18-2). 


Table 18-2 Frequency of Symptoms and Signs in Patients With Suspected Deep Vein Thrombosis (DVT) a 

                      Source b 
                      O’Donnell et al, 3     Haeger, 4            Molloy et al, 12 
                      Grade A, %             Grade B, %           Grade A, % 
Signs and Symptoms    DVT+     DVT-          DVT+     DVT-        DVT+     DVT- 

Pain                  78       75            90       97          48       23 
Tenderness            76       89            84       74          43       35 
Edema                 78       67            42       32          43       26 
Homans sign           56       61            33       21          11       11 
Swelling              85       56            ... c    ...         41       39 
Erythema              24       38            ...      ...         ...      ... 

Abbreviations: DVT+, those with DVT; DVT-, those without DVT. 
a The DVT diagnosis was confirmed by venography. 
b See Table 1-7 for a summary of Evidence Grades and Levels. 
c Ellipses indicate data not reported. 

The results of these studies led to a shift away from the 
clinical examination to a heavy reliance on noninvasive 
objective tests for patients with suspected DVT. In a retro¬ 
spective chart review by Landefeld et al 13 of 354 inpatients 
and outpatients with suspected DVT who underwent 
venography, there were 5 clinical findings independently 
related to the presence of proximal DVT: swelling below the 
knee, swelling above the knee, recent immobility, cancer, 
and fever. These factors were determined by using multiple 
linear regression, were found to be significantly associated 
with the presence of proximal DVT in 236 patients, and 
then were confirmed in the remaining 119 patients. Overall, 
the sensitivity of a positive clinical examination (associated 
with the presence of 1 or more independent predictors) was 
96% (95% CI, 92%-100%) and the specificity was 20% 
(95% CI, 15%-25%). The number of clinical findings present 
seemed to predict the presence of proximal DVT: the 
absence of any findings was associated with less than a 5% 
chance of proximal DVT, and the presence of 2 or more 
clinical findings was associated with a 46% chance of proximal 
DVT. This was the first study to demonstrate the 
potential role of a clinical prediction guide in patients with 
suspected DVT. The likelihood ratio (LR) estimates for the 
clinical assessment according to the 4 studies described 
above are shown in Table 18-3. 

Recall that an LR expresses the odds that a given finding on 
the medical history or physical examination would occur in a 
patient with the target disorder as opposed to a patient without 
it. Given an LR of more than 1.0, the probability of disease 
(in this case, DVT) increases when the finding is present 
because the finding is more likely among the patients with 
the disease than among those without. When the LR is less 
than 1.0, the probability of disease decreases because the 
finding is less likely to occur among patients with the disease 
than among those without. 21 


Table 18-3 Likelihood Ratio for Clinical Assessment in Patients With 
Suspected Deep Vein Thrombosis Compared With Venographic Result a 

                       Positive Clinical       Negative Clinical 
                       Assessment Result       Assessment Result 
Source                 for DVT (95% CI)        for DVT (95% CI) 

O’Donnell et al 3      1.2 (1.0-1.5)           0.40 (0.17-0.96) 
Haeger 4               1.4 (0.95-2.2)          0.64 (0.34-1.1) 
Molloy et al 12        2.1 (1.3-3.5)           0.55 (0.36-0.80) 
Landefeld et al 13     1.2 (1.1-1.3)           0.21 (0.08-0.54) 

Abbreviations: CI, confidence interval; DVT, deep vein thrombosis. 

a Positive clinical assessment result was defined as 1 or more clinical factors; 
negative clinical assessment, absence of clinical factors. 
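
The likelihood ratios reported in Table 18-3 follow from the sensitivities and specificities quoted for each study. A minimal sketch, assuming the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity; the function name and the dictionary layout are illustrative only:

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Return (LR+, LR-) for a dichotomous test result."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity and specificity of the clinical examination as quoted in the text.
studies = {
    "O'Donnell et al": (0.88, 0.30),
    "Haeger": (0.66, 0.53),
    "Molloy et al": (0.60, 0.72),
    "Landefeld et al": (0.96, 0.20),
}

for name, (sensitivity, specificity) in studies.items():
    lr_positive, lr_negative = likelihood_ratios(sensitivity, specificity)
    print(f"{name}: LR+ {lr_positive:.2f}, LR- {lr_negative:.2f}")
```

Because the published sensitivities and specificities are rounded, the computed values only approximate the table entries; small discrepancies in the second decimal place are expected.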

Objective Assessment 

Venography is the reference standard for the diagnosis of 
DVT, and it is highly accurate for both proximal and calf 
DVT. 22 However, venography is invasive, expensive, techni¬ 
cally inadequate in about 10% of patients (either because of 
an inability to cannulate a vein or because of lack of adequate 
visualization of the deep veins), and may induce DVT in 
approximately 3% of patients. 23 This led to the evaluation 
and validation of 2 noninvasive tests: impedance plethys¬ 
mography and compression ultrasonography. These tests 
have proved to be sensitive to proximal but not to calf vein 
thrombosis. 

Impedance plethysmography reliably detects occlusive 
thrombi of the proximal veins (popliteal, femoral, or iliac 
veins) but is less reliable at detecting nonocclusive proximal 
DVT and is insensitive to calf DVT. 24-27 Impedance plethysmography 
does not allow direct visualization of the veins but 
suggests that DVT is present when significant outflow obstruction 
is present, particularly in the absence of a comorbid condition 
that might cause a false-positive result (eg, extrinsic 
venous compression or elevated central venous pressure). 
Although studies before 1990 reported that impedance 
plethysmography detected more than 90% of proximal DVT, 
more recent studies reported sensitivities for proximal DVT of 
about 70%. 28-30 This apparent decrease in sensitivity is probably 
caused by changes in referring patterns to specialty centers 
with a strong interest in DVT. 31 

Compression ultrasonography assesses compressibility of 
the femoral and popliteal veins and is highly sensitive and 
specific for detecting proximal DVT (noncompressibility is 
diagnostic of DVT, whereas compressibility rules out 
DVT). 6,32-34 Neither impedance plethysmography nor compression 
ultrasonography reliably detects isolated calf vein 
thrombosis. 35 Although the specificity of compression 
ultrasonography and impedance plethysmography for DVT 
remains high in both symptomatic and asymptomatic 
patients, the sensitivity declines dramatically when impedance 
plethysmography and compression ultrasonography 
are used to evaluate asymptomatic patients (ie, 22% and 
58%, respectively) vs symptomatic patients (ie, 96% and 
96%, respectively). 36 Several diagnostic algorithms using 
serial compression ultrasonography or impedance plethysmography 
have been evaluated and validated in large clinical 
trials. 24-27,32-34,37-42 Although compression ultrasonography 
appears to be more accurate than impedance plethysmography, 
serial testing with either is acceptable in patients with 
suspected DVT. 37,43 Therefore, as most clinicians consider 
clinically important proximal DVT excluded by normal 
impedance plethysmography or compression ultrasonography 
on the day of presentation, anticoagulants can be safely 
withheld in such patients, because the probability of experiencing 
proximal DVT is less than 2% in the following 3 
months. 44 If the initial test results are normal, repeated 
testing during the next 5 to 7 days is recommended; if the 
results become abnormal during this period, extending proximal 
DVT is likely and anticoagulant therapy should be initiated. 
However, impedance plethysmography and compression 
ultrasonography have limitations too, such as 
availability and the inconvenience and expense of repeated 
testing. 


Recently, the D-dimer assays have been demonstrated to be 
useful adjuncts to noninvasive testing for suspected DVT 
because they are highly sensitive and therefore have high negative 
predictive values. 49 D-dimer 45 ' 47 is formed when cross-linked 
fibrin contained within a thrombus is proteolyzed by 
plasmin. Various D-dimer assays are available, including 
enzyme-linked immunosorbent assays (ELISA), latex agglu¬ 
tination assays, and a whole blood agglutination test. 46 The 
whole blood agglutination assay appears to be best for exclu¬ 
sion of DVT because it is suitable for individual testing 
(unlike ELISA) and has high sensitivity and reasonable speci¬ 
ficity. Recent studies show that DVT can be reliably excluded 
in patients with suspected DVT who have a normal imped¬ 
ance plethysmograph result and a normal D-dimer result 
(using a high-sensitivity whole blood assay) and that such 
results occur in about two-thirds of patients. 45 This supports 
the role of the assay as a simple and rapid adjunct to nonin¬ 
vasive tests for the exclusion of clinically important DVT. 45>46 
For a summary of diagnostic algorithms for patients with 
suspected DVT, see Table 18-4. 

Clinical Prediction Guide 

Recently, the clinical assessment of patients with suspected 
DVT was reevaluated. This was sparked by 2 observations: that 
many patients with a high pretest probability (using clinical 
judgment) and a normal impedance plethysmograph had 
proximal DVT, 28 and that the pretest probability of patients 
had an important influence on diagnosing PE, a closely related 
disease. For example, in patients with a low pretest probability 
and a high-probability lung scan, the prevalence of PE was 
approximately 50% to 60%. 48 These results generated the 
hypothesis that when pretest probability and further tests are 
concordant, DVT can be ruled in or out, whereas when they 
are discordant, further tests are necessary. 

Table 18-4 Interpretation of Test Results in Patients With Suspected Initial Deep Vein Thrombosis 

Diagnose DVT 
  Venography: Intraluminal filling defect in at least 2 projections 
  Compression ultrasonography: Noncompressibility of the femoral or 
    popliteal vein 
  Impedance plethysmography: Abnormal impedance plethysmography and a 
    moderate to high clinical probability of DVT 

Exclude clinically important DVT 
  Venography: Normal venogram result 
  Compression ultrasonography: Normal compressibility of proximal venous 
    segments combined with a low clinical pretest probability, or normal 
    serial compression ultrasonographic examination result 
  Impedance plethysmography: Normal impedance plethysmography combined 
    with a normal D-dimer or normal serial impedance plethysmography 

Nondiagnostic for DVT 
  Venography: Technically inadequate study in which all deep veins are 
    not adequately visualized 
  Compression ultrasonography: Noncompressibility of deep veins of the calf 
  Impedance plethysmography: Abnormal impedance plethysmography combined 
    with a low clinical suspicion 

Abbreviation: DVT, deep vein thrombosis. 


Development of a Clinical Prediction Guide 

Recently, a clinical prediction guide that seeks to standardize the 
estimation of the pretest probability among clinicians was 
developed 6 and is described below. This model enables clinicians 
to reliably stratify patients with suspected DVT into high-, 
moderate-, or low-probability groups by following uniform cri¬ 
teria. After a review of the literature 3,4 - 8,18 and input from experi¬ 
enced thrombosis investigators, categories deemed to be 
important in the estimation of a patient’s pretest probability were 
considered and categorized as follows: signs and symptoms of 
DVT, risk factors for DVT, and the presence or absence of diag¬ 
noses that were deemed at least as likely as DVT to explain the 
patient’s symptoms. These include musculoskeletal injuries, cel¬ 
lulitis, and prominent lymphadenopathy of the inguinal area. 
The clinical prediction guide uses a scoring system that combines 
important symptoms and signs, risk factors for DVT, and the 
presence or absence of an alternative diagnosis. The results strat¬ 
ify patients with suspected DVT into low-, moderate-, or high- 
probability groups. The original clinical prediction guide was 
initially developed in a training set of 100 outpatients at a throm¬ 
bosis referral center at McMaster University, Hamilton, Ontario, 
Canada, who presented with suspected DVT. All patients under¬ 
went venography, and a simple regression model determined the 
relative importance of individual and various clusters of factors 
to predict the probability that a patient had DVT. 

The clinical prediction guide was then prospectively validated 
in a test set of 529 patients who presented with suspected DVT 
to 3 tertiary-care referral centers: 2 in Hamilton and 1 in Padua, 
Italy. 6 Clinicians recorded their assessment of pretest probabil¬ 
ity of DVT, and then all patients underwent venography and 
compression ultrasonographic examination. This model cannot 
be applied to certain subgroups of patients who were excluded 
from the study, such as those with previous venous thromboembolism, 
those with concomitantly suspected PE, pregnant 
women, or patients receiving treatment with anticoagulants. 
With the clinical model, eligible patients were initially stratified 
into low-, moderate-, or high-pretest-probability groups. 

Although individual physical findings on their own are not 
predictive of DVT, when specific physical signs are incorporated 
into the clinical prediction guide they contribute to the genera¬ 
tion of the pretest probability of DVT. In Table 18-5, the physical 
signs and the scoring system of the clinical prediction guide are 
outlined. The physical signs classified as major points include 
localized tenderness to palpation along the distribution of the 
deep venous system; thigh and calf swelling, indicating that the 
entire leg has an increased diameter compared with the asymp¬ 
tomatic side; and calf swelling, in which the calf is measured 
approximately 10 cm below the tibial plateau (at the tibial tuber¬ 
osity) and swelling is considered present if the difference between 
calf diameters is more than 3 cm. Minor points include the pres¬ 
ence of a unilateral pitting edema of the leg with standard assess¬ 
ment measures, the presence of dilated superficial veins 
(nonvaricose) that persist with elevation in the lower limb or if 
present in any new pattern in the groin region on the sympto¬ 
matic leg only, and the presence of diffuse or streaking erythema. 

The test set confirmed that the clinical model could reliably 
classify patients into high-, moderate-, and low-probability 
groups. The prevalence of all DVT (proximal and calf), 
using the venogram as the criterion standard in patients who 
were classified by the clinical model into the high-probability 
strata, was 85% compared with 33% in the moderate-probability 
and 5% in the low-probability categories. The positive 
LRs for the high-, moderate-, and low-risk categories are 16 
(95% CI, 9.3-28), 1.3 (95% CI, 1.0-1.7), and 0.2 (95% CI, 
0.1-0.3), respectively. The specificity of compression 
ultrasonography to detect proximal DVT in all strata was between 
98% and 100%. When interpreted in conjunction with pretest 
probability, the ability of compression ultrasonography 
to reliably diagnose DVT decreased as the pretest probability 
declined. The sensitivities of compression ultrasonography 
in the high, moderate, and low strata were 94%, 83%, and 
80%, respectively. The corresponding LRs for compression 
ultrasonography in pretest probability strata are provided in 
Table 18-6. By combining pretest probability and compression 
ultrasonography results, the posttest probabilities of 
DVT for each possible combination of results were generated. 
In the high-pretest-probability strata, an abnormal 
compression ultrasonogram result led to a 100% posttest 
probability; in the moderate strata, a 96% posttest probability; 
and in the low strata, a 63% posttest probability. In 
patients whose compression ultrasonogram result was normal, 
the posttest probabilities of DVT in the high, moderate, 
and low strata were 24%, 5%, and less than 1%, respectively. 


Table 18-5 Estimation of Pretest Probability of Deep Vein Thrombosis 
Using the Clinical Model a 

Major Points 
  Active cancer (treatment ongoing or within previous 6 mo or palliative) 
  Paralysis, bedridden > 3 days, or major surgery within 4 wk 
  Localized tenderness along the distribution of the deep venous system in 
    calf or thigh 
  Thigh and calf swollen (should be measured) 
  Calf swelling by > 3 cm when compared with the asymptomatic leg 
    (measured 10 cm below the tibial tuberosity) 
  Strong family history of DVT (≥ 2 first-degree relatives with history of DVT) 

Minor Points 
  History of recent trauma (< 60 d) to the symptomatic leg 
  Pitting edema in symptomatic leg only 
  Dilated superficial veins (nonvaricose) in symptomatic leg only 
  Hospitalization within previous 6 mo 
  Erythema 

Abbreviation: DVT, deep vein thrombosis. 

a Items excluded from the model are age, duration of symptoms, sex, obesity, presence of 
varicose veins, a palpable cord, and Homans sign. Scoring method: high probability if ≥ 3 
major points and no alternative diagnosis, or ≥ 2 major points and ≥ 2 minor points and no 
alternative diagnosis; low probability if 1 major point and ≤ 2 minor points and an alternative 
diagnosis, 1 major point and ≤ 1 minor point and no alternative diagnosis, 0 major 
points and ≤ 3 minor points and an alternative diagnosis, or 0 major points and ≤ 2 minor 
points and no alternative diagnosis; and moderate probability for all other combinations. 


Table 18-6 Likelihood Ratios for Ultrasonographic Results by Clinical 
Probability Strata 

Pretest Probability     Ultrasonography     LR+ (95% CI) 

High                    Abnormal            ∞ (3-∞) 
Moderate                Abnormal            72 (13-412) 
Low                     Abnormal            34 (14-76) 
High                    Normal              0.06 (0.03-0.16) 
Moderate                Normal              0.17 (0.07-0.34) 
Low                     Normal              0.20 (0.06-0.52) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio. 

The original clinical prediction guide was recently simplified 
with stepwise logistic regression and reevaluated. 49 Recent 
trauma, family history, erythema, and hospitalization within the 
previous 6 months did not remain in the simplified model, 
which, in combination with compression ultrasonography, was 
prospectively tested in 593 patients with suspected DVT who 
were referred to tertiary-care thrombosis clinics 49 (Table 18-7). 
Similar to the original clinical prediction guide, the simplified 
guide was able to reliably stratify patients into high-, moderate-, 
or low-probability groups, with corresponding prevalences of 
DVT of 75% (95% CI, 63%-81%), 17% (95% CI, 12%-23%), 
and 3% (95% CI, 1.7%-5.9%), respectively. 
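
The scoring rule of the simplified model (Table 18-7) is a sum of 1-point items with a 2-point deduction for an equally likely alternative diagnosis. The sketch below is only an illustration of that arithmetic; the item identifiers and function names are hypothetical labels chosen here, not terms from the source.

```python
from typing import Iterable

# One point each; hypothetical identifiers for the items listed in Table 18-7.
SIMPLIFIED_MODEL_ITEMS = frozenset({
    "active_cancer",
    "paralysis_paresis_or_plaster_immobilization",
    "bedridden_over_3_days_or_major_surgery_within_4_weeks",
    "localized_tenderness_along_deep_veins",
    "entire_leg_swelling",
    "calf_swelling_over_3_cm",
    "pitting_edema",
    "collateral_superficial_veins",
})


def simplified_score(findings: Iterable[str], alternative_diagnosis_as_likely: bool) -> int:
    """Sum the points of the simplified clinical model."""
    present = set(findings)
    unknown = present - SIMPLIFIED_MODEL_ITEMS
    if unknown:
        raise ValueError(f"not part of the simplified model: {sorted(unknown)}")
    score = len(present)
    if alternative_diagnosis_as_likely:
        score -= 2  # an alternative diagnosis at least as likely as DVT
    return score


def pretest_category(score: int) -> str:
    """High if the score is 3 or higher, moderate if 1 or 2, low if 0 or lower."""
    if score >= 3:
        return "high"
    if score >= 1:
        return "moderate"
    return "low"


# The woman in the clinical scenario: active cancer, calf swelling of 3.5 cm,
# tenderness along the popliteal vein, pitting edema, no convincing alternative.
scenario_findings = {
    "active_cancer",
    "calf_swelling_over_3_cm",
    "localized_tenderness_along_deep_veins",
    "pitting_edema",
}
scenario_score = simplified_score(scenario_findings, alternative_diagnosis_as_likely=False)
print(scenario_score, pretest_category(scenario_score))
```

Erythema is deliberately absent from the item set because it did not remain in the simplified model; applied to the scenario patient, the sketch yields a score of 4, which falls in the high-probability group.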

Table 18-7 Simplified Clinical Model a 

Clinical Characteristic                                                    Score 

Active cancer (treatment ongoing or within previous 6 mo or palliative)      1 
Paralysis, paresis, or recent plaster immobilization of the lower 
  extremities                                                                1 
Recently bedridden for > 3 d or major surgery within 4 wk                    1 
Localized tenderness along the distribution of the deep venous system        1 
Entire leg swelling                                                          1 
Calf swelling by > 3 cm compared with the asymptomatic leg 
  (measured 10 cm below the tibial tuberosity) b                             1 
Pitting edema (greater in the symptomatic leg)                               1 
Collateral superficial veins (nonvaricose)                                   1 
Alternative diagnosis as likely as or greater than that of DVT              -2 

Abbreviation: DVT, deep vein thrombosis. 

a Scoring method: high probability if score is 3 or higher, moderate if score is 1 or 2, 
and low if score is 0 or lower. 

b In patients with symptoms in both legs, the more symptomatic leg was used. 


These data support the use of a clinical prediction guide to 
simplify the diagnostic approach for patients with suspected 
DVT (Figure 18-1). In patients with a high or moderate pretest 
score who have an abnormal compression ultrasonogram result, 
DVT can be reliably diagnosed (LR+, ∞ and 72, respectively) 
and treatment should be initiated. In patients with a low pretest 
probability of DVT who have a normal compression ultrasono¬ 
gram result (LR-, 0.2), DVT can be reliably ruled out without 
further testing. For patients with discordant results (ie, high pre¬ 
test probability and normal compression ultrasonogram result, 
or low pretest probability and an abnormal compression ultra¬ 
sonogram result), further testing is recommended (ie, venogra¬ 
phy or serial compression ultrasonography). Patients with a 
moderate pretest probability and a normal ultrasonogram result 
have a 5% probability of having DVT, and a repeated compres¬ 
sion ultrasonographic examination in 7 days is recommended. 
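
The decision logic in the paragraph above can be written as a small lookup. This is a hedged sketch of the text as stated, not a clinical tool; the function name and the returned wording are illustrative assumptions.

```python
def next_step(pretest_category: str, ultrasound_abnormal: bool) -> str:
    """Suggest the next step from the pretest category and the first ultrasonogram."""
    if pretest_category not in ("low", "moderate", "high"):
        raise ValueError("pretest_category must be 'low', 'moderate', or 'high'")

    further_testing = "further testing (venography or serial compression ultrasonography)"

    if ultrasound_abnormal:
        if pretest_category == "low":
            # Discordant: low pretest probability but abnormal ultrasonogram.
            return further_testing
        # High or moderate pretest probability with an abnormal ultrasonogram.
        return "diagnose DVT and initiate treatment"

    if pretest_category == "low":
        return "DVT ruled out without further testing"
    if pretest_category == "moderate":
        return "repeat compression ultrasonography in 7 days"
    # Discordant: high pretest probability but normal ultrasonogram.
    return further_testing


for category in ("low", "moderate", "high"):
    for abnormal in (False, True):
        result = "abnormal" if abnormal else "normal"
        print(f"{category} pretest probability, {result} ultrasonogram: "
              f"{next_step(category, abnormal)}")
```

Writing the six combinations out explicitly makes the two discordant cells easy to see: they are the only ones that call for venography or serial testing.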


CLINICAL SCENARIO—RESOLUTION 


The patient described in the “Clinical Scenario” section is a 
55-year-old woman who presents with suspected DVT. Using 
the clinical prediction guide checklist found in Table 18-5, 
you determine that she has 5 clinical features predictive of 
DVT: a diagnosis of active cancer, calf swelling, erythema, 
localized tenderness along the popliteal vein, and pitting 
edema of the symptomatic leg. Although the possibility of 
enlarging pelvic lymph nodes in the right inguinal area offers 
an alternative diagnosis, you observe that a recent pelvic 
ultrasonographic report indicates that these nodes have 
shrunk, rendering this a less likely alternative diagnosis. 
Therefore, with 5 clinical features of DVT and no convincing 
alternative diagnosis, following the approach of the clinical 
prediction guide you conclude that she has a high clinical 
probability of experiencing acute DVT. The next step is to 
perform a compression ultrasonographic examination, and, 
if the results are abnormal, the posttest probability of DVT 
being present approaches 100%. However, if the ultrasono¬ 
gram result is normal (ie, showing normal compressibility of 
the proximal veins), the posttest probability is approximately 
24%, and further testing with venography would be required. 


Apply clinical model, then proceed according to PTP: 

Low PTP 
  Ultrasonography normal: rules out DVT 
  Ultrasonography abnormal: venography 
    Venography normal: rules out DVT 
    Venography abnormal: DVT, anticoagulant therapy 

Moderate PTP 
  Ultrasonography normal: repeat ultrasonography in 1 wk 
    Repeat result normal: rules out DVT 
    Repeat result abnormal: DVT, anticoagulant therapy 
  Ultrasonography abnormal: DVT, anticoagulant therapy 

Figure 18-1 Suggested Diagnostic Approach in Patients With Suspected DVT 

Abbreviations: DVT, deep vein thrombosis; PTP, pretest probability. 
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CONCLUSIONS 

Although physical findings of patients with suspected DVT 
are not useful on their own, a clinical prediction guide that 
includes factors from both the medical history and physical 
examination is able to assist in the diagnosis of DVT. When 
used in combination with noninvasive tests, such as compression
ultrasonography, it can simplify and reduce the
expense of management strategies. 

THE BOTTOM LINE 

Individual symptoms and signs on their own are not useful
to diagnose DVT. However, a systematic review of patients’
risk factors, symptoms, and physical signs allows the clinician
to reliably determine the pretest probability that a
patient has DVT. This strategy, in combination with the
results of noninvasive diagnostic tests, guides further
diagnostic testing and treatment strategies.

A 60-year-old man referred with suspected deep vein
thrombosis (DVT) cut the plantar surface of his left foot
on glass 10 days ago and has been resting in bed. He presents
with left leg pain and mild calf swelling, redness, and
heat. There is no history of a DVT or known family history
of venous thromboembolism. Physical examination
shows the patient is febrile and has pitting edema of the
left calf. The calf erythema is hot, tender, and well demarcated.
Enlarged left inguinal lymph nodes are present. He
has longstanding diabetes mellitus, and the diagnoses that 
seem most likely are cellulitis and DVT. Can a clinical 
probability estimate of DVT reliably determine a pretest 
probability that can be used in decision making? 


CLINICAL EVALUATION AND 
CLINICAL PREDICTION RULES 

DVT occurs frequently, with an estimated annual incidence
of 0.1% in white populations, 1,2 creating considerable morbidity.
Complications include postphlebitic syndrome and
chronic thromboembolic pulmonary hypertension, whereas
pulmonary embolism (PE) causes death in 1% to 8% of
affected patients despite treatment. 3-5 Although anticoagulant
therapy decreases the risk of recurrent thrombosis, the treatment
also increases the risk of major hemorrhage and other
potentially serious consequences, such as heparin-induced
thrombocytopenia. Therefore, diagnostic strategies must
correctly diagnose DVT when present and safely rule out
DVT when absent. The desire to not miss a patient with
DVT, combined with the large number of nonspecific signs
and symptoms, makes DVT part of the differential diagnosis
in most patients presenting with leg pain or swelling. Unfortunately,
the nonspecific signs and symptoms force clinicians
to investigate many patients who do not have DVT. In the
past, clinical assessment was not quantified in the diagnostic
assessment in patients with suspected DVT, and before 1995,
the approach was for all patients with suspected DVT to
undergo ultrasonography. 6,7 This approach was inefficient
because most patients with suspected DVT did not have the
disorder (DVT rates ranging from 10% to 25%). 7-9 Because
imaging for calf DVT is relatively inaccurate and often inadequate, 10,11
serial testing in which only the proximal veins were
evaluated and testing repeated 1 week later in the case of negative
results was the standard. Several studies performed in
the last decade successfully incorporated clinical assessment
into the diagnostic approach.

In a previous Rational Clinical Examination article, we outlined
how categorizing patients as having a low clinical probability
for DVT eliminates the need for serial testing, whereas
categorizing patients as having a high clinical probability
selects those in whom a negative ultrasonographic result may
be a false negative. 12 We also emphasized that false-positive 
ultrasonographic results were most likely when patients had a 
low clinical probability for DVT. The clinical prediction rule 8 
described in that article had not been widely evaluated. We 
conducted a new systematic review to determine the accuracy 
of the same clinical prediction rule for DVT. 

The incorporation of D-dimer testing into diagnostic algorithms
has simplified the treatment of a patient presenting with
suspected DVT. 13-16 Clinical trials demonstrate safe, feasible, and
validated approaches for the treatment of patients with suspected
DVT. However, it is also clear that D-dimer assays differ
with respect to sensitivity and specificity. Recent meta-analyses
summarize the accuracy of various D-dimer assays compared
with gold standard imaging tests for DVT. 17,18

Diagnostic algorithms work by combining the pretest probability
estimate (or clinical suspicion) with the likelihood ratio
(LR) of a diagnostic test result, providing an accurate probability
of disease after testing. 19 Given the consequences of failing
to detect DVT, a strategy that produces probabilities of 1% or
less after testing should provide reassurance that additional
tests are unnecessary. The combination of a low or unlikely
clinical probability estimate with a negative D-dimer result
safely rules out DVT. 13 The following are not clear: whether the
clinical prediction rule (eg, Wells et al 13) can be used reliably
across a broad range of at-risk populations; what the estimated
pooled risk of DVT is in each pretest category; and how pretest
clinical probability estimates should be used with different
D-dimer assays. To date, 3 studies have evaluated the literature on
clinical prediction rules for the diagnosis of DVT, but all have
limitations. 20-22 Specifically, they included studies and data that
either did not use the model or used the model incorrectly by
including patients with previous DVT (the most recent
changes to the model include a point for patients with previous
DVT). Indeed, Goodacre et al 22 report that exclusion of
persons with a history of thromboembolism is associated with
improved diagnostic performance of the model by Wells et al 13;
however, they did not report summary prevalence data, one
article reported only event rates in follow-up, and none
reported LR data in combination with D-dimer testing. We
conducted a systematic review to determine the accuracy of
clinical prediction rules for DVT and D-dimer assays in conjunction
with the clinical probability estimate.
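
The combination of a pretest probability with an LR described above is the odds form of Bayes theorem, and it can be sketched in a few lines of Python. The function name and the example inputs (a 5% pretest probability and an LR of 0.10) are illustrative assumptions chosen for this sketch rather than values prescribed at this point in the text.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Combine a pretest probability with a likelihood ratio (odds form of Bayes theorem)."""
    if not 0.0 <= pretest_probability < 1.0:
        raise ValueError("pretest_probability must be in [0, 1)")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    # Convert probability to odds, scale by the LR, then convert back.
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Illustrative inputs only: 5% pretest probability, negative test with LR 0.10.
example = posttest_probability(0.05, 0.10)
print(f"Probability after testing: {example:.1%}")
```

With these illustrative inputs the probability after testing is roughly 0.5%, which would fall under the 1% reassurance threshold mentioned above.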

METHODS 

Study Identification 

We searched for English- and French-language clinical studies
that used a clinical prediction model or clinical assessment in
the DVT diagnostic process. To evaluate the role of D-dimer, we
also sought studies that used D-dimer in combination with clinical
assessment. Published studies were identified by searching
MEDLINE from January 1, 1990, to July 1, 2004, using the Medical
Subject Headings “venous thrombosis” or “thrombophlebitis,”
“fibrin or fibrinogen degradation products,” and “predictive
value of tests,” and key words “DVT,” “D-dimer,” “diagnosis,”
“sensitivity,” “specificity,” “clinical probability,” “clinical model,”
or “decision rule.” We supplemented the MEDLINE search by
scrutinizing the reference lists of all articles selected for inclusion,
review articles retrieved, and review of our own reference
library of more than 4200 articles.

Study Selection 

To be included in the review, all of the following criteria were 
required: (a) enrollment of consecutive outpatients with 
symptoms and signs of suspected DVT; (b) prospective trial 
design involving a minimum 3-month follow-up; (c) objective 
documentation of all venous thromboembolic events (DVT 
and PE); (d) exclusion of patients with previous DVT unless 
the clinical model adjusted for the history of DVT or the 
reviewers could make that adjustment; (e) assessment of 
patients with a validated clinical rule to estimate the clinical
probability of DVT before D-dimer testing or diagnostic imaging;
(f) performance of D-dimer testing before other diagnostic
tests (although D-dimer testing was not a requirement for
study inclusion); (g) available data on the prevalence of DVT 
in at least 1 of the 3 risk estimate categories (low, moderate, or 
high); (h) evaluation of proximal DVT; and (i) study quality 
graded A or B with the scheme previously appearing in The 
Rational Clinical Examination series, adapted from Holleman 
and Simel 23 (see Table 1-7) as shown: 

Level 1: Independent, blinded comparison of symptom or 
sign results with a criterion standard of diagnosis among a large 
number of consecutive patients (>300) with suspected DVT. 

Level 2: Independent, blinded comparison of symptom or 
sign results with a criterion standard of diagnosis among 
consecutive patients (<300) with suspected DVT. 

Data Extraction 

Two authors independently reviewed and abstracted data for 
determining prevalence of DVT in low-, moderate-, and 
high-clinical-probability groups; sensitivity and specificity; 
and LRs of D-dimer testing in each of the 3 clinical probability
groups.

Statistical Analysis 

Data were imported into the Comprehensive Meta-Analysis 
software program version 2.197 (Biostat Inc, Englewood, 
New Jersey) and analyzed with a random-effects model. 

For each study, the overall prevalence of DVT and the prevalence
among patients with low, moderate, or high clinical
probability estimate were calculated. We confirmed the sensitivity
and specificity and 95% confidence intervals (CIs) for
each study that included D-dimer testing. The positive and
negative likelihood ratios (LR+ and LR-) for each clinical
probability estimate according to the D-dimer subset were calculated.
An LR+ is a measure of how strongly a positive result
increases the odds of disease and an LR- is a measure of how
well a negative result decreases the odds of disease. The easiest
way to interpret LRs is to keep in mind that the likelihood of a
disease outcome increases when the LR is greater than 1, the
likelihood of disease decreases if the LR is less than 1, and an
LR close to 1 does not change the likelihood. We also calculated
the pooled LR because, unlike diagnostic odds ratios
(ORs), the LRs can be used for clinical decisions.
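
For a single study, LR+ and LR- follow directly from sensitivity and specificity. The sketch below uses the standard definitions; the function name and the example values (90% sensitivity, 60% specificity) are hypothetical, and the pooled LRs reported in this review come from a random-effects model, so they need not equal the ratios computed from pooled sensitivity and specificity.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1")
    # LR+ = P(positive result | disease) / P(positive result | no disease)
    lr_positive = sensitivity / (1.0 - specificity)
    # LR- = P(negative result | disease) / P(negative result | no disease)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical test: 90% sensitivity, 60% specificity.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.60)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```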

Studies were grouped into 2 subsets, depending on the
accuracy of the D-dimer that was used (ie, high sensitivity
and moderate sensitivity, according to Stein et al 18), and the
same calculations were performed. Diagnostic ORs were calculated
with correction for 100% sensitivities by adding 0.5
to each cell of the 2 × 2 table. 24,25 The diagnostic OR is a single
indicator of diagnostic test performance, reflecting its accuracy.
With the random-effects model, the pooled estimates
for the overall diagnostic OR as well as for the 2 subsets of
D-dimer assays were calculated. For the 2 subsets of D-dimer
assays, we evaluated differences between the sensitivity and
specificity of the assays, between the low- and moderate-clinical-probability
groups, and between the moderate- and
high-pretest-probability groups with a χ2 test.
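
The diagnostic OR with the 0.5 correction can be illustrated as follows. This is a minimal sketch, assuming the correction is applied only when a cell of the 2 × 2 table is empty (for example, no false negatives when sensitivity is 100%) and then added to every cell; the function name and the counts are hypothetical and are not taken from any study in this review.

```python
def diagnostic_odds_ratio(tp: float, fp: float, fn: float, tn: float) -> float:
    """Diagnostic OR = (TP x TN) / (FP x FN) from a 2 x 2 table of counts."""
    cells = [tp, fp, fn, tn]
    if any(c < 0 for c in cells):
        raise ValueError("cell counts must be non-negative")
    # Assumption: the 0.5 correction is used only when some cell is empty
    # (for example FN = 0, i.e. 100% sensitivity), and then added to every cell.
    if any(c == 0 for c in cells):
        tp, fp, fn, tn = (c + 0.5 for c in cells)
    return (tp * tn) / (fp * fn)


# Hypothetical counts, not taken from any study in this review.
dor_plain = diagnostic_odds_ratio(tp=18, fp=30, fn=2, tn=50)      # no empty cell
dor_corrected = diagnostic_odds_ratio(tp=20, fp=30, fn=0, tn=50)  # FN = 0, corrected
print(f"{dor_plain:.1f} {dor_corrected:.1f}")
```

Without the correction, the second table would require division by zero, which is why a continuity correction of this kind is needed before pooling.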

RESULTS 

After reviewing all titles and abstracts, we identified 67 of 274
articles for further review. Of the 67 articles, 14 met the
inclusion criteria involving 8239 patients (Table 18-8). 9,13-15,26-35
The only studies eligible used the Wells clinical prediction
rule (Table 18-9). One study reported D-dimer data on an
earlier study, so it was not included in the calculation of
prevalence. 27 Twelve of the 14 studies evaluating 5690
patients incorporated D-dimer testing into the diagnostic
algorithm. 9,13-15,27-34

Does the Clinical Prediction Rule Accurately 
Categorize the Pretest Probability Estimate? 

To be useful, the clinical probability estimate for DVT must
be reproducible. Put another way, when the same patient or
different patient populations presenting with suspected DVT
are assessed, the clinical prediction rule should yield similar
estimates for the risk of DVT. All studies included in this systematic
review used the same clinical prediction rule. The
pooled prevalence of DVT in the studies included in this
meta-analysis was 19% (95% CI, 16%-23%). The pooled
prevalence of DVT in the low-, moderate-, and high-clinical-probability
groups was 5.0% (95% CI, 4.0%-8.0%), 17%
(95% CI, 13%-23%), and 53% (95% CI, 44%-61%), respectively
(Figure 18-2). Interobserver reliability has not been
widely evaluated, but the reported studies included many
physicians with a wide range of clinical experience, including
junior residents.


Table 18-8 Summary of Studies of Deep Vein Thrombosis Diagnosis Involving Clinical Prediction Rule With or Without D-dimer Testing in Outpatients

| Source, y | Evidence Quality Level | Outpatient Population | Had Ultrasonography, % | Requiring Serial Ultrasonography, % | D-dimer Assay | Score | Previous DVT Excluded | Prevalence of DVT, % |
|---|---|---|---|---|---|---|---|---|
| Anderson et al, 26 1999 | 1 | 447 | 100 | 27 | N/A | Wells | Yes | 13 |
| Anderson et al, 27 2000 | 2 | 214 | 100 | N/A | Moderate sensitivity | Wells | Yes | 13 |
| Miron et al, 28 2000 | 2 | 270 | N/A | N/A | High sensitivity | Wells empirical estimate 8 | Yes | 21 |
| Kearon et al, 29 2001 | 1 | 445 | 60 | N/A | Moderate sensitivity | Wells | Yes | 14 |
| Aguilar et al, 30 2002 | 2 | 134 | 100 | 0 | High sensitivity | Wells | Not stated | N/A |
| Bucek et al, 31 2002 | 2 | 99 Patients with low clinical probability | 74 | 0 | High sensitivity | Wells | No b | N/A |
| Kraaijenhagen et al, 15 2002 | 1 | 1756 | 100 | 47 | Moderate sensitivity | Wells | Yes | 24 |
| Shields et al, 32 2002 | 2 | 102 | 100 | 0 | Moderate sensitivity | Wells | Yes | 17 |
| Tick et al, 33 2002 | 1 | 811 | 100 | 10 | Moderate sensitivity | Wells | Yes | 42 |
| Anderson et al, 34 2003 | 1 | 1075 | 71 | 19 | Moderate sensitivity | Modified Wells c | No | 18 |
| Bates et al, 9 2003 | 1 | 556 | 49 | 7 | High sensitivity | Wells | Yes | 10 |
| Schutgens et al, 14 2003 | 1 | 812 | 78 | 38 | High sensitivity | Wells | Yes | 39 |
| Wells et al, 13 2003 | 1 | 1082 | 62 | 18 | Moderate sensitivity | Modified Wells c | No | 16 |
| Stevens et al, 35 2004 | 1 | 436 | 100 | 0 | Not done | Wells | Yes | 14 |

Abbreviations: DVT, deep vein thrombosis; N/A, overall prevalence not available.

a Did not report D-dimer data; clinical prediction tool data from this prospective study was analyzed retrospectively.

b Only results for patients without previous DVT used in analysis (n = 87).

c Modified Wells score including 1 point for a history of DVT.


Table 18-9 Simplified Clinical Model for Assessment of Deep Vein Thrombosis a

| Clinical Variable | Score |
|---|---|
| Active cancer (treatment ongoing or within previous 6 mo or palliative) | 1 |
| Paralysis, paresis, or recent plaster immobilization of the lower extremities | 1 |
| Recently bedridden for 3 d or more, or major surgery within the previous 12 wk requiring general or regional anesthesia | 1 |
| Localized tenderness along the distribution of the deep venous system | 1 |
| Entire leg swelling | 1 |
| Calf swelling at least 3 cm larger than that on the asymptomatic leg (measured 10 cm below the tibial tuberosity) b | 1 |
| Pitting edema confined to the symptomatic leg | 1 |
| Collateral superficial veins (nonvaricose) | 1 |
| Previously documented DVT | 1 |
| Alternative diagnosis at least as likely as DVT | -2 |

Abbreviation: DVT, deep vein thrombosis.

a Scoring method indicates high probability if score is 3 or higher, moderate if score is
1 or 2, and low if score is 0 or lower.

b In patients with symptoms in both legs, the more symptomatic leg was used.

With the Wells et al 13 criteria applied, the patient would
have a score of 0: pitting edema (1 point), bed
rest (1 point), and an alternative diagnosis (cellulitis) at least
as likely as DVT (-2 points). Using the clinical prediction
rule, the clinician concludes that the patient has a low clinical 
probability of having an acute DVT. These data suggest that 
the clinician should be confident that the prevalence of DVT 
is approximately 5%. Would additional tests decrease the 
likelihood of DVT below 5%? 
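
The scoring just applied to the patient can be written out as a short Python sketch. The point values and cutoffs are transcribed from Table 18-9; the dictionary keys and function names are hypothetical labels introduced only for this illustration.

```python
# Point values transcribed from Table 18-9; the short keys are hypothetical labels.
WELLS_ITEMS = {
    "active_cancer": 1,
    "paralysis_paresis_or_plaster": 1,
    "bedridden_or_major_surgery": 1,
    "localized_tenderness": 1,
    "entire_leg_swelling": 1,
    "calf_swelling_3cm": 1,
    "pitting_edema": 1,
    "collateral_superficial_veins": 1,
    "previous_dvt": 1,
    "alternative_diagnosis_as_likely": -2,
}


def wells_score(findings: set[str]) -> int:
    """Sum the points for the findings that are present."""
    unknown = findings - set(WELLS_ITEMS)
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    return sum(WELLS_ITEMS[name] for name in findings)


def clinical_probability(score: int) -> str:
    """High if score is 3 or higher, moderate if 1 or 2, low if 0 or lower."""
    if score >= 3:
        return "high"
    if score >= 1:
        return "moderate"
    return "low"


# The 60-year-old man from the clinical scenario.
patient = {"pitting_edema", "bedridden_or_major_surgery", "alternative_diagnosis_as_likely"}
score = wells_score(patient)
print(score, clinical_probability(score))
```

Keeping the items in a single lookup table makes it easy to check the sketch against Table 18-9 line by line.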

D-dimer Testing 

D-dimer is a degradation product of a cross-linked fibrin 
blood clot. Levels of D-dimer are typically elevated in 
patients with acute venous thromboembolism. D-dimer 
levels may also be increased by a variety of nonthrombotic 
disorders, including recent major surgery, hemorrhage, 
trauma, pregnancy, cancer, or acute arterial thrombosis. 36 
D-dimer assays are, in general, sensitive but nonspecific 
markers so that a positive D-dimer result is not useful to
“rule in” the diagnosis of DVT. Instead, the value of the
D-dimer is with a negative test result that works to decrease
the likelihood of the diagnosis.


| Clinical Probability | Source, y | Prevalence, % (95% Confidence Interval) |
|---|---|---|
| High | Miron et al, 28 2000 | 74 (59-85) |
| High | Kearon et al, 29 2001 | 69 (55-80) |
| High | Kraaijenhagen et al, 15 2002 | 66 (60-71) |
| High | Schutgens et al, 14 2003 | 59 (54-65) |
| High | Shields et al, 32 2002 | 59 (35-79) |
| High | Anderson et al, 26 1999 | 49 (35-63) |
| High | Anderson et al, 34 2003 | 47 (40-54) |
| High | Stevens et al, 35 2004 | 40 (28-52) |
| High | Wells et al, 13 2003 | 39 (33-45) |
| High | Bates et al, 9 2003 | 30 (20-41) |
| High | Overall | 53 (44-61) |
| Moderate | Schutgens et al, 14 2003 | 38 (33-43) |
| Moderate | Kraaijenhagen et al, 15 2002 | 26 (23-30) |
| Moderate | Aguilar et al, 30 2002 | 19 (14-27) |
| Moderate | Miron et al, 28 2000 | 19 (13-28) |
| Moderate | Anderson et al, 34 2003 | 18 (15-22) |
| Moderate | Anderson et al, 26 1999 | 14 (9-22) |
| Moderate | Shields et al, 32 2002 | 14 (6-27) |
| Moderate | Wells et al, 13 2003 | 13 (11-17) |
| Moderate | Stevens et al, 35 2004 | 13 (9-19) |
| Moderate | Kearon et al, 29 2001 | 13 (9-18) |
| Moderate | Bates et al, 9 2003 | 9 (6-14) |
| Moderate | Overall | 17 (13-23) |
| Low | Schutgens et al, 14 2003 | 13 (9-18) |
| Low | Tick et al, 33 2002 | 13 (9-17) |
| Low | Kraaijenhagen et al, 15 2002 | 8 (7-10) |
| Low | Bates et al, 9 2003 | 6 (4-9) |
| Low | Stevens et al, 35 2004 | 5 (2-9) |
| Low | Anderson et al, 34 2003 | 4 (3-7) |
| Low | Wells et al, 13 2003 | 4 (2-6) |
| Low | Miron et al, 28 2000 | 3 (1-8) |
| Low | Anderson et al, 26 1999 | 3 (1-7) |
| Low | Shields et al, 32 2002 | 2 (0-15) |
| Low | Kearon et al, 29 2001 | 2 (1-6) |
| Low | Bucek et al, 31 2002 | 2 (1-9) |
| Low | Overall | 5 (4-8) |
| Overall | Prevalence of Deep Vein Thrombosis | 19 (16-23) |

Figure 18-2 Prevalence of Deep Vein Thrombosis

The ability of a negative D-dimer result to “rule out” DVT 
depends on the type of assay. D-dimer assays are categorized 
as high sensitivity vs moderate sensitivity. The efficiency of a 
negative result to rule out DVT increases proportionately 
with the sensitivity of the assay, but it is inversely related to 
the prevalence of venous thromboembolism. On the other 
hand, the specificity of the particular D-dimer assay and the 
population under study affect its ability to rule out the diagnosis
of DVT. For instance, use of a less specific assay or the
testing of hospitalized patients who are currently ill limits its
value because of the expected number of false-positive
results.

The incorporation of D-dimer testing into diagnostic algorithms
simplifies the management of a patient presenting
with suspected DVT. Since the last review, numerous trials
evaluated the accuracy of D-dimer and its incorporation into
the diagnostic approach in patients with suspected DVT.
Recent meta-analyses summarize the accuracy of various
D-dimer assays compared with gold standard imaging tests for
DVT.

Returning to the clinical scenario outlined earlier, a
D-dimer test is performed. The hospital uses a moderately sensitive
D-dimer assay. Does the type of D-dimer assay matter?
Does the D-dimer result affect the already low probability of
DVT?

How Will D-dimer Testing Simplify DVT Diagnosis? 

Although a variety of quantitative and qualitative D-dimer
assays are available, all involving specific monoclonal
antibodies, 2 methods have been extensively investigated:
enzyme-linked immunosorbent assays (ELISA) and
whole blood assays. There is wide variation in the sensitivity,
normal reference ranges, and cutoff points among different
assays. Currently available assays can be divided into highly sensitive
or moderately sensitive tests. 18 A recent meta-analysis of
different D-dimer assays shows that the ELISAs and certain
immunoturbidimetric tests are highly sensitive (>95%) but
less specific (approximately 40% at a cutoff value of 500 ng/mL)
for excluding DVT. 18 In general, other D-dimer methods
such as whole blood and quantitative latex agglutination
assays are moderately sensitive (≈85%) but more specific
(>65%). Therefore, the probability after testing varies
according to the D-dimer assay used. Before clinicians use a
particular D-dimer assay to revise their clinical probability
estimate, they should be aware of the differences and interpret
the results accordingly. The use of D-dimer testing has
improved the diagnostic process in suspected DVT, but the
D-dimer result itself does not serve as the reference standard
for the presence or absence of DVT.

The pooled sensitivity, specificity, and negative LRs of the
D-dimer test in the low-clinical-probability group were 88%
(95% CI, 81%-92%), 72% (95% CI, 65%-78%), and 0.18
(95% CI, 0.12-0.28), respectively. Among patients with moderate
clinical probability estimate, the pooled values were
90% (95% CI, 80%-95%), 58% (95% CI, 49%-67%), and
0.19 (95% CI, 0.11-0.32), respectively; among patients with
high clinical probability estimate, the results were 92% (95%
CI, 85%-96%), 45% (95% CI, 37%-52%), and 0.16 (95% CI,
0.09-0.30), respectively. The specificity of D-dimer testing
decreased as the clinical suspicion for DVT increased from
low to moderate and from moderate to high (P < .001) with
no change in the sensitivity (P = .51 and .28, respectively).
The lower specificity of D-dimer testing among patients with
a high clinical suspicion for DVT might be due to more
comorbid conditions (eg, surgery or cancer) that can cause
high D-dimer levels. 37 Among patients in this group, the
number of false-positive D-dimer results can exceed the
number of negative results, thereby limiting its use. The
pooled estimates for diagnostic OR for D-dimer tests in the
low-, moderate-, and high-clinical-probability groups were
17 (95% CI, 9.9-28), 14 (95% CI, 8.6-21), and 12 (95% CI,
5.7-25), respectively; that is, the diagnostic OR did not differ
between clinical probability estimates despite a variation in
sensitivity and specificity. These data are summarized in
Table 18-10. Because the literature suggests that D-dimer
assays can be broadly considered as high-sensitivity or moderate-sensitivity
assays, we analyzed the eligible D-dimer
studies in these categories.
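
The remark that false-positive D-dimer results can outnumber negative results in the high-probability group can be checked with simple arithmetic. The sketch below applies the pooled prevalence, sensitivity, and specificity quoted in the text to a hypothetical cohort of 1000 patients per group; combining pooled estimates in this way is an assumption made for illustration, because the pooled values do not all come from the same set of studies.

```python
def expected_results(prevalence, sensitivity, specificity, cohort_size=1000):
    """Expected D-dimer results in a hypothetical cohort of the given size."""
    diseased = cohort_size * prevalence
    not_diseased = cohort_size - diseased
    true_positive = diseased * sensitivity
    false_negative = diseased - true_positive
    true_negative = not_diseased * specificity
    false_positive = not_diseased - true_negative
    return {
        "false_positive": false_positive,
        "all_negative": true_negative + false_negative,
        "npv": true_negative / (true_negative + false_negative),
    }


# Pooled prevalence, sensitivity, and specificity quoted in the text.
POOLED = {
    "low": (0.05, 0.88, 0.72),
    "moderate": (0.17, 0.90, 0.58),
    "high": (0.53, 0.92, 0.45),
}

for group, (prev, sens, spec) in POOLED.items():
    r = expected_results(prev, sens, spec)
    print(f"{group:>8}: false positives {r['false_positive']:.0f}, "
          f"negative results {r['all_negative']:.0f}, NPV {r['npv']:.1%}")
```

Under these assumptions, only the high-probability group has more false-positive results than negative results, which is consistent with the limitation described above.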

Moderate-Sensitivity D-dimer Assays 

The sensitivity, specificity, negative predictive values, LRs+,
LRs-, and their respective 95% CIs for the studies that used
moderate-sensitivity D-dimer assays are demonstrated in
Table 18-11. Data are presented for each clinical probability
estimate category. The LRs- are not sufficiently low to rule
out DVT without ultrasonography among patients with
moderate and high pretest probability estimates. Among
these patients, the probability after testing for DVT is greater
than 1% (see negative predictive values in Table 18-12).
When combined with a negative D-dimer result, diagnostic
imaging and anticoagulant therapy can be safely withheld for
patients with a low clinical probability estimate because the
LR- (0.20; 95% CI, 0.12-0.31) is such that the probability
after testing for DVT is less than 1%.


Table 18-10 Accuracy Measures for D-dimer Pooling of All Studies

| Measures | Clinical Pretest Probability: Low (95% CI) | Moderate (95% CI) | High (95% CI) |
|---|---|---|---|
| Sensitivity, % | 88 (81-92) | 90 (80-95) | 92 (85-96) |
| Specificity, % | 72 (65-78) | 58 (49-67) | 45 (37-52) |
| Negative predictive value | 99 (98-99) | 96 (94-97) | 84 (77-89) |
| Positive predictive value | 17 (13-20) | 32 (25-41) | 66 (56-75) |
| Positive likelihood ratio | 3.3 (2.6-4.1) | 2.1 (1.8-2.5) | 1.6 (1.5-1.8) |
| Negative likelihood ratio | 0.18 (0.12-0.28) | 0.19 (0.11-0.32) | 0.16 (0.09-0.30) |
| Diagnostic OR | 17 (9.9-28) | 14 (8.6-21) | 12 (5.7-25) |

Abbreviations: CI, confidence interval; OR, odds ratio.


Table 18-11 Accuracy Measures in the Moderate-Sensitivity D-dimer Studies

| Clinical Probability Before Testing | Study | Sensitivity, % | Specificity, % | NPV, % | LR+ (95% CI) | LR- (95% CI) |
|---|---|---|---|---|---|---|
| Low | Wells et al, 13 2003 | 93 | 73 | 100 | 3.7 (2.9-4.6) | 0.10 (0.01-1.4) |
| Low | Kraaijenhagen et al, 15 2002 | 87 | 67 | 98 | 2.6 (2.3-3.1) | 0.20 (0.11-0.36) |
| Low | Kearon et al, 29 2001 | 80 | 88 | 99 | 6.4 (3.6-11) | 0.23 (0.04-1.3) |
| Low | Anderson et al, 27 2000 | 90 | 85 | 99 | 6.7 (4.3-10) | 0.12 (0.01-1.7) |
| Low | Anderson et al, 34 2003 | 85 | 73 | 99 | 3.2 (2.5-4.1) | 0.20 (0.07-0.58) |
| Low | Shields et al, 32 2002 | NE | 80 | 98 | 5.0 (2.7-9.3) | 0.32 (0.03-3.50) |
| Low | Weighted average (95% CI) | 86 (79-92) | 78 (71-83) | 99 (98-99) | 4.0 (3.0-5.4) | 0.20 (0.12-0.31) |
| Moderate | Wells et al, 13 2003 | 94 | 60 | 98 | 2.4 (2.0-2.8) | 0.10 (0.03-0.38) |
| Moderate | Anderson et al, 34 2003 | 80 | 72 | 94 | 2.9 (2.4-3.6) | 0.27 (0.17-0.43) |
| Moderate | Kraaijenhagen et al, 15 2002 | 94 | 57 | 96 | 2.2 (2.0-2.5) | 0.11 (0.05-0.21) |
| Moderate | Shields et al, 32 2002 | 93 | 53 | 98 | 2.1 (1.5-3.0) | 0.14 (0.01-2.0) |
| Moderate | Kearon et al, 29 2001 | 71 | 69 | 94 | 2.3 (1.6-3.2) | 0.42 (0.23-0.80) |
| Moderate | Anderson et al, 27 2000 | 67 | 84 | 94 | 4.2 (2.0-9.0) | 0.40 (0.16-1.0) |
| Moderate | Weighted average (95% CI) | 85 (73-93) | 66 (58-73) | 95 (93-97) | 2.4 (2.1-2.7) | 0.23 (0.13-0.39) |
| High | Wells et al, 13 2003 | 83 | 44 | 79 | 1.5 (1.2-1.9) | 0.39 (0.20-0.77) |
| High | Anderson et al, 34 2003 | 84 | 48 | 77 | 1.6 (1.3-2.0) | 0.34 (0.20-0.56) |
| High | Shields et al, 32 2002 | 80 | 71 | 71 | 2.8 (0.8-9.4) | 0.28 (0.07-1.1) |
| High | Kraaijenhagen et al, 15 2002 | 98 | 44 | 91 | 1.7 (1.5-2.1) | 0.05 (0.02-0.14) |
| High | Kearon et al, 29 2001 | 94 | 43 | 75 | 1.7 (1.0-2.6) | 0.13 (0.03-0.59) |
| High | Anderson et al, 27 2000 | 87 | 87 | 87 | 6.5 (1.8-24) | 0.15 (0.04-0.57) |
| High | Weighted average (95% CI) | 90 (80-95) | 49 (40-58) | 81 (74-86) | 1.7 (1.5-1.9) | 0.20 (0.10-0.38) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; NE, not estimable; NPV, negative predictive value.

High-Sensitivity D-dimer Assays 

The sensitivity, specificity, negative predictive values, LRs+,
LRs-, and their respective 95% CIs for the studies that used
high-sensitivity D-dimer assays are demonstrated in Table
18-12. When combined with a negative D-dimer result, diagnostic
imaging and anticoagulant therapy can be safely withheld
in patients with a low (LR, 0.10; 95% CI, 0.03-0.37) or
moderate clinical probability estimate (LR, 0.05; 95% CI,
0.01-0.21) because they create a probability estimate after
testing for DVT of less than 1%. With a high clinical probability
estimate, a normal D-dimer result does not have an LR
low enough so that the probability of DVT becomes less than
1%. These results suggest pooling D-dimer data may not be
appropriate. Table 18-13 demonstrates the probabilities after
testing for the different clinical probability estimates according
to the D-dimer results and includes an explanation of the
application of Bayes theorem. Assessing the clinical effect of
different sensitivity D-dimer assays on venous thromboembolic
outcomes requires assumptions about the proportions
of patients in each clinical probability category because they
have not been compared head to head. This type of assessment
is best performed by a formal decision analysis in
which D-dimer assay accuracies and DVT prevalence are varied,
and this is beyond the scope of this article. Comparative
studies are required to provide more definitive conclusions.
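
One way to see why the same negative D-dimer result is reassuring in some categories and not in others is to work backward from the 1% target to the largest LR that still achieves it. The sketch below does this for the pooled prevalences reported earlier (5%, 17%, and 53%); the function name and the choice to treat those pooled prevalences as pretest probabilities are assumptions of this illustration, not part of the authors' analysis.

```python
def max_lr_for_target(pretest_probability: float, target_probability: float = 0.01) -> float:
    """Largest LR that keeps the probability after testing at or below the target."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be strictly between 0 and 1")
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must be strictly between 0 and 1")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    target_odds = target_probability / (1.0 - target_probability)
    # Posttest odds = pretest odds x LR, so the limiting LR is the ratio of the odds.
    return target_odds / pretest_odds


# Pooled prevalences from this review, used here as pretest probabilities.
POOLED_PREVALENCE = {"low": 0.05, "moderate": 0.17, "high": 0.53}

for category, prevalence in POOLED_PREVALENCE.items():
    print(f"{category:>8}: LR must be <= {max_lr_for_target(prevalence):.3f}")
```

Under these assumptions the limits come out near 0.19, 0.05, and 0.009. The pooled LR- values reported for low and moderate clinical probability sit at or close to those limits, whereas none of the reported LRs approaches the limit for the high-probability group.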

Is Serial Ultrasonography Needed? 

Should a negative D-dimer result after a normal ultrasonographic
result suggest a need for serial ultrasonography?

Five studies reported sufficient data to enable the determination
of the LR for a negative D-dimer result when the clinical
probability estimate was moderate or high and the initial
ultrasonographic result was normal (data not shown). 9,13-15,34
Two studies used a high-sensitivity D-dimer. 9,14 Because the
probability of DVT after an initially negative ultrasonographic
result is low, the LR for a negative D-dimer result
ranges from 0.22 to 0.45 and results in a probability of DVT
of less than 1% after testing. Thus, regardless of the clinical
probability estimate, a negative D-dimer result using a moderately
sensitive D-dimer assay combined with a negative initial
ultrasonographic result safely obviates the need for serial
ultrasonography. However, caution must be used when performing
D-dimer testing in patients with prolonged symptoms
of suspected DVT or after a prolonged period of
heparin therapy (>24 hours). 38

THE BOTTOM LINE 

Outpatients presenting with suspected DVT should be ini¬ 
tially assessed with a validated clinical prediction rule. The 
clinical prediction published by Wells et al 13 has been assessed 
and validated in multiple clinical studies and can accurately 
categorize outpatients as having low, moderate, or high clini¬ 
cal probability. With this model, less than 5% of outpatients 
classified as low clinical probability have DVT. No other pre¬ 
diction tools met our eligibility criteria. A recent study sug¬ 
gests the prediction rule may not work in the primary care 


setting, but limitations in the design of that study (in partic¬ 
ular, failure to prospectively apply the rule as the diagnostic 
strategy) necessitate further research in primary care. 40 Vali¬ 
dation studies of the model are required for hospitalized 
patients. 

Incorporating D-dimer testing into a diagnostic algorithm
further simplifies the management of a patient who presents
with suspected DVT. Once the clinical probability has been
estimated, it can be combined with the D-dimer result to
determine whether DVT can be safely ruled out without
diagnostic imaging. Currently, DVT can be ruled out without
ultrasonography by combining a low clinical probability
estimate with a negative D-dimer result, and this strategy should
apply to as many as 40% of patients referred with suspected
DVT. Ultrasonography may provide information helpful to
establish an alternative diagnosis, but ultrasonographic
imaging for DVT is not required for every patient. Although
the data are more limited, it seems likely that serial testing
after an initially normal ultrasonographic result can be
confined to high-probability patients with positive D-dimer
results. Patients with moderate probability and a negative
high-sensitivity D-dimer result can have DVT ruled out.


Table 18-12 Accuracy Measures in the High-Sensitivity D-dimer Studies

| Clinical Probability Before Testing | Study | Sensitivity, % | Specificity, % | NPV, % | LR+ (95% CI) | LR- (95% CI) |
|---|---|---|---|---|---|---|
| Low | Bates et al, 9 2003 | 97 | 69 | 100 | 3.3 (2.7-3.9) | 0.04 (0-0.65) |
| | Schutgens et al, 14 2003 | 96 | 51 | 99 | 2.0 (1.7-2.4) | 0.07 (0.01-0.51) |
| | Bucek et al, 31 2002 | 83 | 53 | 99 | 2.1 (1.7-2.6) | 0.32 (0.03-4.0) |
| | Weighted average (95% CI) | 95 (82-99) | 58 (45-71) | 99 (97-100) | 2.4 (1.7-3.3) | 0.10 (0.03-0.37) |
| Moderate | Bates et al, 9 2003 | 94 | 52 | 99 | 2.0 (1.6-2.4) | 0.11 (0.02-0.76) |
| | Schutgens et al, 14 2003 | 100 | 40 | 99 | 1.7 (1.5-1.9) | 0.01 (0-0.16) |
| | Aguilar et al, 30 2002 | 98 | 32 | 99 | 1.5 (1.3-1.7) | 0.06 (0-0.85) |
| | Weighted average (95% CI) | 98 (91-100) | 41 (31-52) | 99 (96-100) | 1.7 (1.5-1.9) | 0.05 (0.01-0.21) |
| High | Bates et al, 9 2003 | 98 | 40 | 98 | 1.7 (1.3-2.1) | 0.06 (0-0.85) |
| | Schutgens et al, 14 2003 | 98 | 34 | 90 | 1.5 (1.3-1.7) | 0.07 (0.03-0.20) |
| | Weighted average (95% CI) | 97 (94-99) | 36 (29-43) | 92 (81-97) | 1.5 (1.4-1.7) | 0.07 (0.03-0.18) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; NPV, negative predictive value.


Table 18-13 Probabilities (%) by Clinical Probability Estimate Combined With D-dimer Assays After Testing a

| Clinical Probability Estimate b | Low | Moderate | High |
|---|---|---|---|
| Point estimate for DVT likelihood | 5 | 17 | 53 |
| Probability for positive D-dimer result after testing (high sensitivity) | 11 | 25 | 63 |
| Probability for negative D-dimer result after testing (high sensitivity) | 0.5 | 1 | 8.6 |
| Probability for positive D-dimer result after testing (moderate sensitivity) | 17 | 34 | 67 |
| Probability for negative D-dimer result after testing (moderate sensitivity) | 0.9 | 4.4 | 19 |

Abbreviation: DVT, deep vein thrombosis.

a Probability after testing from application of Bayes theorem.

b Posttest odds = pretest odds x likelihood ratio; pretest odds are derived from pretest probability as follows: pretest odds = pretest probability/(1 - pretest probability). Similarly, posttest
probability is derived from posttest odds by posttest odds/(1 + posttest odds). For example, for a negative result with a high-sensitivity D-dimer assay in a patient with low pretest probability,
pretest odds = 0.05/0.95 = 0.053. Next, posttest odds = 0.053 x 0.10 (from Table 18-12) = 0.0053. Convert to posttest probability by 0.0053/1.0053 = 0.0053, or 0.5%.
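
The odds calculation described in the footnote to Table 18-13 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original review; the function name `posttest_probability` is an assumption introduced here, and the inputs are the low pretest probability (5%) and the weighted-average likelihood ratios from Table 18-12.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability via odds.

    Hypothetical helper for illustration; it follows the footnote formula:
    posttest odds = pretest odds x likelihood ratio.
    """
    if not 0.0 <= pretest_probability < 1.0:
        raise ValueError("pretest_probability must be in [0, 1)")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Low clinical probability (5%) with the weighted-average LRs from Table 18-12.
low_negative = posttest_probability(0.05, 0.10)  # negative high-sensitivity D-dimer
low_positive = posttest_probability(0.05, 2.4)   # positive high-sensitivity D-dimer
print(f"negative: {low_negative:.1%}, positive: {low_positive:.1%}")
```

With these inputs the results round to about 0.5% and 11%, in line with the low-probability column of Table 18-13. Other cells of the table may differ slightly from this calculation because the published values use the exact summary estimates.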

Among patients with high clinical probability estimates, a 
normal D-dimer result does not have a sufficiently low LR. 
Therefore, all high-probability patients require diagnostic 
imaging to safely rule out DVT. Thus, D-dimer assays 
should not affect initial treatment for patients with a high 
probability of a DVT, because all of them require diagnostic 
imaging. 

The specificity of D-dimer assays decreases as the clinical
probability estimate increases, which leads to more
false-positive test results and thereby limits their utility. This
emphasizes that D-dimer should not be used as a screening test, and
indeed some advocate that D-dimer assays should not be used
for patients at high risk for a false-positive result, ie, elderly
patients, patients with cancer, and hospitalized patients.
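
The effect of falling specificity on the positive likelihood ratio can be illustrated with the standard definitions LR+ = sensitivity/(1 - specificity) and LR- = (1 - sensitivity)/specificity. The sketch below is an illustration only: the function name is an assumption, and it plugs in the weighted-average sensitivity and specificity from Table 18-12. The published pooled LRs were weighted across studies, so they need not equal these simple ratios exactly.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be in [0, 1]")
    if not 0.0 < specificity < 1.0:
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Weighted-average sensitivity and specificity by clinical probability (Table 18-12).
weighted_averages = {
    "low": (0.95, 0.58),
    "moderate": (0.98, 0.41),
    "high": (0.97, 0.36),
}

for category, (sens, spec) in weighted_averages.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{category}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

As specificity drops from the low to the high clinical probability group, LR+ moves closer to 1, which is why a positive D-dimer result adds little information in high-probability patients.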


CLINICAL SCENARIO—RESOLUTION 


The clinician has already determined that the patient has a
low pretest probability for DVT. The D-dimer result is
now determined to be negative and therefore the probability
of DVT after testing is sufficiently low (<1%) that
the diagnosis can be safely ruled out. If the D-dimer result
had been positive, the patient would require ultrasonographic
imaging. In patients with low pretest probability,
a normal ultrasonographic result reliably rules out clinically
important DVT without the need for follow-up
ultrasonography (probability after testing <1%). If the
ultrasonographic result is abnormal, it is usually considered
predictive of DVT, although the probability after testing
may be as low as 90%. Therefore, consideration should
be given that it may be a false-positive result. Small, isolated,
single-vein, nonocclusive ultrasonographic results
have been reported to be falsely positive, mostly because
they represent chronic DVT. 39


CLINICAL SCENARIO


A 60-year-old man referred with suspected deep vein
thrombosis (DVT) cut the plantar surface of his left foot
on glass 10 days ago and has been resting in bed. He presents
with left leg pain and mild calf swelling, redness, and
heat. There is no history of a DVT or known family history
of venous thromboembolism. Physical examination
shows the patient is febrile and has pitting edema of the
left calf. The calf erythema is hot, tender, and well demarcated.
Enlarged left inguinal lymph nodes are present. He
has longstanding diabetes mellitus, and the diagnoses that
seem most likely are cellulitis and DVT. Can a clinical
probability estimate of DVT reliably determine a pretest
probability that can be used in decision making?

UPDATED SUMMARY ON DEEP VEIN THROMBOSIS 

Original Review 

Anand SS, Wells PS, Hunt D, et al. Does this patient have 
deep vein thrombosis? JAMA. 1998;279(14):1094-1099. 

UPDATED REVIEW 

Wells PS, Owen C, Doucette S, et al, eds. Does this patient 
have deep vein thrombosis? JAMA. 2006;295(2):199-207. 

The Update was prepared within 12 months of publication 
of The Rational Clinical Examination article so the “Make the 
Diagnosis” section summarizes the findings published in the 
original review. 


IMPROVEMENTS IN THE DATA PRESENTED IN THE 
ORIGINAL PUBLICATION 

After the original publication, clinical prediction models 
were studied extensively and validated. The updated review 
provides evidence supporting the role of clinical prediction 
models for DVT. 


CLINICAL SCENARIO—RESOLUTION 


The clinician has already determined that the patient has
a low pretest probability for DVT. The D-dimer result is
now determined to be negative and therefore the probability
of DVT after testing is sufficiently low (<1%) that
the diagnosis can be safely ruled out. If the D-dimer
result had been positive, the patient would require
ultrasonographic imaging. In patients with low pretest probability,
a normal ultrasonographic result reliably rules out
clinically important DVT without the need for follow-up
ultrasonography (probability after testing <1%). If the
ultrasonographic result is abnormal, it is usually considered
predictive of DVT, although the probability after
testing may be as low as 90%. Therefore, consideration
should be given that it may be a false-positive result.
Small, isolated, single-vein, nonocclusive ultrasonographic
results have been reported to be falsely positive,
mostly because they represent chronic DVT. 1


DEEP VEIN THROMBOSIS—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

A validated clinical prediction rule, applied to the appropri¬ 
ate patient population, creates stratified probability esti¬ 
mates of DVT (see Table 18-14). 


Table 18-14 Simplified Wells Prediction Rule 2

| Clinical Variable | Score |
|---|---|
| Active cancer (treatment ongoing or within previous 6 mo or palliative) | 1 |
| Paralysis, paresis, or recent plaster immobilization of the lower extremities | 1 |
| Recently bedridden for 3 d or more, or major surgery within the previous 12 wk requiring general or regional anesthesia | 1 |
| Localized tenderness along the distribution of the deep venous system | 1 |
| Entire leg swelling | 1 |
| Calf swelling at least 3 cm larger than the asymptomatic leg (measured 10 cm below the tibial tuberosity) a | 1 |
| Pitting edema confined to the symptomatic leg | 1 |
| Collateral superficial veins (nonvaricose) | 1 |
| Previously documented DVT | 1 |
| Alternative diagnosis at least as likely as DVT | -2 |

| Simplified Score = Sum of Clinical Variables | Probability of DVT, % (95% CI) |
|---|---|
| Score ≥3, high probability | 53 (44-61) |
| Score = 1 to 2, moderate probability | 17 (13-23) |
| Score ≤0, low probability | 5.0 (4-8) |

Abbreviations: CI, confidence interval; DVT, deep vein thrombosis.

a In patients with symptoms in both legs, the more symptomatic leg was used.
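
The scoring in Table 18-14 amounts to a simple sum followed by a threshold lookup, which can be sketched as follows. This is an illustrative sketch only: the item keys, function names, and the reading of the clinical scenario are assumptions made here, while the point values and the cut points (3 or more, 1 to 2, 0 or less) come from the table.

```python
# Point values from Table 18-14; the dictionary keys are hypothetical labels.
WELLS_ITEMS = {
    "active_cancer": 1,
    "paralysis_paresis_or_plaster_immobilization": 1,
    "bedridden_3d_or_major_surgery_12wk": 1,
    "localized_tenderness_deep_veins": 1,
    "entire_leg_swelling": 1,
    "calf_swelling_3cm": 1,
    "pitting_edema_symptomatic_leg": 1,
    "collateral_superficial_veins": 1,
    "previous_dvt": 1,
    "alternative_diagnosis_as_likely": -2,
}


def wells_score(findings: set[str]) -> int:
    """Sum the points for the findings that are present."""
    unknown = findings - set(WELLS_ITEMS)
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    return sum(WELLS_ITEMS[name] for name in findings)


def wells_category(score: int) -> str:
    """Map a simplified score to the probability stratum in Table 18-14."""
    if score >= 3:
        return "high"
    if score >= 1:
        return "moderate"
    return "low"


# One possible reading of the clinical scenario (an assumption, for illustration):
# bed rest for 10 days, pitting edema of the left calf, and cellulitis judged
# at least as likely as DVT.
scenario = {
    "bedridden_3d_or_major_surgery_12wk",
    "pitting_edema_symptomatic_leg",
    "alternative_diagnosis_as_likely",
}
score = wells_score(scenario)
print(score, wells_category(score))
```

Under this reading the score is 0, which falls in the low-probability stratum and is consistent with the low pretest probability stated in the scenario resolution.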

POPULATION FOR WHOM DEEP VEIN 
THROMBOSIS SHOULD BE CONSIDERED 

Deep vein thrombosis should be considered in patients with 
an acutely swollen leg that is causing discomfort, even though 
it can be bilateral and occur without prominent discomfort. 


DETECTING THE LIKELIHOOD 
OF DEEP VEIN THROMBOSIS 

Because the prediction rule has been validated for the pretest
probability and because the likelihood ratio (LR) varies
according to the probability estimates and D-dimer assay, it is
easier to display the posttest probability estimates without the
LRs. Clinicians must know whether their laboratory uses the
high-sensitivity D-dimer assay or the moderate-sensitivity
assay. The clinical probability estimates must be determined
before the D-dimer result is revealed to the clinician. Of all
the findings, a negative high-sensitivity D-dimer result has
the biggest effect on the probability of disease and for many
patients will provide evidence that obviates the need for
imaging (see Table 18-15).


Table 18-15 Probability of Deep Vein Thrombosis After First Determining
the Clinical Probability and Then Obtaining the D-dimer Result

Probability of DVT After Applying D-dimer Result to the Clinical Probability Estimate, %

| Assay | D-dimer Result | High Probability a | Moderate Probability a | Low Probability a |
|---|---|---|---|---|
| High-sensitivity D-dimer | Positive | 63 | 25 | 11 |
| | Negative | 8.6 | 1 | 0.5 |
| Moderate-sensitivity D-dimer | Positive | 67 | 34 | 17 |
| | Negative | 19 | 4.4 | 0.9 |

Abbreviation: DVT, deep vein thrombosis.

a Values in the table use the exact summary pretest probability estimates, but a
clinician might simplify by remembering that a high probability is about 50%; moderate
probability, 20%; and low probability, 5%.

REFERENCE STANDARD TESTS 

Imaging studies. 


REFERENCES FOR THE UPDATE

1. Wells PS, Hirsh J, Anderson DR, et al. Comparison of the accuracy of
2. Wells P, Anderson DR, Bormanis J, et al. Value of assessment of pretest
probability of deep-vein thrombosis in clinical management. Lancet.
1997;350(9094):1795-1798.


Mr P is a 52-year-old small-business owner with a 5-year
history of controlled hypertension, for which he takes a
thiazide diuretic. Otherwise, he is in good health. He presents
for routine follow-up and notes a 1-month history
of mild to moderate bitemporal headaches and feeling
fatigued. The headaches occur about twice a week and are
relieved by acetaminophen. He denies chest pain or dyspnea
on exertion. He notes wryly that the “new economy”
has left him feeling a bit “frazzled.”

You wonder whether the headache and fatigue are stress
related, a somatic presentation of depression. What is the
most effective and efficient method for diagnosing depression?
How does one distinguish between somatic symptoms
related to depression vs those related to coexisting
physical illness?


WHY IS THIS AN IMPORTANT QUESTION TO 
ANSWER WITH A CLINICAL EXAMINATION? 


Depressive disorders are prevalent, cause marked personal
distress, and are associated with increased mortality. In primary
care settings, the prevalence of major depression ranges
from 4.8% to 8.6%, and dysthymia ranges from 2.1% to
3.7%. 1 The World Health Organization estimates that major
depression alone is the fourth leading cause of disability
worldwide. 2 Antidepressants and depression-specific psychological
treatments are clearly effective for depression,
improving both depressive symptoms and functional status. 3,4
Many patients can be treated effectively in primary care
settings. Quality improvement initiatives 5 and disease management
models 6-10 are cost-effective compared with usual
care and improve patient outcomes in primary care settings.
Until effective prevention strategies are developed,
high-quality depression care begins with recognition and accurate
diagnosis. This evidence-based review will discuss case-finding
and clinical interview strategies for depression diagnosis.

DEFINING CLINICAL DEPRESSION 

Clinical depression is a syndromal diagnosis based on patient
medical history and the exclusion of competing diagnoses.
Depressive symptoms are evaluated along several continuums:
intensity, duration, and influence on daily functioning.
With these elements, symptoms can range from low mood
lasting hours or a few days to major depression, characterized
by multiple symptoms and substantial effect on daily
functioning, according to criteria from the Diagnostic and
Statistical Manual of Mental Disorders, (Fourth Edition)
(DSM-IV) (Table 19-1). 11 A diagnostic nomenclature that
helps guide treatment is “major depression,” “dysthymia,”

Table 19-1 Diagnostic Criteria and Questions to Assess Major Depression a

| Symptom | DSM-IV Diagnostic Criteria | Suggested Questions |
|---|---|---|
| Depressed mood | Depressed mood most of the day, nearly every day | How has your mood been lately? OR Do you ever feel down, depressed, or blue? How often does that happen? How long does it last? |
| Anhedonia | Markedly diminished interest or pleasure in almost all activities most of the day, nearly every day | Have you lost interest in your usual activities? Do you get less pleasure in things you used to enjoy? |
| Sleep disturbance | Insomnia or hypersomnia nearly every day | How have you been sleeping? How does that compare with your normal sleep? |
| Appetite or weight change | Substantial change in appetite nearly every day or unintentional weight loss or gain (eg, >5% of body weight in 1 mo) | Has there been any change in your appetite or weight? |
| Decreased energy | Fatigue or loss of energy nearly every day | Have you noticed a decrease in your energy level? |
| Increased or decreased psychomotor activity | Psychomotor agitation or retardation nearly every day | Have you been feeling fidgety or had problems sitting still? Have you felt slowed down, like you were moving in slow motion or stuck in mud? |
| Decreased concentration | Diminished ability to think or concentrate, or indecisiveness nearly every day | Have you been having trouble concentrating? Is it harder to make decisions than before? |
| Guilt or feelings of worthlessness | Feelings of worthlessness or excessive guilt nearly every day | Are you feeling guilty or blaming yourself for things? How would you describe yourself to someone who had never met you before? |
| Suicidal ideation | Recurrent thoughts of death or suicide | Have you felt that life is not worth living or that you'd be better off dead? Sometimes when a person feels down or depressed, they might think about dying. Have you been having any thoughts like that? |

Abbreviation: DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, (Fourth Edition).

a The diagnosis of major depression requires 5 or more symptoms, including depressed mood or anhedonia, which have been present during the same 2-week period and cause
clinically significant distress or impairment in social, occupational, or other important areas of functioning.

Adapted from the DSM-IV. 11


Table 19-2 Diagnostic Categories for Depressive Disorders

| Diagnostic Category | DSM-IV Criteria | Symptom Duration |
|---|---|---|
| Major depression | ≥5 Depressive symptoms, including depressed mood or anhedonia, causing significant impairment in social, occupational, or other important areas of functioning | ≥2 wk |
| Minor depression a | 2-4 Depressive symptoms, including depressed mood or anhedonia, causing significant impairment in social, occupational, or other important areas of functioning | ≥2 wk |
| Dysthymia | 3 or 4 dysthymic symptoms, b causing significant impairment in social, occupational, or other important areas of functioning | ≥2 y |

Abbreviation: DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, (Fourth
Edition).

a Minor depression is included in DSM-IV as a research criteria diagnosis that
requires further evaluation.

b Dysthymic symptoms are depressed mood, poor appetite or overeating, insomnia or
hypersomnia, low energy, low self-esteem, poor concentration or indecisiveness,
and hopelessness.


and “depression not otherwise specified.” Major depression is
defined by depressed mood or loss of interest in nearly all
activities for at least 2 weeks, accompanied by a minimum of
3 to 4 (for a total of 5) psychological (eg, decreased concentration)
or somatic symptoms (eg, insomnia). 11 Dysthymia is
characterized by fewer symptoms than major depression
(<5) and a chronic course lasting at least 2 years (Table 19-2).
Depression not otherwise specified includes syndromes without
a sufficient number of symptoms (<5) or duration (<2
weeks) to meet major depression criteria. Within this category,
minor depression, an unofficial diagnosis that has been
nominated for further study, is an example with an insufficient
number of symptoms. 11,12
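
The symptom-count and duration thresholds for major and minor depression in Table 19-2 can be expressed as a short rule check. The sketch below is an illustration of how the criteria combine, not a diagnostic instrument; the symptom labels and the function name are assumptions introduced here, and dysthymia is left out because it uses a different symptom list and a 2-year duration.

```python
# The 9 symptoms from Table 19-1; the labels are hypothetical identifiers.
SYMPTOMS = frozenset({
    "depressed_mood", "anhedonia", "sleep_disturbance",
    "appetite_or_weight_change", "decreased_energy", "psychomotor_change",
    "decreased_concentration", "guilt_or_worthlessness", "suicidal_ideation",
})
CARDINAL = frozenset({"depressed_mood", "anhedonia"})


def classify_depression(symptoms: set[str], duration_weeks: float,
                        impairment: bool) -> str:
    """Apply the major and minor depression rows of Table 19-2."""
    unknown = symptoms - SYMPTOMS
    if unknown:
        raise ValueError(f"unknown symptoms: {sorted(unknown)}")
    has_cardinal = bool(symptoms & CARDINAL)
    # Both categories need a cardinal symptom, impairment, and at least 2 weeks.
    if not (has_cardinal and impairment and duration_weeks >= 2):
        return "criteria not met"
    if len(symptoms) >= 5:
        return "major depression"
    if len(symptoms) >= 2:
        return "minor depression"
    return "criteria not met"


example = {"depressed_mood", "sleep_disturbance", "decreased_energy",
           "decreased_concentration", "guilt_or_worthlessness"}
print(classify_depression(example, duration_weeks=3, impairment=True))
```

The check for a cardinal symptom comes first because a patient with 5 symptoms but neither depressed mood nor anhedonia does not meet the criteria in either row.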

HOW TO EVALUATE PATIENTS 
FOR CLINICAL DEPRESSION 

There are 2 recommended approaches to recognizing and diagnosing
depression. One approach, endorsed by the US Preventive
Services Task Force, is to cue physicians to possible clinical
depression by asking patients to complete a depression questionnaire
during a routine appointment, an approach known as
case-finding. 13 Patients who score above a specified threshold are
evaluated more carefully for depression. A second approach is to
evaluate patients for depression only when the clinical presentation
triggers the suspicion of depression. 1 Chronic medical illness,
chronic pain syndromes, recent life changes or stressors,
fair or poor self-rated health, and unexplained physical symptoms
are associated with depression. 14 The likelihood of a
depressive disorder increases by approximately 1.5 to 3.5 times if
any of these factors is present. 15

For either approach, a clinical interview is used to make a
definitive diagnosis, in which the interviewer begins with
open-ended questions and then proceeds as necessary to narrowly
focused questions. In patients such as Mr P who present with
somatic symptoms, a transition is recommended from inquiry
about these symptoms to questions about emotional health.
Many experts create useful transitions with questions such as,
“How are things at home?” or “How are things at work?” More narrowly
focused questions should follow (Table 19-1), with priority
given to questions about mood and anhedonia (a loss of
interest or decreased pleasure in activities) because at least 1 of
these 2 cardinal symptoms is required to diagnose clinically significant
depression. Because successive generations use different
synonyms for depressed mood, several alternatives should be
offered in the question. For example, it may be helpful to ask,
“Have you been feeling sad, down, depressed, or blue?” If the answers
to questions about mood and anhedonia are no, clinically significant
depression is unlikely and alternative diagnoses should be
considered more strongly.

Patients admitting to either depressed mood or anhedonia
should be asked additional questions to determine whether
there are sufficient symptoms to warrant a diagnosis of clinical
depression. Assessing the effect of depressive symptoms on
functioning and assessing suicide risk are critical elements in the initial
treatment decision. A helpful question to assess functioning is,
“Have these symptoms of [fill in patient’s symptoms] affected
your home or work life?” Suicide assessment is more complex.
Because patients rarely volunteer thoughts of suicide or their
intentions to their physicians, it is important to ask directly.
There is no evidence to suggest that asking about suicide precipitates
suicidal thinking or acts. 16 One useful screening question
is, “Have you been feeling that life is not worth living or that you
would be better off dead?” 17 Another approach is to say, “Sometimes
when a person feels down or depressed, they might think
about dying. Have you been having any thoughts like that?” For
patients with suicidal ideation, the next step is to ask, “Do you
have a plan?” If a patient answers yes, inquire about the plan and
determine whether he or she has assembled the materials
required, has set a time, and whether there are any factors that
may precipitate or keep the patient from carrying out the plan.
Major risk factors for suicide include hopelessness, substance
abuse, and previous suicide attempts. Patients at high risk of suicide
should be referred for psychiatric evaluation; those at
imminent risk should be evaluated immediately. 18

Expert guidelines recommend a careful review of systems to
detect general medical conditions that may masquerade as
depression or complicate its treatment. 1 Physical conditions,
such as hypothyroidism or Cushing disease, may cause depression,
and some experts recommend a thyrotropin measurement
in women older than 50 years because of the increased prevalence
of hypothyroidism. 1,19 Because these physical conditions are
etiologic, treatment is directed at the underlying condition
rather than the depressive symptoms. Similarly, medications such
as glucocorticoids, anabolic steroids, and high-dose reserpine or
withdrawal from cocaine or amphetamine can cause depression. 20-22
Other medical conditions such as malignancies, diabetes
mellitus, autoimmune disorders, and coronary heart disease
are highly associated but not causative for depression, and treatment
is directed simultaneously at the clinical depression and
the associated physical illness. 1,4,23-26 Diagnostic testing for these
disorders is indicated only when clinical symptoms suggest the
condition. For example, patients with weight loss out of proportion
to the depression should be evaluated for malignancy or
other systemic disorders associated with weight loss. Psychiatric
illnesses such as alcohol abuse are common in primary care settings
and often co-occur with depression. 27 The combination is
difficult to treat, often requiring mental health specialty care.
The CAGE questions (Have you ever felt the need to cut down
on your drinking? Have you ever felt annoyed by criticism of
your drinking? Have you ever felt guilty about your drinking?
Have you ever taken a drink [eye opener] first thing in the morning?)
are a pragmatic and effective screen for alcohol abuse. 28

Once depression is diagnosed, additional history should be
elicited about factors that may affect treatment. First, explore the
patient’s understanding and acceptance of the diagnosis. Stigmatizing
beliefs about depression or outright rejection of the
diagnosis may interfere with treatment adherence. Second, elicit
the patient’s treatment preferences and information on response
to therapy for previous episodes of depression. This is particularly
important for pharmacotherapy because antidepressant
agents that have been used successfully for past depressive episodes
are likely to be effective and well tolerated for the current
episode. 4 Finally, assess the number of previous episodes. The
risk of relapse, and hence the need for longer-term treatment,
increases with the number of previous episodes. 1,4

CRITERION STANDARD DIAGNOSIS 

Clinical depression is a syndromal diagnosis. There is no
physiologic or laboratory test, radiologic examination, or tissue
diagnosis to definitively establish the diagnosis. Instead, a
trained interviewer conducts a clinical interview to determine
whether the patient meets established criteria. The
most commonly used criteria, which are updated periodically,
are the DSM-IV or the International Classification of
Diseases, Tenth Revision, Clinical Modification. 11,29

METHODS 

Search Strategy and Inclusion/Exclusion Criteria 

We conducted separate searches of MEDLINE and a specialized
registry of depression trials 30 for English-language medical
literature published from 1970 through July 2000 for
studies evaluating the performance of case-finding instruments
in primary care settings and the reliability of the clinical
interview. All searches included the terms “depressive
disorder” or “depression” and additional terms as appropriate
for the specific search. Unpublished data were not sought.

For case-finding, we modified inclusion criteria used in our
previous literature synthesis 31 to select instruments that are
most readily used in clinical situations. Studies were included
if they were conducted in a primary care setting, administered
a case-finding instrument, and used a standard interview such
as the Structured Clinical Interview for Diagnostic and Statistical
Manual of Mental Disorders, Third Edition Revised (SCID,
DSM-III-R) 32 to make a criterion-based diagnosis (eg, DSM-III,
DSM-III-R, DSM-IV) of depression. Furthermore, we
specified that the case-finding instrument have easy to average
literacy requirements, 33 be scored without a calculator, have a
depression-specific component, and be evaluated in at least 1
study with at least 100 subjects. Of 1766 articles identified by
the search strategy, 379 potentially eligible studies were reviewed.
Twenty-eight studies, involving 11 case-finding instruments,
met all inclusion criteria. 34-61

For reliability studies, we required criterion-based diagnoses
made by 2 or more clinicians who interviewed the same patient
or reviewed an audiotape or videotape interview. Clinicians
evaluated patients with known or suspected psychiatric illness
who were recruited from inpatient or outpatient settings in both
mental health and general medical settings. Studies using
nonclinician interviewers were excluded. Among studies using
semistructured interviews, we only included those using the SCID, a
commonly used research instrument for diagnosing psychiatric
illness. The search yielded 6103 potentially eligible articles, of
which 14 met all inclusion/exclusion criteria. 62-75

Data Abstraction and Statistical Methods 

Two independent reviewers abstracted articles. For case-finding
studies, quality assessment addressed sample size
greater than 100, whether patients were selected consecutively
or randomly, whether the criterion standard was
administered and interpreted independently of and blind to
the results of the case-finding instrument, and whether the
proportion of persons receiving the criterion standard
assessment was less than or more than 50% of those
approached for criterion standard assessment. For reliability
studies, quality assessments addressed whether key patient
characteristics were described (eg, depression severity),
whether the interviewers collected clinical history independently,
and whether diagnoses were made blinded to other
clinicians’ evaluations.


Box 19-1 Web Sites for Case-Finding Instruments a

Beck Depression Inventory (BDI):
http://en.wikipedia.org/wiki/Beck_Depression_Inventory

Center for Epidemiologic Studies Depression (CES-D):
http://www.chcr.brown.edu/pcoc/cesdscale.pdf

Duke Anxiety-Depression Scale (DADS):
http://healthmeasures.mc.duke.edu/

Geriatric Depression Scale (GDS)—Long or Short Versions:
http://www.stanford.edu/~yesavage/GDS.html

Primary Care Evaluation of Mental Disorders (PRIME-MD) 48:
http://jama.ama-assn.org/cgi/content/full/282/18/1737

PRIME-MD Patient Health Questionnaire (PHQ):
http://www.phqscreeners.com/

The Zung Self-Rating Depression Scale (SDS):
http://healthnet.umassmed.edu/mhealth/ZungSelfRatedDepressionScale.pdf

a All Web sites accessed May 28, 2008.

Established cut points for case-finding instruments were
used except for short versions of original instruments that
had proportionally lower thresholds 35,43,46,60 and one study
that used a higher threshold than originally recommended. 44
Two-by-two tables were used to categorize the number of
persons whose results were positive and negative and who
did and did not meet criterion standard diagnosis for major
depression. When it was appropriate, we adjusted for verification
bias. 76 Of 11 authors contacted for additional information,
10 responded with the needed data. The average
positive likelihood ratio (LR+) and negative likelihood ratio
(LR-), weighted by study precision and corrected for 2-stage
assessment techniques when indicated, were computed for
each case-finding instrument. 77-79 A scattergram plotting
true-positive against false-positive rates was constructed to
visually evaluate variability among studies. To provide a
visual reference for the consistency of study results, we modeled
a summary receiver operating characteristic curve based
on the logit transformation of the true-positive and
false-positive rates. The effectiveness score was used to evaluate
overall performance and study heterogeneity. 80 Studies of
reliability were not combined quantitatively because of
marked heterogeneity in study design.
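
The step from a two-by-two table to likelihood ratios can be sketched as follows for a single study. This is an illustration under stated assumptions: the class name and the counts are hypothetical (they are not data from any included study), and the sketch omits the precision weighting, verification-bias adjustment, and 2-stage corrections described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of questionnaire result vs criterion standard diagnosis."""
    true_positive: int
    false_positive: int
    false_negative: int
    true_negative: int

    @property
    def sensitivity(self) -> float:
        return self.true_positive / (self.true_positive + self.false_negative)

    @property
    def specificity(self) -> float:
        return self.true_negative / (self.true_negative + self.false_positive)

    @property
    def lr_positive(self) -> float:
        # True-positive rate divided by false-positive rate.
        return self.sensitivity / (1.0 - self.specificity)

    @property
    def lr_negative(self) -> float:
        # False-negative rate divided by true-negative rate.
        return (1.0 - self.sensitivity) / self.specificity


# Hypothetical counts for one study of 400 patients, 50 with major depression.
study = TwoByTwo(true_positive=40, false_positive=60,
                 false_negative=10, true_negative=290)
print(f"sensitivity {study.sensitivity:.2f}, specificity {study.specificity:.2f}")
print(f"LR+ {study.lr_positive:.1f}, LR- {study.lr_negative:.2f}")
```

A study with a zero cell would make one of these ratios undefined, so a pooled analysis needs its own handling for that case; this sketch does not attempt it.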

RESULTS 

Accuracy of Case-Finding 
Questionnaires for Depression 

Eleven questionnaires, ranging from 1 to 30 items, met all
inclusion criteria (Table 19-3). Six are depression-specific
(Beck Depression Inventory [BDI], Center for Epidemiologic
Studies Depression Screen [CES-D], Depression Scale
[DEPS], Geriatric Depression Scale [GDS], Zung Self-Rating
Depression Scale [SDS], and Single Question [SQ]), 1
addresses depression and anxiety (Duke Anxiety and Depression
Scale [DADS]), and 4 are multicomponent (Hopkins
Symptom Check List [HSCL], Primary Care Evaluation of
Mental Disorders [PRIME-MD], PRIME-MD Patient Health
Questionnaire [PHQ], and Symptom Driven Diagnostic System
for Primary Care [SDDS-PC]). All of the questionnaires
can be self-administered in less than 5 minutes, all include
specific questions aimed at assessing depressed mood, and,
except for the SQ instrument, all assess anhedonia. Resources
to obtain the full instruments are listed in Box 19-1.

Table 19-3 Characteristics of Depression Case-Finding Instruments Validated in Primary Care Settings

| Instrument | No. of Items a | Scope | Response Format | Period of Questions | Score Range | Usual Cut Point b | Literacy Level c | Administration Time, min | Monitor Severity or Response |
|---|---|---|---|---|---|---|---|---|---|
| BDI | 21, 13, 7 | Depression-specific (multiple versions) | 4 Statements of symptom severity per item | Today | 0-63 | 10-19 Mild, 20-29 moderate, ≥30 severe | Easy | 2-5 | Yes |
| CES-D | 20, 10 | Depression-specific (2 versions) | 4 Frequency ratings: “less than 1 d” to “most or all (5-7 d)” | Past week | 0-60 | ≥16 | Easy | 2-5 | Yes |
| DEPS | 10 | Depression-specific | 4 Frequency ratings: “not at all” to “extremely” | Last month | 0-30 | ≥9 | Average | <2 | Unknown |
| DADS | 7 | For anxiety and depression | 3 Frequency ratings: “yes, somewhat, no” for 3 items; “none, some, a lot” for 4 items | Past week | 0-100 | ≥30 | Average | <2 | Unknown |
| GDS | 30, 15 | Depression-specific (2 versions) | Yes or no | Past week | 0-30 | ≥11 | Easy | 2-5 | Yes |
| HSCL | 25, 13 | Multiple versions and multiple components with depression category | 4 Frequency ratings: “not at all” to “much more than usual” | Past week | 25-100 | ≥43 | Average | 2-5 | Yes |
| PRIME-MD | 2 | Multiple components with depression | Yes or no | Past month | 0-2 | ≥1 | Average | <1 | No |
| PRIME-MD (PHQ) | 9 | Multiple components with depression | 4 Frequency ratings: “not at all” to “nearly every day” | Past 2 wk | 0-9 For diagnosis; 0-27 for response | For diagnosis: 5 symptoms. For severity: 0-4 none; 5-9 mild; 10-14 moderate; 15-19 major; 20-27 severe | Average | <2 | Yes |
| SDDS-PC | 5 | Multiple components with depression | Yes or no | Past month | 0-5 | ≥2 | Easy | <2 | Unknown |
| SDS | 20 | Depression-specific | 4 Frequency ratings: “little of the time” to “most of the time” | Recently | 25-100 (Sum/80 x 100) | 50-59 Mild, 60-69 moderate, ≥70 severe | Easy | 2-5 | Yes |
| SQ | 1 | Depression-specific | Yes or no | Past year | 0-1 | 1 | Easy | <1 | No |

Abbreviations: BDI, Beck Depression Inventory; CES-D, Center for Epidemiologic Studies Depression Screen; DADS, Duke Anxiety and Depression Scale; DEPS, Depression
Scale; GDS, Geriatric Depression Scale; HSCL, Hopkins Symptom Check List; PRIME-MD, Primary Care Evaluation of Mental Disorders; PRIME-MD (PHQ), PRIME-MD Patient
Health Questionnaire; SDDS-PC, Symptom-Driven Diagnostic System for Primary Care; SDS, Zung Self-Rating Depression Scale; SQ, single question.

a Numbers refer to different versions of the same instrument and are listed from most to least number of items. Item numbers for the DADS, PRIME-MD, PRIME-MD (PHQ), and
SDDS-PC refer to depression questions only; item numbers for the HSCL refer to depression plus anxiety questions.

b Cut point is given for the instrument version with the highest number of items.

c Easy indicates third- to fifth-grade reading level; average, sixth- to ninth-grade reading level. 33


The BDI, the CES-D, and the SDS were developed specifically
to identify depression. They include similar numbers of
questions and use response formats that rely either on ranking
symptom severity or on classifying frequency of symptoms.
These 3 instruments are among the most thoroughly evaluated
in primary care and can be used to rate the severity of
depression and monitor response to therapy. Shortened versions
of the BDI and the CES-D have been tested recently in primary
care. 35,43 The GDS exists in both 30- and 15-item versions and
uses a yes-or-no response format that simplifies telephone
administration. It has been evaluated only in populations aged
60 years and older. DADS, DEPS, and SQ (“Have you felt
depressed or sad much of the time in the past year?”) are newer,
brief instruments that have been evaluated in single studies.

The PRIME-MD and SDDS-PC instruments are multidimensional
questionnaires. Each has screening questions arranged in
several categories (eg, depression, anxiety, alcohol abuse) that are
used to trigger more extensive diagnostic interviewing sections
for specific Diagnostic and Statistical Manual (DSM) diagnoses.
Recently, the PHQ, a completely self-administered version of
the PRIME-MD, has been evaluated. It scores each DSM-IV
depression symptom as present or absent to diagnose depression,
and can also be scored continuously to monitor treatment
response. The HSCL screens for general psychiatric illness and
has a specific category for depression.
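
The two PHQ scoring modes described above (symptom counting for diagnosis and a continuous total for monitoring) can be sketched in Python. This is an illustrative sketch, not the published scoring algorithm. The function name `score_phq9`, the 0-3 coding of the 4 frequency ratings, and the `present_at` rule for when a symptom counts as present are assumptions made here for illustration; only the severity bands and the 5-symptom threshold come from Table 19-3.

```python
from typing import Dict, List

# Severity bands for the continuous score, as listed in Table 19-3.
SEVERITY_BANDS = [
    (0, 4, "none"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "major"),
    (20, 27, "severe"),
]


def score_phq9(ratings: List[int], present_at: int = 2) -> Dict[str, object]:
    """Score 9 item ratings (each coded 0-3) in both modes.

    present_at is a hypothetical threshold: the rating at or above which
    a symptom is counted as present. It is an assumption of this sketch.
    """
    if len(ratings) != 9 or any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("expected 9 ratings, each coded 0-3")
    total = sum(ratings)  # continuous score, 0-27
    symptom_count = sum(1 for r in ratings if r >= present_at)  # 0-9
    severity = next(
        label for low, high, label in SEVERITY_BANDS if low <= total <= high
    )
    return {
        "total": total,
        "severity": severity,
        "symptom_count": symptom_count,
        "meets_symptom_threshold": symptom_count >= 5,
    }


# Hypothetical responses, for illustration only.
example = score_phq9([2, 3, 1, 2, 0, 2, 1, 2, 0])
print(example)
```

A positive result from either mode would still need to be confirmed by a clinical interview; the sketch only organizes the arithmetic.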

These instruments were assessed in 37 evaluations across 28
published studies, 34-61 which involved 25550 screened patients,
of whom 9218 were administered an acceptable criterion
standard for diagnosing depression (Table 19-4). Nine of 28
studies had potential major selection biases because more

Table 19-4 Case-Finding Instrument Performance in Primary Care Settings

Source, y | Instrument | Quality Score^e | Sample Size^a | Population^b | Positive Likelihood Ratio (95% CI)^c | Negative Likelihood Ratio (95% CI)^d

Major Depressive Disorder
Whooley et al, 34 1997 | BDI | 1 | 536 | Veterans Affairs | 2.5 | 0.18
Steer et al, 35 1999 | BDI, 7 items | 3 | 120 | Academic | 97 | 0.03
Perez-Stable et al, 36 1990 | BDI | 4 | 265 | Mixed | 1.5 | 0.25
Zich et al, 37 1990 | BDI | 4 | 31 | Mixed | 3.4 | 0.17
Summary BDI | | | | | 4.2 (1.2-14) | 0.17 (0.1-0.3)
Kirmayer et al, 38 1993 | CES-D | | 685 | Academic | 3.0 | 0.28
Whooley et al, 34 1997 | CES-D | 1 | 536 | Veterans Affairs | 3.0 | 0.10
Williams et al, 39 1999 | CES-D | 1 | 296 | Mixed | 3.2 | 0.13
Fechner-Bates et al, 40 1994 | CES-D | 2 | 425 | Community | 2.7 | 0.29
Hendrie et al, 41 1995 | CES-D | 2 | 125 | Academic (age > 60 y) | 2.9 | 0.60
Hough et al, 42 1983 | CES-D | 2 | 525 | Health maintenance organization | 3.9 | 0.23
Irwin et al, 43 1999 | CES-D, 10 items | 3 | 68 | Academic (age > 60 y) | 11 | 0.14
Lyness et al, 44 1997 | CES-D | 4 | 130 | Community (age > 60 y) | 12 | 0.15
Perez-Stable et al, 36 1990 | CES-D | 4 | 214 | Mixed | 1.9 | 0.40
Zich et al, 37 1990 | CES-D | 4 | 34 | Mixed | 1.8 | 0.31
Summary CES-D | | | | | 3.3 (2.5-4.4) | 0.24 (0.2-0.3)
Neal and Baldwin, 45 1994 | GDS | 2 | 45 | Academic (age > 65 y) | 4.0 | 0.25
D'Ath et al, 46 1994 | GDS, 15 items | 4 | 120 | Community (age > 65 y) | 3.3 | 0.12
Summary GDS | | | | | 3.3 (2.4-4.7) | 0.16 (0.1-0.3)
Schmitz et al, 47 1999 | HSCL | 1 | 421 | Community | 2.0 | 0.37
Hough et al, 42 1983 | HSCL | 2 | 525 | Health maintenance organization | 5.4 | 0.17
Summary HSCL | | | | | 3.2 (1.7-6.2) | 0.24 (0.1-0.5)
Spitzer et al, 48 1999 | PHQ | 1 | 585 | Mixed | 12 (8.4-18) | 0.28 (0.2-0.5)
Spitzer et al, 49 1994 | PRIME-MD | 1 | 431 | Mixed | 3.4 | 0.19
Whooley et al, 34 1997 | PRIME-MD | 1 | 536 | Veterans Affairs | 2.2 | 0.07
Summary PHQ or PRIME-MD | | | | | 2.7 (2.0-3.7) | 0.14 (0.1-0.3)
Leon et al, 50 1996 | SDDS-PC | 1 | 501 | Community | 5.4 | 0.34
Whooley et al, 34 1997 | SDDS-PC | 1 | 536 | Veterans Affairs | 2.0 | 0.08
Broadhead et al, 51 1995^f | SDDS-PC | 3 | 388 | Community | 4.0 | 0.12
Broadhead et al, 51 1995 | SDDS-PC | 3 | 257 | Mixed | 3.9 | 0.40
Summary SDDS-PC | | | | | 3.5 (2.4-5.1) | 0.22 (0.1-0.4)
Spitzer et al, 49 1994 | SDS | 1 | 337 | Mixed | 3.3 | 0.19
Okimoto et al, 52 1982 | SDS | 3 | 55 | Veterans Affairs (age > 60 y) | 2.2 | 0.05
Magruder-Habib et al, 53 1990 | SDS | 4 | 206 | Veterans Affairs | 15 | 0.27
Raft et al, 54 1977 | SDS | 4 | 69 | Academic | 1.0 | 0.97
Summary SDS | | | | | 3.3 (1.3-8.1) | 0.35 (0.2-0.8)
Williams et al, 39 1999 | SQ | 1 | 291 | Mixed | 2.3 (1.8-2.9) | 0.16 (0.0-0.6)
Median performance for all instruments | | | | | 3.3 | 0.19

Major Depressive Disorder or Dysthymia
Klinkman et al, 55 1997 | CES-D | 2 | 425 | Mixed | 2.9 | 0.27
Schulberg et al, 56 1985 | CES-D | 4 | 294 | Community | 5.2 | 0.26
Salokangas et al, 57 1995 | DEPS | 2 | 436 | Community | 4.9 | 0.31
Parkerson and Broadhead, 58 1997 | DADS | 3 | 481 | Academic | 2.3 | 0.28
van Marwijk et al, 59 1995 | GDS, 30 items | 1 | 586 | Community (age > 65 y) | 3.9 | 0.53
Arthur et al, 60 1999 | GDS, 15 items | 3 | 201 | Community (age > 75 y) | 3.4 | 0.05
Nettelbladt et al, 61 1993 | HSCL | 2 | 186 | Community | 2.8 | 0.36
Median performance for all instruments | | | | | 3.9 | 0.30

Abbreviations: BDI, Beck Depression Inventory; CES-D, Center for Epidemiologic Studies Depression Screen; CI, confidence interval; DEPS, Depression Scale; DADS, Duke Anxiety
and Depression Scale; GDS, Geriatric Depression Scale; HSCL, Hopkins Symptom Check List; PHQ, PRIME-MD Patient Health Questionnaire; PRIME-MD, Primary Care Evaluation
of Mental Disorders; SDDS-PC, Symptom Driven Diagnostic System for Primary Care; SDS, Zung Self-Rating Depression Scale; SQ, Single Question.

^a The sample size refers to the actual number who received the criterion standard.

^b Mixed, community and university-affiliated clinics; academic, university-affiliated clinics; and community, private practice clinics.

^c The positive likelihood ratio describes how much more likely it is that a positive depression screening result would be observed in an individual with depression than in someone
without depression. It is calculated as sensitivity/(1 - specificity). Summary is a weighted average across all studies.

^d The negative likelihood ratio describes how much more likely it is that a negative depression screening result would be observed in an individual with depression than in someone
without depression. It is calculated as (1 - sensitivity)/(specificity). Summary is a weighted average across all studies.

^e Lower scores indicate higher quality.

^f The study by Broadhead et al 51 is listed twice for the same instrument because it included both an initial test set of patients and a validation set of patients.


than 50% of persons selected did not receive a criterion standard
assessment, either because they refused the assessment
or failed to keep an appointment. 36,37,41,44,51,53,56,58 Considering
independent and blind administration of the criterion standard,
major selection bias, and sample size, 15 (54%) of the
28 studies were of reasonably high quality for diagnostic test
evaluations.

Figure 19-1 plots the study results for true-positive and
false-positive rates for case-finding instruments used to
detect major depression. Standard cut points were used for
these calculations (Table 19-3) except for one study using
higher than recommended thresholds for the CES-D. 44 The
cut point for mild depression was used for the 2 scales with 3
listed cut points (BDI and SDS); the study by Raft et al 54 only
had information corresponding to moderate depression for
the SDS. Two studies were identified as outliers. 36,54 The study
by Raft et al 54 used the higher cut point for the SDS scale and
had an unusually low sensitivity (31%; 95% confidence interval
[CI], 16%-51%). The study by Perez-Stable et al 36 had an
unusually low specificity of 40% for the BDI. They studied
patients with high levels of medical comorbidity and high
ethnic minority representation, factors that may have
decreased specificity.

The median LR+ for all studies was 3.3 (range, 2.3-12),
meaning that a positive depression screening result is 3.3 times
more likely to be observed in someone with depression than in
someone without the illness. The median LR- was 0.19 (range,
0.14-0.35), meaning that a negative depression screening result
is about 0.2 times as likely to be observed in someone with
depression as in someone without the illness. Performance
did not differ significantly between instruments. With the
effectiveness score as a measure of overall performance, there
was statistically significant heterogeneity for the BDI (P < .01),
CES-D (P < .04), HSCL (P = .04), and SDS (P < .01), indicating
that the instruments performed variably across the individual
studies. The variability may be due to differences in the
patient populations or study design. Finally, a subset of studies
reported instrument performance for major depression and
separately for the combined category of major depression or
dysthymia. 55-61 Performance characteristics for detecting this
combined category were not statistically significantly different
from those for detecting major depression alone.
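
To make the interpretation of these likelihood ratios concrete, the following Python sketch applies the formulas given in the Table 19-4 footnotes and then converts a pretest probability into a posttest probability through odds. It is a minimal illustration only: the function names are hypothetical, and the 10% pretest probability is an assumed value chosen for the example, not a figure from the studies.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity (both 0-1)."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pretest_probability: float, lr: float) -> float:
    """Convert probability to odds, multiply by the LR, convert back."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * lr
    return post_test_odds / (1.0 + post_test_odds)


# Assumed pretest probability (hypothetical, for illustration only).
PRETEST = 0.10

# Median LRs for major depression reported in the text.
after_positive = post_test_probability(PRETEST, 3.3)
after_negative = post_test_probability(PRETEST, 0.19)

print(f"After a positive screen: {after_positive:.1%}")
print(f"After a negative screen: {after_negative:.1%}")
```

Under that assumed pretest probability, a positive result raises the probability of depression to roughly one in four, and a negative result lowers it to about 2%, which is why a positive screen calls for a confirmatory interview rather than a diagnosis.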

Figure 19-1 True-Positive Rate Against False-Positive Rate for
Case-Finding Instruments to Detect Major Depression

[Figure not reproduced. Axes: true-positive rate against
false-positive rate (1 - specificity), with the true-negative rate
(specificity) shown on the opposite axis.]

The numbered points represent individual studies. The curve represents the
summary receiver-operating curve through all data points. Standard cut
points (see Table 19-3) were used for these calculations except for one
study 44 that used higher than recommended thresholds for CES-D.
Abbreviations: BDI, Beck Depression Inventory; CES-D, Center for Epidemiologic
Studies Depression Screen; GDS, Geriatric Depression Scale; HSCL, Hopkins
Symptom Check List; PHQ, PRIME-MD Patient Health Questionnaire; PRIME-MD,
Primary Care Evaluation of Mental Disorders; SDDS-PC, Symptom
Driven Diagnostic System for Primary Care; SDS, Zung Self-Rating
Depression Scale; SQ, Single Question.

Given the similar performance, case-finding instruments
should be selected according to brevity, response format
(particularly if telephone administration is planned), the
desire to screen for other psychiatric illnesses, and the need
to monitor response. The PHQ best meets these criteria with
only 9 items for depression, modules for other psychiatric
illness, and a simple response format that is sensitive to
change. For clinicians who wish to screen only for depression,
the SQ is an attractive alternative that could be asked
during preventive medicine evaluations or in response to
triggers that increase the likelihood of depression. Positive
responses would need to be explored by a more careful clinical
interview. In a clinic with an 8% prevalence of major
depression or dysthymia, a clinician treating 100 patients
per week can expect that 30 will have a positive screening
result for depression, of whom 7 would meet criteria for
clinical depression after a more careful clinical interview.
Among the 70 patients who have a negative screening result
for depression, 1 would have clinical depression. If case-finding
were used only in selected high-risk patients (eg,
those with chronic pain), a positive screening result would
more likely be a true positive, but more patients with clinical
depression would be missed.
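
The clinic example above can be approximated with a short calculation. The sketch below is illustrative only. The text does not state which operating characteristics the authors used, so it assumes the median likelihood ratios for major depression (LR+ 3.3, LR- 0.19) and recovers the implied sensitivity and specificity algebraically; the function name `screening_yield` is hypothetical.

```python
from typing import Dict


def screening_yield(
    n_patients: float, prevalence: float, lr_pos: float, lr_neg: float
) -> Dict[str, float]:
    """Expected screening counts for a clinic population.

    Sensitivity and specificity are recovered from the LR pair using
    LR+ = sens / (1 - spec) and LR- = (1 - sens) / spec.
    """
    specificity = (lr_pos - 1.0) / (lr_pos - lr_neg)
    sensitivity = lr_pos * (1.0 - specificity)

    depressed = n_patients * prevalence
    not_depressed = n_patients - depressed

    true_pos = depressed * sensitivity
    false_neg = depressed - true_pos
    false_pos = not_depressed * (1.0 - specificity)
    true_neg = not_depressed - false_pos

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "screen_positive": true_pos + false_pos,
        "true_positive": true_pos,
        "screen_negative": false_neg + true_neg,
        "false_negative": false_neg,
    }


# 100 patients per week, 8% prevalence (from the text);
# LR+ 3.3 and LR- 0.19 are an assumption of this sketch.
result = screening_yield(100, 0.08, 3.3, 0.19)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the expected counts come out close to the figures quoted in the text (about 30 positive results, about 7 true positives, and about 1 missed case); small differences reflect rounding and the unstated operating characteristics.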

Accuracy and Reliability of the 
Clinical Interview for Depression 

Because the criterion standard diagnosis is based on a clinical
interview, there is no simple method for establishing its accuracy.
However, we identified relevant studies of diagnostic agreement
between 2 mental health professionals and between primary care
physicians and a mental health professional, as well as studies of
the effects of coexisting medical illness on reliability.

We identified 7 studies that used the SCID to evaluate interrater
reliability for major depression (Table 19-5). 62-68 The SCID
is a widely used research instrument that uses a semistructured
interview to elicit symptoms that are applied to the current
DSM criteria to establish a diagnosis. 32 It is designed in part to
decrease variability related to the range of symptoms explored
and the manner in which a clinical interviewer presents questions.
Study design varied considerably, ranging from multiple
clinicians viewing a videotaped interview to paired interviewers
conducting sequential interviews. Examiners’ training and experience
ranged from psychology trainees to practicing psychiatrists
with a special interest in mood disorders. All studies were
conducted in mental health specialty settings. Diagnoses were
made blind to the other raters’ diagnosis in 6 studies, patient
medical histories were elicited independently in only 1 study,
and no study described depression severity. Despite the variability
in study design and examiner training, interrater agreement
corrected for chance was substantial to almost perfect (κ = 0.64-0.93).
These studies show that major depression can be diagnosed
reliably by a mental health professional who uses a careful,
semistructured interview.

Studies that use nonstandardized interviews to make
DSM-based diagnoses may better simulate clinical practice.
Seven studies, with examiners ranging from psychiatry trainees to
practicing psychiatrists, evaluated interrater agreement with this
approach. 69-75 Diagnoses were typically based on paired interviewers
conducting joint or sequential interviews; one study used a
videotaped interview. 70 Diagnoses were made blind to the
other raters’ diagnosis in 5 studies, patient medical histories
were elicited independently for most patients in 3 studies,
and no study described depression severity. Interrater
agreement corrected for chance was moderate to substantial
(κ = 0.55-0.74). Compared with semistructured interviews,
agreement was somewhat lower for nonstandardized interviews.
However, chance-corrected agreement remained
good compared with many other clinical findings. 81-83

Less is known about the reliability of depression diagnoses made by
primary care physicians. We were able to identify only 1
study that compared primary care clinicians’ diagnoses
based on DSM criteria with those of a mental health professional
using the same criteria. Spitzer et al 49 compared primary
care clinicians’ diagnoses using a semistructured
instrument with mental health professionals’ diagnoses using
an SCID-based DSM measure of depression. This study
found good agreement (simple agreement, 88%; κ = 0.71).
It is unknown how well primary care physicians using a
nonstandardized interview would agree with diagnoses
made by mental health professionals.
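
The difference between simple agreement and chance-corrected agreement (κ) reported in these studies can be illustrated with a small calculation. The sketch below computes Cohen's κ for 2 raters and a yes-or-no diagnosis. The function name and the 2 x 2 counts are hypothetical and chosen only for illustration; they are not data from any study cited here.

```python
from typing import Tuple


def cohen_kappa(
    both_yes: int, a_yes_b_no: int, a_no_b_yes: int, both_no: int
) -> Tuple[float, float]:
    """Return (simple agreement, kappa) for two raters and a binary diagnosis."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    if n == 0:
        raise ValueError("no patients rated")

    observed = (both_yes + both_no) / n

    # Agreement expected by chance, from each rater's marginal rates.
    a_yes_rate = (both_yes + a_yes_b_no) / n
    b_yes_rate = (both_yes + a_no_b_yes) / n
    expected = a_yes_rate * b_yes_rate + (1 - a_yes_rate) * (1 - b_yes_rate)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 100%")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical counts for 100 patients rated by 2 clinicians.
agreement, kappa = cohen_kappa(both_yes=20, a_yes_b_no=5, a_no_b_yes=10, both_no=65)
print(f"simple agreement = {agreement:.0%}, kappa = {kappa:.3f}")
```

In this made-up example the raters agree on 85% of patients, yet κ is considerably lower because much of that agreement would be expected by chance when most patients are not depressed.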

These studies have a number of design limitations. The
severity of major depression and spectrum of competing
medical and psychological illnesses that may make diagnosis
more difficult were not typically described. In addition, studies
using joint interviews and videotape review may overestimate
interrater reliability because both interviewers hear
identical information. Two of the studies compared diagnoses
made by emergency department psychiatrists with those
made by the patient’s inpatient treating physician and were
thus not blinded evaluations, again potentially biasing these
studies toward higher agreement. 72,75 Finally, only 1 study
reported 95% CIs for the estimate of interrater agreement. 68

Effect of Physical Illness on Diagnosis 

Because the psychological and physical symptoms of depression
may overlap with other physical illness, diagnosing depression in
patients with severe or multiple chronic medical illnesses can be
especially challenging. 84 If symptoms caused by the physical illness
(eg, fatigue related to congestive heart failure) are attributed
to depression, then patients may receive unnecessary treatment.
Conversely, if depressive symptoms are misattributed to a concurrent
physical illness, then effective depression treatment may
be withheld. A number of strategies have been proposed in an
attempt to improve the accuracy and reliability of diagnosis in
physically ill patients. The “inclusive” approach counts depressive
symptoms toward the diagnosis of depression, regardless of
whether the clinician judges that the symptom is due to medical
or psychological causes. The DSM-IV criteria use an “etiologic”
approach that counts symptoms toward a major depression
diagnosis unless the symptom is “clearly and fully accounted for
by a general medical condition.” 11 Because clinicians must make
a judgment about the cause of individual symptoms, this
approach may be less reliable than the inclusive approach. A
third strategy, called the “substitutive” approach, replaces
depression criterion symptoms that are most likely to be confused
with medical illness (eg, loss of energy, weight loss,
impaired concentration) with psychological symptoms. 85 This
approach was developed in an attempt to better discriminate
between patients with depression and physical illness and those
with only physical illness. Koenig et al 86 evaluated these strategies
in a consecutive series of elderly, hospitalized patients. The
prevalence of major depression was 21% using the inclusive
approach, 16% using the etiologic approach, and 15% using the
substitutive approach. Measures of depression severity and the
need for treatment did not differ significantly across the 3
diagnostic approaches. For minor depression, both the prevalence
and measures of severity varied more significantly. In a related
study, interobserver agreement among mental health professionals
was slightly higher for the inclusive approach (κ = 1.0) than
for the etiologic approach (κ = 0.88). 87 Two other studies have
shown high levels of agreement between the etiologic and
substitutive approaches. 88,89 Although the data are limited, these studies
show high concordance between the different approaches and
high interobserver agreement in physically ill patients. Because
the substitutive approach requires learning new criteria and does
not offer a clear advantage, we recommend the inclusive or
etiologic approaches.

Table 19-5 Interrater Reliability for Depressive Disorder With Semistructured and Nonstructured Interviews

Source, y | Examiners (No.) | No. of Patients Evaluated | No. of Patients With MDD Diagnosis | Setting | Design | Independent Assessment^a | Blinding^b | Simple Agreement (κ)

Semistructured Interview
Fuhrer et al, 62 1986 | Psychiatrists (136) | 11 | 2 | Inpatient | Videotape review | No | Yes | NA (0.78)
Riskind et al, 63 1987 | Psychologists (16) | 75 | 25 | Outpatient | Videotape review | No | Yes | 82% (0.72)
Stukenberg et al, 64 1990 | Psychology trainees (4) | 75 | NA | Community | NS | NS | NS | NA (0.92)
Skre et al, 65 1991 | Psychiatrist (1), Psychologists (4) | 54 | 25 | Mixed | Live vs audiotape review | No | Yes | NA (0.93)
Williams et al, 66 1992 | Psychiatrists (14), Psychologists (6), Master’s degree (4) | 390 | 121 | Mixed | Live, sequential interview | Yes | Yes | NA (0.64)
Segal et al, 67 1994 | Psychology trainees (NS) | 33 | 15 | Outpatient | Live vs audiotape or videotape review | No | Yes | 85% (0.70)
Keller et al, 68 1995 | Master’s degree (NS) | 80 | 68 | Mixed | Live vs videotape review | No | Yes | NA (0.72)

Nonstandardized Interview
Spitzer et al, 69 1979 | Mental health clinicians (274) | 281 | 83 | Mixed | Joint or sequential interview | Mixed | Yes | NA (0.70)
Webb et al, 70 1981 | Mental health clinicians (78) | 1 | 1 | NA | Videotape review | No | Yes | 83% (NA)
Hyler et al, 71 1982 | Psychiatrists (31), Psychologists (3), Social workers (7) | 46 | 14^c | Mixed | Joint or sequential interview | Mixed | Yes | NA (0.55)
Lieberman and Baker, 72 1985 | Psychiatrists (NS) | 50 | 6 | Emergency department | Sequential interview | NS | No | NA (0.62)
Mellsop et al, 73 1991 | Psychiatrist (5) | 60 | 32^c | Inpatient | Joint interview | No | Yes | NA (0.70)
Buchwald and Rudick-Davis, 74 1993 | Psychiatry residents (25), Psychology trainee (1) | 43 | 38 | Emergency department | Joint or sequential interview | Mixed | Yes | 88% (0.74)
Warner and Peabody, 75 1995 | Psychiatry residents (30), Psychiatrists (6) | 190 | 74 | Emergency department | Sequential interview | NS | No | NA (0.64)

Abbreviations: MDD, major depressive disorder; NA, not available (not reported by authors); NS, not stated.

^a Yes indicates history obtained independently by 2 or more observers; mixed, history obtained independently for some but not all subjects.
^b Yes indicates diagnosis made without knowledge of other examiners’ diagnosis.
^c Patients had affective disorder rather than the more specific MDD.

How Can I Improve My Skills 
for Diagnosing Depression? 

Observational and trial data suggest that specific communication
and interviewing skills are related to diagnostic performance.
Three studies using “standardized patients,” or actors
presenting with a scripted set of complaints, suggest that physicians
are more likely to recognize or diagnose depression
when they ask questions about feelings or psychosocial
issues. 90-92 In one of these studies, recognition rates approached
100% if physicians asked about mood and anhedonia. 93

We did not identify any trials designed to improve the accuracy
or reliability of diagnostic interviews for depression. Existing
trials focus primarily on improving recognition rates, or
sensitivity, which is only one aspect of diagnostic accuracy. Four
randomized controlled trials of continuing medical education
programs for physicians (n = 329) show generally positive
results. 93-96 Three of the trials focused specifically on or included
recognizing depression, 93-95 and the fourth trial focused more
generally on communication skills training designed to address
patients’ emotional distress. 96 Trained physicians were significantly
more likely than untrained physicians to recognize depression or
psychosocial problems in the 2 trials that provided 8-hour training
sessions and emphasized communication or interviewing skills 94,95
and in the trial that provided access to an on-site consulting
psychiatrist after a shorter training session. 93 These data suggest that
motivated physicians can improve their communication skills
and sensitivity to emotional distress or depressive disorder.
Medical schools and residency programs should consider
incorporating similar training in their curricula.


CLINICAL SCENARIO—RESOLUTION 


You follow up on Mr P’s “frazzled” comment and learn that
he has been under intense work-related stress. Knowing that
recent stress increases the likelihood of clinical depression,
you ask, “Have you been feeling sad or depressed much of the
time?” Mr P has been feeling down nearly every day for several
weeks and on further questioning meets criteria for
major depression. A focused review of systems and physical
examination does not suggest a secondary cause of depression.
He does not drink alcohol and has no history of depression.
You discuss both antidepressant medication and
psychological treatment options for depression.
CLINICAL SCENARIO


You decide to implement the US Preventive Services Task
Force (USPSTF) recommendation to screen for depression.
Which questionnaire will you use? Will you ask the
questions yourself or ask your staff to administer the
questionnaire as part of the check-in process? Will you screen
all adults or screen more selectively? What other care
components should be in place to effectively follow through on
the screening results?

UPDATED SUMMARY ON SCREENING FOR DEPRESSION 

Original Review 

Williams JW Jr, Noel PH, Cordes JA, Ramirez G, Pignone M. Is 
this patient clinically depressed? JAMA. 2002;287(9):1160-1170. 

UPDATED LITERATURE SEARCH 

The high prevalence of depressive disorders, suboptimal recognition
rates, and availability of efficacious treatments have long provided
the impetus to evaluate screening approaches. The USPSTF
updated its recommendations (2002) according to new evidence
concerning the validity of screening instruments, the effectiveness
of screening, and treatment approaches that ensure
adequate follow-up. 1 We conducted an updated MEDLINE
search for English-language medical literature published between
2000 and August 2004 evaluating the performance of depression
case-finding instruments in primary care settings. Search terms
were “depressive disorder” or “depression,” textword terms for
each instrument, and a search filter for articles on diagnosis. The
search yielded 307 articles; an additional 5 articles were identified
from citations. We retained studies from primary care settings
that administered a case-finding instrument and used a standard
interview such as the Structured Clinical Interview for DSM-IV-TR
(SCID) to make a criterion-based diagnosis (eg, Diagnostic
and Statistical Manual of Mental Disorders [Third Edition
Revised] [DSM-III-R], and Diagnostic and Statistical Manual of
Mental Disorders [Fourth Edition] [DSM-IV]). 2,3 We limited
these studies further with requirements that case-finding instruments
have (1) easy to average literacy, (2) scores estimable without
a calculator, (3) a depression-specific component, and
(4) evaluation in at least 1 study of 100 or more subjects. After
review of titles and abstracts, 25 articles underwent full-text
review; 5 met all eligibility criteria. We also retrieved a recent
systematic review published by the USPSTF.

NEW FINDINGS 

• In adults, 2- to 9-item screening instruments perform 
comparably to longer depression questionnaires. 

• Ultrashort questionnaires, such as the 2-item Primary Care 
Evaluation of Mental Disorders (PRIME-MD), can be 
administered easily in writing or verbally (see Appendix). 

• The brief 9-item Patient Health Questionnaire (PHQ-9) 
gives the best discriminatory power and more diagnostic 
information for depression diagnoses. The PHQ-9 can also 
quantify treatment responses (see Appendix). 

• Instruments tailored to subgroups (eg, older adults) lack 
proof of superiority to instruments developed for general 
primary care populations. 

• Brief, well-validated instruments have not been developed 
for children treated in primary care settings. 

Details of the Update 

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

The data presented in the original publication have not
changed; instead, recent literature provides more information
and revised estimates of the sensitivity, specificity, and
likelihood ratios (LRs).

CHANGES IN THE REFERENCE STANDARD 

The reference standard for depressive disorders remains the 
criterion in the DSM-IV text revision. 3 

RESULTS OF THE LITERATURE REVIEW 

Three brief screening instruments were identified; a total of 14
instruments have now been evaluated in primary care. The Hospital
Anxiety and Depression Scale (HADS), 4 the World Health
Organization-Five Well-Being Scale (WHO-5), 5 and the Yale
1-question screen 6 range from 1 to 7 questions and take less
than 2 minutes to administer. The HADS, designed for medically
ill inpatients and outpatients, has 7-item depression and
anxiety subscales in which a score of 8 or higher is considered a
positive result (range, 0-21). The WHO-5 is a 5-item quality-of-life
measure in which scores of 12 or lower are considered a positive
result (range, 0-25). Although the WHO-5 is described as
a quality-of-life measure, it asks about specific depression
symptoms. The Yale 1-question screen asks, “Do you often feel
sad or depressed?” In addition to these 3 new instruments, new
data were published on the PHQ-9 and the 2-item version of the
PHQ and the PRIME-MD, 7,8 the Geriatric Depression Scale
(GDS), 9 and the Center for Epidemiological Studies Depression
Scale (CES-D). 9 Five studies evaluated these instruments in 5652
adult patients, of whom 1653 underwent criterion interviews.
Studies were conducted in the United States (n = 2), Germany
(n = 2), and New Zealand.
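
Because the HADS and the WHO-5 are scored in opposite directions, the positive-result rules described above are easy to mix up. The following Python sketch encodes the two thresholds exactly as stated in the text; the function names are hypothetical, and the range checks are an added safeguard rather than part of either instrument.

```python
def hads_subscale_positive(score: int) -> bool:
    """HADS 7-item subscale (range 0-21): 8 or higher is a positive result."""
    if not 0 <= score <= 21:
        raise ValueError("HADS subscale score must be between 0 and 21")
    return score >= 8


def who5_positive(score: int) -> bool:
    """WHO-5 (range 0-25): 12 or lower is a positive result."""
    if not 0 <= score <= 25:
        raise ValueError("WHO-5 score must be between 0 and 25")
    return score <= 12


# Hypothetical scores, for illustration only.
print(hads_subscale_positive(9))   # higher HADS scores indicate more symptoms
print(who5_positive(9))            # lower WHO-5 scores indicate poorer well-being
print(who5_positive(20))
```

The same numeric score can therefore be positive on both instruments, as in the first 2 calls, even though the scales run in opposite directions.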

These studies add to previous evidence that brief, 2- to 9-item
screening instruments perform comparably to or better than
longer questionnaires (see Table 19-6). In a high-quality study,
Henkel et al 10 compared the PHQ-9 and WHO-5 with the
12-item General Health Questionnaire (GHQ-12), a general measure
of psychological well-being. The PHQ-9 had a significantly
higher positive likelihood ratio (LR+) (5.2; 95% confidence
interval [CI], 3.9-6.8) than the WHO-5 (LR+, 2.6; 95% CI,
2.2-3.0) or GHQ-12 (LR+, 2.2; 95% CI, 1.9-2.6). The PHQ-9
negative likelihood ratio (LR-) (0.26; 95% CI, 0.17-0.4) was
comparable to the WHO-5 (LR-, 0.11; 95% CI, 0.05-0.25) and
GHQ-12 (LR-, 0.24; 95% CI, 0.14-0.42). Three other studies
support these findings. Lowe et al 11 compared the PHQ-9 to the
HADS and WHO-5 in 1619 patients drawn from academic-affiliated
family medicine practices. The PHQ-9 had a significantly
higher LR+ and a comparable LR- to other instruments.
Kroenke et al 7,8 combined data on the PHQ-9 from primary care
medical and obstetrics and gynecology populations. 12,13 This
high-quality study found an LR+ of 7.3 (95% CI, 5.6-9.4) and
LR- of 0.14 (95% CI, 0.06-0.32). Collectively, these studies show
that the PHQ-9 performs better than other brief instruments in
head-to-head comparisons and has LRs that are comparable or
superior to other longer depression questionnaires. The PHQ-9
has the advantage of asking specifically about DSM-IV criterion
symptoms for major depression and has been shown responsive
to change in clinical status. Therefore, it provides essential
symptom data for a diagnostic interview and can also monitor
treatment responses.

Table 19-6 Likelihood Ratios of Brief Screening Instruments to
Identify Depression or Dysthymia

Screening Test (No. of Studies) | Summary LR+ (95% CI) or Range | Summary LR- (95% CI) or Range

Major Depressive Disorder
Yale 1 item (1) | 1.8 (1.1-2.8) | 0.56 (0.27-1.14)
PRIME-MD (4) | 2.6 (2.1-3.2) | 0.15 (0.08-0.28)
WHO-5 (1) | 2.2 (2.0-2.4) | 0.01 (0-0.2)
HADS (2) | 2.9 (2.5-3.5) | 0.25 (0.15-0.41)
PHQ-9 (2)^a | 4.9-7.3 | 0.02-0.14
GDS (3)^b | 2.4-4.0 | 0.12-0.32

Major Depressive Disorder or Dysthymia
WHO-5 (2) | 2.4 (2.2-2.7) | 0.11 (0.06-0.19)
HADS (1) | 3.2 (2.7-3.9) | 0.25 (0.18-0.37)
PHQ-9 (3) | 5.9 (4.2-8.3) | 0.29 (0.23-0.38)

Abbreviations: CI, confidence interval; GDS, Geriatric Depression Scale; HADS, Hospital
Anxiety and Depression Scale; LR+, positive likelihood ratio; LR-, negative likelihood
ratio; PHQ-9, 9-item Patient Health Questionnaire; PRIME-MD, Primary Care Evaluation
of Mental Disorders; WHO-5, World Health Organization-Five Well-Being Scale.

^a Data are the range of values across studies; summary statistics were not calculated
because 2 thresholds (10 and 11) were used.

^b Data are the range of values across the studies; summary statistics were not
calculated because of varying thresholds and instrument versions.

Depression instruments are typically given to patients in
paper form for self-administration. In 15 New Zealand general
practices, physicians verbally administered the 2-item PRIME-MD
to 421 consecutive patients. 14 Verbal administration performed
well, with an LR+ of 2.9 (95% CI, 2.5-3.5) and LR- of
0.05 (95% CI, 0.02-0.11). These performance characteristics using
verbal administration are almost identical to those in previous studies
of the PRIME-MD that used self-administration. For clinicians
wishing to adopt a streamlined approach to depression
screening, this study supports verbal administration of the
simple 2-item screen.

Another salient issue is how well depression screening instruments
perform in important subgroups such as older adults,
the medically ill, and children. One study evaluated the Yale
1-question, the PRIME-MD, the GDS, and CES-D in 360 adults
aged 60 years or older; 125 of 360 patients were recruited from
primary care settings. 8 The PRIME-MD performed less well
than in studies conducted in mixed-age populations, whereas the
GDS and CES-D had statistically similar diagnostic odds ratios.
However, these results were biased because major depression
diagnoses were made blind to PRIME-MD screening results but
with knowledge of the GDS and CES-D screening results. The
Yale 1-question performed poorly. For older adults, these data
raise the possibility that brief, general depression instruments
perform worse than longer instruments or those designed specifically
for older adults. This hypothesis remains unproven and
needs testing in a larger, higher-quality study with subgroup
analyses of older and younger patients. As discussed above, the
PHQ-9 performed better than the HADS in primary care
patients, although comparisons in patients with severe or multiple
chronic medical illnesses are not available. Finally, we could
not identify well-validated instruments for children and adolescents
in primary care settings.

EVIDENCE FROM GUIDELINES 

The USPSTF recommends “screening adults for depression in
clinical practices that have systems in place to assure accurate
diagnosis, effective treatment and follow-up.” 15 This recommendation
came from an evidence synthesis showing that although
trials of screening alone had small benefits, even larger benefits
accrued in trials that used feedback of depression screening
results coordinated with effective treatment and follow-up. No
specific screening instrument was recommended, but it was
observed that the 2-item PRIME-MD might be as effective as
longer instruments. The task force recommended that clinicians
choose the method that best fits their personal preference, the
patient population served, and the practice setting. The optimal
screening interval is unknown, and all positive screening test
results should trigger full diagnostic interviews.

The USPSTF concluded that evidence is insufficient to recommend
for or against routine screening of children or adolescents
for depression. They found limited evidence on the accuracy and
reliability of screening tests in children and adolescents.

The Canadian Task Force on Preventive Health Care updated
its guidelines in 2005, 16 relying on the same 2002 evidence synthesis
used by the USPSTF. Their recommendations were concordant
with the USPSTF. Highlighted issues were as follows:

1. “Recurrent screening may be most productive in patients
with a history of depression, unexplained somatic symptoms,
comorbid psychological conditions, substance abuse,
or chronic pain.”

2. Elements of systems that ensured good depression care were
education of the patients, health care providers, or both; a
mechanism to ensure that screening results are reported to
the patient’s clinician, who can confirm the diagnosis and
provide appropriate treatment; access to case management
or mental health care; and telephone follow-up.

Web Resources for Depression Screening 

The US Preventive Services Task Force Recommendation on
depression screening is available online
(http://www.ahrq.gov/clinic/uspstf/uspsdepr.htm. Accessed May 28, 2008). 1


CLINICAL SCENARIO—RESOLUTION 


During the past year, your practice has focused on improving 
preventive care, and your staff is receptive to implementing 
depression screening. Keeping in mind the tensions between 
practice efficiency, costs, and the benefits of improved patient 
outcomes, you decide on the following strategy. You elect to 
screen all adults annually for depression. By screening all 
adults, you establish a consistent approach for your staff that 
simplifies logistics. With input from your nurses, you decide 
that nurses will verbally ask the 2 PRIME-MD questions (a 
brief survey that makes you feel confident that you will not 
miss many depressed patients) and that patients with positive 
results will be given the PHQ-9 to complete (a slightly longer 
survey that helps you assess whether the patients who have 
positive results on the 2-item survey really are depressed). 

You believe your practice has an average prevalence of
major depression (≈7%); hence, patients who have negative
results on the 2-item PRIME-MD (LR-, 0.15) will have a
posttest probability of 1% for major depression. Patients who
have positive results on the PRIME-MD and then score 10 or
higher on the PHQ-9 (LR+, 4.9-7.3) will have a posttest probability
of 49% to 59% for major depression. Patients who have
positive results on the 2-item PRIME-MD but have a more
normal score, lower than 10, on the PHQ-9 (LR-, 0.02-0.14)
will have a posttest probability of 1% to 4% for major depression.
All patients who score 10 or higher on the PHQ-9 will
require further evaluation to establish a specific diagnosis.
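
The arithmetic behind these posttest probabilities can be sketched in a few lines. This is a minimal illustration, assuming that the likelihood ratios of the 2 sequential tests can simply be multiplied on the odds scale (that is, that the tests are conditionally independent); the scenario does not state this assumption, although its 1% and 49% to 59% figures are consistent with it. The function and variable names are illustrative only.

```python
def posttest_probability(pretest, likelihood_ratios):
    """Convert a pretest probability to a posttest probability.

    The probability is converted to odds, multiplied by each likelihood
    ratio in turn, and converted back to a probability.
    """
    odds = pretest / (1.0 - pretest)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


PREVALENCE = 0.07              # assumed pretest probability of major depression
PRIME_MD_LR_POS = 2.6          # summary LR+ for the 2-item PRIME-MD
PRIME_MD_LR_NEG = 0.15         # summary LR- for the 2-item PRIME-MD
PHQ9_LR_POS_RANGE = (4.9, 7.3) # range of LR+ for the PHQ-9

# Negative result on the 2-item PRIME-MD alone.
negative_screen = posttest_probability(PREVALENCE, [PRIME_MD_LR_NEG])

# Positive PRIME-MD followed by a PHQ-9 score of 10 or higher
# (assumption: the two LRs are chained as if the tests were independent).
both_positive = [
    posttest_probability(PREVALENCE, [PRIME_MD_LR_POS, lr])
    for lr in PHQ9_LR_POS_RANGE
]

print(f"Negative PRIME-MD: {negative_screen:.0%}")
print(f"Positive PRIME-MD and PHQ-9: {both_positive[0]:.0%} to {both_positive[1]:.0%}")
```

The same function accepts any pretest probability, so a practice with a different prevalence can substitute its own estimate.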

SCREENING FOR DEPRESSION—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

Although most depression screening focuses on major 
depressive disorders, a number of treatable conditions may 
be detected by depression screening. In primary care, the 
combined prevalence of depression and dysthymia is 
approximately 7% to 12% (Table 19-7). 


Table 19-7 Prevalence of Depression and Dysthymia in Primary Care

Illness                       Prevalence, %

Major depressive disorder     4.8-8.6
Dysthymic disorder            2.1-3.7
Depression NOS                4.4-5.4

Abbreviation: NOS, not otherwise specified.


POPULATIONS FOR WHOM 
DEPRESSION SHOULD BE ASSESSED 

All adults. 

DETECTING THE LIKELIHOOD OF 
MAJOR DEPRESSIVE DISORDER 

See Table 19-8. 


Table 19-8 Likelihood Ratios for Detecting Major Depressive Disorder

                                      Summary LR+           Summary LR-
Instrument and Threshold              (95% CI) or Range     (95% CI) or Range

PHQ-9 (score ≥ 10 is positive) a      4.9-7.3               0.02-0.14
PRIME-MD (score ≥ 1 is positive)      2.6 (2.1-3.2)         0.15 (0.08-0.28)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio; PHQ-9, 9-item Patient Health Questionnaire; PRIME-MD, Primary
Care Evaluation of Mental Disorders.

a Data are the range of values across studies; summary statistics were not calculated
because 2 thresholds (10 and 11) were used.


DETECTING THE LIKELIHOOD OF MAJOR 
DEPRESSIVE DISORDER OR DYSTHYMIA 

See Table 19-9. 


Table 19-9 Likelihood Ratios for Detecting Major Depressive Disorder
or Dysthymia

                                    Summary LR+       Summary LR-
Instrument                          (95% CI)          (95% CI)

PHQ-9 (score ≥ 10 is positive)      5.9 (4.2-8.3)     0.29 (0.23-0.38)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio; PHQ-9, 9-item Patient Health Questionnaire.

REFERENCE STANDARD TESTS 

A structured (eg, Diagnostic Interview Schedule) or semistructured
diagnostic interview (eg, Structured Clinical
Interview for DSM-IV) to establish diagnoses.

APPENDIX—DEPRESSION SCREENING INSTRUMENTS

Patient Health Questionnaire (PHQ-9) 

See Table 19-10. 


Table 19-10 Patient Health Questionnaire (PHQ-9) a

During the last 2 weeks, how often have you been bothered by any of the following problems?

Each item is answered on the same 4-point scale:
Not at all = 0; Several days = 1; More than half the days = 2; Nearly every day = 3.

1. Little interest or pleasure in doing things

2. Feeling down, depressed, or hopeless

3. Trouble falling or staying asleep, or sleeping too much

4. Feeling tired or having little energy

5. Poor appetite or overeating

6. Feeling bad about yourself—or that you are a failure or have let yourself or your family down

7. Trouble concentrating on things, such as reading the newspaper or watching television

8. Moving or speaking so slowly that other people could have noticed, or the opposite—being so
fidgety or restless that you have been moving around a lot more than usual

9. Thoughts that you would be better off dead or of hurting yourself in some way

Scoring: Add up the results for each item. A score ≥ 10 is positive for depression or dysthymia.

If you checked off any problems, how difficult have these problems made it for you to do your work, take care of things at home, or get along with other people?

□ Not difficult at all   □ Somewhat difficult   □ Very difficult   □ Extremely difficult

a The PHQ was developed from the PRIME-MD. PRIME-MD is a trademark of Pfizer Inc. Copyright © 1999, Pfizer Inc. All rights reserved. Reproduced with permission.
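
The scoring rule for the PHQ-9 is simple enough to express directly. The following is a minimal sketch, assuming responses are recorded as 9 integers from 0 to 3 in item order; the function name and the input validation are illustrative and are not part of the instrument.

```python
PHQ9_ITEM_COUNT = 9
PHQ9_THRESHOLD = 10  # a total of 10 or higher is a positive screening result


def score_phq9(responses):
    """Return (total score, positive screen?) for one completed PHQ-9.

    `responses` holds the 9 item scores in order, each 0 (not at all)
    through 3 (nearly every day), so the total ranges from 0 to 27.
    """
    if len(responses) != PHQ9_ITEM_COUNT:
        raise ValueError("the PHQ-9 has exactly 9 items")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item is scored 0, 1, 2, or 3")
    total = sum(responses)
    return total, total >= PHQ9_THRESHOLD


total, positive = score_phq9([2, 2, 1, 1, 1, 1, 1, 1, 0])
print(total, positive)
```

A positive result on this screen is not a diagnosis; it identifies patients who need a full diagnostic interview.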


PRIME-MD (2 Items) 

See Table 19-11. 


Table 19-11 PRIME-MD a (2 Items)

During the past month have you been bothered a lot by:     Yes     No

1. Little interest or pleasure in doing things              0       1
2. Feeling down, depressed, or hopeless                     0       1

Scoring: A “yes” answer on either question is considered a positive result for depression.

a PRIME-MD is a trademark of Pfizer Inc. Copyright © 1999, Pfizer Inc. All rights
reserved. Reproduced with permission.

Depression



TITLE Screening for Depression in Primary Care With 2 
Verbally Asked Questions: Cross-Sectional Study. 

AUTHORS Arroll B, Khin N, Kerse N. 

CITATION BMJ. 2003;327(7424):1144-1146.

QUESTION How well does the 2-question Primary Care 
Evaluation of Mental Disorders (PRIME-MD) 1 perform 
for detecting depression? 

DESIGN General practitioners asked the 2 screening 
questions, and major depression diagnoses were made 
using a structured interview, blind to the screening results. 

SETTING Fifteen general practices in Auckland, New 
Zealand. 

PATIENTS Four hundred twenty-one consecutive patients 
(median age, 46 years) who agreed to participate; 194 who 
declined, 47 taking psychotropic drugs, and 8 who were not 
asked the screening questions were excluded. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The PRIME-MD prompts: 

1. “During the past month have you often been bothered by 
feeling down, depressed, or hopeless?” and 

2. “During the past month have you often been bothered by 
little interest or pleasure in doing things?” 

A yes response to either question was considered a positive 
result. An interviewer used the computer-assisted structured 
Composite International Diagnostic Interview to establish 
major depression diagnoses. 

MAIN OUTCOME MEASURES 

Sensitivity, specificity, and likelihood ratios. 

MAIN RESULTS 

One hundred fifty-seven (37%) of 421 patients had a positive
result; 29 patients (6.8%) were diagnosed with major depression
(Table 19-12).


Table 19-12 Likelihood Ratio for the 2-Question PRIME-MD

                                                   LR+              LR-                 DOR
Test                 Sensitivity   Specificity     (95% CI)         (95% CI)            (95% CI)

PRIME-MD 2 items     0.97          0.67            2.9 (2.5-3.5)    0.05 (0.02-0.11)    62 (24-156)

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive likelihood
ratio; LR-, negative likelihood ratio; PRIME-MD, Primary Care Evaluation of
Mental Disorders.

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Consecutive patients were studied. Because
there were few exclusion criteria, there was a high participation
rate and the patients had a broad age range.

LIMITATIONS Dysthymia, a chronic depressive disorder that
is responsive to antidepressants and psychotherapy, is not
considered.

This study continues a trend of evaluating brief depression
screening instruments. The study methodology was strong;
excluding patients taking psychotropic drugs likely skewed the
spectrum toward milder major depressive illnesses. The unique
contribution of this study is that practitioners simply asked the 2
screening questions, which is logistically simple. The high sensitivity
and relatively low specificity are consistent with studies
using paper versions of the questionnaire.

Reviewed by John W. Williams Jr, MD 

REFERENCE FOR THE EVIDENCE 

1. Spitzer RL, Williams JB, Kroenke K, et al. Utility of a new procedure for
diagnosing mental disorders in primary care: the PRIME-MD 1000
study. JAMA. 1994;272(22):1749-1756.

TITLE Case-Finding for Depression in Elderly People:
Balancing Ease of Administration With Validity in Varied 
Treatment Settings. 

AUTHORS Blank K, Gruman C, Robison JT. 

CITATION J Gerontol. 2004;59A(4):378-384.

QUESTION Compared with longer depression screening
instruments, how well do the 2-question Primary Care
Evaluation of Mental Disorders (PRIME-MD) and the
Yale 1-question screens perform for detecting depression?

DESIGN Patients completed written screening instruments.
Major depression diagnoses were made with a structured
interview, blind to the PRIME-MD and Yale
screening results but not to results from the longer screening
instruments.

SETTING Two urban primary care practices, a general 
medical hospital, and 8 nursing homes in the United 
States; results reported here are limited to the primary 
care practices. 

PATIENTS Three hundred sixty consecutive patients 
(125 from primary care) agreed to participate; 35 were 
ineligible because of psychosis, a positive screening result 
for alcoholism, or a positive screening result for dementia. 
An additional 12 patients declined to participate. Most 
were white; mean age was 77 ± 8.9 years. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The PRIME-MD prompts, “During the past month have you
often been bothered by feeling down, depressed, or hopeless?”
and “During the past month have you often been bothered
by little interest or pleasure in doing things?” A yes
response to either question was considered a positive result.
The Yale question is “Do you often feel sad or depressed?”
Comparison instruments were the 20-item Center for Epidemiologic
Studies Depression Scale 1 (CES-D; available at
http://www.chcr.brown.edu/pcoc/cesdscale.pdf; accessed
May 28, 2008) and the 30-item Geriatric Depression Scale 2
(GDS Long Version; available at
http://www.stanford.edu/~yesavage/GDS.html; accessed May 28, 2008).
An interviewer used the mood sections of the Diagnostic Interview
Schedule to establish major depression diagnoses.

MAIN OUTCOME MEASURES 

Sensitivity, specificity, likelihood ratios, and diagnostic odds 
ratios (calculated from data provided in the article). 

MAIN RESULTS 

Fourteen (11%) of 125 patients were diagnosed with major
depression (Table 19-13).


Table 19-13 Likelihood Ratios for the Longer Screening Instruments
Compared With the Shorter Instruments a

                                                  LR+              LR-                 DOR
Test                Sensitivity   Specificity     (95% CI)         (95% CI)            (95% CI)

Longer Screening Instruments

CES-D               0.79          0.75            3.1 (2.0-4.8)    0.29 (0.10-0.79)    11 (2.8-42)
GDS                 0.79          0.67            2.4 (1.6-3.4)    0.32 (0.12-0.88)    7.3 (1.9-28)

Shorter Screening Instruments

PRIME-MD 2 items    0.79          0.58            1.9 (1.3-2.6)    0.37 (0.13-1.0)     5.0 (1.3-19)
Yale                0.64          0.64            1.8 (1.1-2.8)    0.56 (0.27-1.1)     3.2 (1.0-10)

Abbreviations: CES-D, Center for Epidemiologic Studies Depression Scale; CI, confidence
interval; DOR, diagnostic odds ratio; GDS, Geriatric Depression Scale; LR+,
positive likelihood ratio; LR-, negative likelihood ratio; PRIME-MD, Primary Care
Evaluation of Mental Disorders.

a Thresholds for a positive screening result were CES-D > 16 and GDS > 10.

CONCLUSIONS 

LEVEL OF EVIDENCE Level 2 for the PRIME-MD and Yale 
questionnaires; level 4 for the CES-D and GDS questionnaires. 

STRENGTHS Few exclusion criteria; consecutive patients; 
high participation rate. 

LIMITATIONS The sample size was small, as evidenced by
the broad confidence intervals around the likelihood ratio.
The criterion interviewers were unblinded to CES-D and
GDS results, potentially biasing results toward improved performance
compared with the brief screening instruments.
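
The link between a small sample and broad confidence intervals can be made concrete. The following is a rough sketch, assuming the common log-scale (Wald) interval for ratio measures and a 2 × 2 table for the CES-D that is inferred here from the reported sensitivity, specificity, and 14 depressed patients among 125 (11 true positives, 3 false negatives, 28 false positives, 83 true negatives). Neither these counts nor the interval method is given in the summary above, and the function name is illustrative.

```python
import math


def diagnostic_summary(tp, fp, fn, tn, z=1.96):
    """Return LR+, LR-, and DOR, each as (estimate, lower, upper).

    Intervals are computed on the log scale, estimate * exp(+/- z * SE),
    using the usual standard errors for ratios of proportions and for
    an odds ratio.
    """
    diseased, healthy = tp + fn, fp + tn
    lr_pos = (tp / diseased) / (fp / healthy)
    lr_neg = (fn / diseased) / (tn / healthy)
    dor = (tp * tn) / (fp * fn)

    se_lr_pos = math.sqrt(1 / tp - 1 / diseased + 1 / fp - 1 / healthy)
    se_lr_neg = math.sqrt(1 / fn - 1 / diseased + 1 / tn - 1 / healthy)
    se_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)

    def with_ci(estimate, se):
        factor = math.exp(z * se)
        return estimate, estimate / factor, estimate * factor

    return {
        "lr_pos": with_ci(lr_pos, se_lr_pos),
        "lr_neg": with_ci(lr_neg, se_lr_neg),
        "dor": with_ci(dor, se_dor),
    }


# Assumed counts for the CES-D (inferred, not reported in the summary).
cesd = diagnostic_summary(tp=11, fp=28, fn=3, tn=83)
for name, (estimate, lower, upper) in cesd.items():
    print(f"{name}: {estimate:.2f} ({lower:.2f}-{upper:.2f})")
```

Under these assumptions the intervals come out close to those shown for the CES-D in Table 19-13. With only 14 depressed patients, the terms 1/tp and 1/fn dominate the standard errors, which is why the intervals are so wide.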

Results of this study should be interpreted with caution
because of the small sample size and lack of blinding for the
CES-D and GDS questionnaires. These 2 questionnaires, studied
with a lower level of quality because of the lack of blinding,
showed the best accuracy, as evidenced by the highest diagnostic
odds ratio. However, there is no statistical difference in the diagnostic
odds ratios between these questionnaires (P = .57).

The study gives useful information on test performance in 
older adults. Compared with past studies, the CES-D and GDS 
performed similarly. The PRIME-MD performed less well than 
in studies conducted in mixed-age populations. Brief 1- and 2- 
item questionnaires may perform less well in older (perhaps 
more medically ill) patients. This hypothesis needs testing in a 
larger study that allows for subgroup analyses of older and 
younger patients. 

Reviewed by John W. Williams Jr, MD 

REFERENCES FOR THE EVIDENCE 

1. Radloff LS. The Center for Epidemiological Studies Depression Scale
(CES-D): a self-report depression scale for research in a general population.
Appl Psychol Meas. 1977;1:385-401.

2. Yesavage JA, Brink TL, Rose TL, et al. Development and validation of a
geriatric depression screening scale: a preliminary report. J Psychiatr Res.
1982;17(1):37-49.

TITLE Identifying Depression in Primary Care: A Comparison
of Different Methods in a Prospective Cohort
Study.

AUTHORS Henkel V, Mergl R, Kohnen R, Maier W, 
Moller H, Hegerl U. 

CITATION BMJ. 2003;326(7382):200-201. 

QUESTION How well do 3 brief screening instruments 
perform for detecting depression in primary care? 

DESIGN On a single day all patients were asked to complete
the 3 screening instruments. A telephone interviewer,
blind to screening results, used the structured
Composite International Diagnostic Interview to establish
major depression diagnoses.

SETTING Eighteen primary care facilities in Germany.

PATIENTS A total of 487 patients gave informed consent.
Of these, 431 completed all study assessments; 56
had incomplete data and were excluded. Demographic
descriptors were not given.


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The Patient Health Questionnaire (PHQ) is a 9-item,
depression-specific, self-administered instrument that asks
about the criterion symptoms of major depression. 1 It is
scored 0 to 27; a score of 10 or higher is considered a positive
result. The General Health Questionnaire (GHQ) is a
general measure of psychological well-being that exists in
multiple versions; a 12-item version (GHQ-12) was used in
this study and a score of 3 or higher was considered a positive
result. 2 The World Health Organization-Five Well-Being
Index (WHO-5) is a 5-item depression measure
scored from 0 to 25; scores of 12 or lower are considered a
positive result. 3 The Composite International Diagnostic
Interview was used to establish the diagnosis by the Diagnostic
and Statistical Manual of Mental Disorders (Fourth
Edition) or International Classification of Diseases and
Related Health Problems, Tenth Revision criteria. 4,5

MAIN OUTCOME MEASURES 

Sensitivity and specificity. Likelihood ratios (LRs) were cal¬ 
culated by the reviewer. 

MAIN RESULTS 

Seventy-one (17%) of 431 patients were diagnosed with a
depressive disorder, including 43 with major depression, 22
with dysthymia, 3 with bipolar depression, and 3 with a
mood disorder caused by a general medical condition
(Table 19-14).


Table 19-14 Test Performance

                                         LR+              LR-                 DOR
Test       Sensitivity   Specificity     (95% CI)         (95% CI)            (95% CI)

PHQ-9      0.78          0.85            5.2 (3.9-6.8)    0.26 (0.17-0.40)    20 (11-37)
WHO-5      0.93          0.64            2.6 (2.2-3.0)    0.11 (0.05-0.25)    24 (9.5-61)
GHQ-12     0.85          0.62            2.2 (1.9-2.6)    0.24 (0.14-0.42)    9.2 (4.7-18)

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; GHQ-12, General
Health Questionnaire-12; LR+, positive likelihood ratio; LR-, negative likelihood
ratio; PHQ-9, Patient Health Questionnaire-9; WHO-5, World Health Organization-
Five Well-Being Index.
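
Because the likelihood ratios in Table 19-14 were calculated by the reviewer from the reported sensitivity and specificity, the calculation can be sketched briefly. This is a minimal illustration using the standard definitions (LR+ = sensitivity / [1 - specificity], LR- = [1 - sensitivity] / specificity, DOR = LR+ / LR-); the function and variable names are illustrative, and the rounded inputs from the table are used rather than the original patient counts.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) from sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    dor = lr_pos / lr_neg  # diagnostic odds ratio
    return lr_pos, lr_neg, dor


# Sensitivity and specificity as reported in Table 19-14.
TABLE_19_14 = {
    "PHQ-9": (0.78, 0.85),
    "WHO-5": (0.93, 0.64),
    "GHQ-12": (0.85, 0.62),
}

results = {
    name: likelihood_ratios(sens, spec)
    for name, (sens, spec) in TABLE_19_14.items()
}

for name, (lr_pos, lr_neg, dor) in results.items():
    print(f"{name}: LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}, DOR {dor:.1f}")
```

Point estimates computed this way agree with the table after rounding; the confidence intervals require the underlying counts and are not reproduced here.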

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Consecutive patients were enrolled. There was
a large sample size, and 3 instruments, all feasible for primary
care settings, were compared.

LIMITATIONS The study population is not well described. 
Process of adapting instruments to German was not 
described. Instruments can perform differently in different 
languages. 

The PHQ-9 had the best positive likelihood ratio (LR+) and
a negative likelihood ratio (LR-) comparable to that of the GHQ-12,
a nonspecific measure of psychological distress. The WHO-5
had the best LR- but only a modest LR+ compared with the
PHQ-9. Overall, the PHQ-9 and WHO-5 had similar accuracy,
as evidenced by their diagnostic odds ratios, but we can be
more certain about the performance of the PHQ-9 because of
its narrower confidence interval. Future studies might confirm
that the 2 are similar, on average. However, the goals of the clinician
should drive whether to select one vs the other. Clinicians
who want to maximize the chance that a patient who has a
positive screening result actually has depression should choose
the PHQ-9 (highest LR+). Clinicians who want to maximize
the chance of correctly detecting patients without depression
should choose the WHO-5 (lowest LR-).

The performance characteristics reported in this study
may have been affected by translation into German. Another
caveat is that almost one-third of depressed patients had
dysthymia, a more chronic and milder depression, which tends
to decrease the reported sensitivity compared with results of
studies restricted to patients with major depression. Inclusion
of dysthymic patients makes clinical sense because antidepressant
medications are effective for dysthymia.

Reviewed by John W. Williams Jr, MD 

REFERENCES FOR THE EVIDENCE 

1. Spitzer RL, Kroenke K, Williams JB. Validation and utility of a self-report
version of PRIME-MD: the PHQ primary care study: Primary Care Evaluation
of Mental Disorders: Patient Health Questionnaire. JAMA.
1999;282(18):1737-1744.

2. Goldberg DG. Manual of the General Health Questionnaire. Windsor,
Berkshire, UK: NFER Publishing; 1978.

3. Psychiatric Research Unit, Frederiksborg General Hospital, Hillerød,
Denmark. WHO-5 questionnaires. http://www.who-5.org. Accessed
May 28, 2008.

4. American Psychiatric Publishing. DSM-IV-TR. Diagnostic and Statistical
Manual of Mental Disorders (Fourth Edition, Text Revision).
http://www.psychiatryonline.com/content.aspx?aID=2016. Accessed May 26,
2008.

5. World Health Organization. International Classification of Disease, Version
2007. http://www.who.int/classifications/apps/icd/icd10online.
Accessed May 28, 2008.


TITLE The PHQ-9: Validity of a Brief Depression Severity
Measure.

AUTHORS Kroenke K, Spitzer RL, Williams JBW.

CITATION J Gen Intern Med. 2001;16(9):606-613.

TITLE The Patient Health Questionnaire-2: Validity of a 
2-item Depression Screener. 

AUTHORS Kroenke K, Spitzer RL, Williams JBW.

CITATION Med Care. 2003;41(11):1284-1292.

QUESTION How well do the Patient Health Questionnaire
(PHQ) 9- and 2-item versions perform for detecting
depression in primary care? 1,2

DESIGN Adults (consecutive or every nth) making an
outpatient visit were asked to complete the PHQ. A mental
health clinician, blind to screening results, completed a
structured telephone interview in a subset of patients
(selected without regard to screening results) to establish
depressive diagnoses.

SETTING Seven obstetrics-gynecology clinics, 5 general 
internal medicine clinics, and 3 family practices in the 
United States. 

PATIENTS Three thousand primary care patients (mean
age, 46 ± 17 years) and 3000 obstetrics-gynecology patients
(mean age, 31 ± 11 years) participated. Patients who declined
(n = 435) or who did not complete the questionnaire
(n = 1091) were excluded. Of the 6000 participants, a diagnostic
interview by a mental health clinician was completed in 580.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The PHQ-9 is a 9-item, depression-specific, self-administered
instrument that asks about the criterion symptoms of
major depression. It is scored 0 to 27; a score of 10 or higher
is considered a positive result. The 2-item screen, PHQ-2,
uses the first 2 items of the PHQ-9 and asks about
depressed mood and anhedonia. It is scored 0 to 6; a threshold
of 3 or higher (set post hoc) was used in this study. The
Structured Clinical Interview for Diagnostic and Statistical
Manual of Mental Disorders (Fourth Edition) was used as
the criterion standard.

MAIN OUTCOME MEASURES 

Sensitivity, specificity, and likelihood ratios. 

MAIN RESULTS 

One hundred three (18%) of 580 criterion standard patients
scored 10 or higher on the PHQ-9; 88 (15%) scored 3 or
higher on the PHQ-2. Forty-one (7.1%) were diagnosed with
major depression; 65 (11%) were diagnosed with nonmajor
depressive disorders (Table 19-15).

When diagnoses were broadened to include nonmajor 
depressive disorders, sensitivities decreased to 66% and 62% 
for the PHQ-9 and PHQ-2, respectively. Specificities were 
93% and 95%, respectively. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1 for the PHQ-9; level 3 for
the PHQ-2.

STRENGTHS Large sample size. 

LIMITATIONS Threshold set post hoc for the PHQ-2, which 
may overestimate performance characteristics. It is unclear 
how many of those chosen for a criterion standard evaluation 
completed the examination. 

Table 19-15 Likelihood Ratios for Patient Health Questionnaire (PHQ)-9
and PHQ-2 for Depression

                                        LR+              LR-                 DOR
Test      Sensitivity   Specificity     (95% CI)         (95% CI)            (95% CI)

PHQ-2     0.83          0.90            8.3 (6.2-11)     0.19 (0.10-0.37)    44 (18-103)
PHQ-9     0.88          0.88            7.3 (5.6-9.4)    0.14 (0.06-0.32)    52 (20-139)

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive
likelihood ratio; LR-, negative likelihood ratio.

This report combines data from 2 studies conducted in
primary care and obstetrics and gynecology clinics. The
PHQ-2 (a revision of the Primary Care Evaluation of Mental
Disorders [PRIME-MD] that uses 4 response categories
instead of a yes/no format) performed well, but the threshold
for a positive screen was set post hoc, which typically leads to
overestimates of performance. If these results are replicated
in other populations, the PHQ-2 could be endorsed as a feasible
and accurate option for screening. The PHQ-9 performed
well in these midlife adults with low rates of
comorbid medical conditions. In contrast with the PHQ-2,
the 9-item version can be used to monitor treatment
response and gives more information for specific depressive
diagnoses. 3 Some experts recommend using the first 2 PHQ
items (using a yes/no response format) and then administering
the complete PHQ-9 to those who endorse a yes response
to either of the first 2 items.
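
The 2-step approach described above can be written out as a short decision rule. This is only a sketch: the PHQ-9 threshold of 10 comes from the text, but the function name, the input format, and the wording of the returned labels are hypothetical choices made for illustration.

```python
PHQ9_THRESHOLD = 10  # a score of 10 or higher is a positive PHQ-9 result


def two_step_screen(first_two_yes, phq9_responses=None):
    """Apply a 2-question yes/no screen, then the full PHQ-9 if needed.

    `first_two_yes` is a pair of booleans for the 2 screening questions.
    `phq9_responses` is the list of 9 item scores (0-3), or None if the
    PHQ-9 has not yet been completed.
    """
    if not any(first_two_yes):
        return "negative screen"
    if phq9_responses is None:
        return "administer PHQ-9"
    if sum(phq9_responses) >= PHQ9_THRESHOLD:
        return "positive: diagnostic interview"
    return "below PHQ-9 threshold"


print(two_step_screen((False, False)))
print(two_step_screen((True, False)))
print(two_step_screen((True, True), [2, 2, 1, 1, 1, 1, 1, 1, 0]))
```

Asking only 2 questions of every patient keeps the first step brief, while the second step reserves the longer questionnaire for the patients most likely to benefit from it.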

Reviewed by John W. Williams Jr, MD 

REFERENCES FOR THE EVIDENCE 

1. Spitzer RL, Williams JBW, Kroenke K, Hornyak R, McMurray J, for the
Patient Health Questionnaire Obstetrics-Gynecology Study Group.
Validity and utility of the PRIME-MD Patient Health Questionnaire in
assessment of 3000 obstetric-gynecologic patients: the PRIME-MD
Patient Health Questionnaire Obstetrics-Gynecology Study. Am J Obstet
Gynecol. 2000;183(3):759-769.

2. Spitzer RL, Kroenke K, Williams JBW. Validation and utility of a self-report
version of PRIME-MD: the PHQ primary care study. JAMA.
1999;282(18):1737-1744.

3. Lowe B, Kroenke K, Herzog W, Grafe K. Measuring depression outcome
with a brief self-report instrument: sensitivity to change of the Patient
Health Questionnaire (PHQ-9). J Affect Disord. 2004;81(1):61-66.


TITLE Comparative Validity of 3 Screening Questionnaires
for DSM-IV Depressive Disorders and Physicians’
Diagnoses.

AUTHORS Lowe B, Spitzer RL, Grafe K, et al. 

CITATION J Affect Disord. 2004;78(2):131-140. 

QUESTION How well do 3 brief screening instruments 
perform for detecting depression in primary care? 

DESIGN On selected days, all adult patients were asked
to complete the 3 screening instruments. An interviewer,
blind to screening results, completed diagnostic interviews
in a subset of patients (selected without regard to
screening results) to establish depressive diagnoses.

SETTING Twelve family practices and the outpatient 
clinics of an academic medical center in Germany. 

PATIENTS A total of 1619 consecutive patients (mean 
age, 42 ± 14 years) participated; 431 declined or were 
unable to complete the questionnaires. The diagnostic 
interview was completed in 528 (88%) of the 600 selected 
for diagnostic confirmation; 27 were excluded because of 
missing data, leaving a final sample of 501. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The Patient Health Questionnaire-9 (PHQ-9) is a 9-item,
depression-specific, self-administered instrument that asks
about the criterion symptoms of major depression. It is scored
0 to 27; a score of 10 or higher is the usual threshold but a
score of 11 or higher was used in this study. The Hospital Anxiety
and Depression Scale (HADS) has 7-item depression and
anxiety subscales. 1 It is scored 0 to 21; a score of 8 or higher is
considered a positive result. The World Health Organization-5
Well-Being Scale (WHO-5) is a 5-item depression measure
scored from 0 to 25; scores of 12 or lower are considered a positive
result. 2 The Structured Clinical Interview for DSM-IV
was used as the criterion standard.

MAIN OUTCOME MEASURES 

Sensitivity and specificity. The reviewer calculated likelihood 
ratios. 
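
For readers who want to check the arithmetic, the following minimal Python sketch shows how likelihood ratios and a diagnostic odds ratio follow from sensitivity and specificity. The function name `likelihood_ratios` is chosen for illustration and is not part of the reviewed study; the example uses the rounded PHQ-9 values reported in Table 19-16.

```python
import math


def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-, DOR) for a test with the given sensitivity and specificity.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    DOR = LR+ / LR-  (infinite when LR- is 0)
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    dor = lr_pos / lr_neg if lr_neg > 0 else math.inf
    return lr_pos, lr_neg, dor


# Rounded PHQ-9 values for major depression from Table 19-16.
lr_pos, lr_neg, dor = likelihood_ratios(0.98, 0.80)
print(f"PHQ-9: LR+ {lr_pos:.1f}, LR- {lr_neg:.3f}, DOR {dor:.0f}")
```

With these rounded inputs the LR+ is 4.9 and the diagnostic odds ratio comes out near 196 rather than the tabulated 193, presumably because the table was derived from unrounded patient counts.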

MAIN RESULTS 

Sixty-six (13%) of 501 patients were diagnosed with major 
depression; 126 (25%) were diagnosed as having any depressive 
disorder, including adjustment disorder and dysthymia 
(Table 19-16). 

When diagnoses were broadened to include any depressive 
disorder, sensitivities decreased to 81%, 81%, and 94% for 
the HADS, PHQ (threshold > 10), and WHO-5, respectively. 
Specificities were 75%, 82%, and 60%, respectively. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Large sample size; community and academic 
settings; criterion standard completed for a high proportion 
of individuals chosen for evaluation. 

LIMITATIONS Results for the PHQ-9 and WHO-5 were 
reported for atypical thresholds that may have been determined 
post hoc. The author supplied supplemental data for 
the usual WHO-5 threshold. The higher PHQ-9 threshold 
would be expected to lower sensitivity, yet the observed 
sensitivity was high. 

A unique contribution is the comparison of brief depression-specific 
instruments designed for primary care populations 
(the PHQ-9 and WHO-5) to a brief depression-specific 
instrument designed for medically ill patients (the HADS). 
Both the PHQ-9 and WHO-5 were excellent in ruling out 
depression, but the WHO-5 had a modest positive likelihood 
ratio. 


Table 19-16 Likelihood Ratios of Screening Instruments for 
Major Depression 

Test   | Sensitivity | Specificity | LR+ (95% CI)  | LR- (95% CI)     | DOR (95% CI)
--------------------------------------------------------------------------------------
PHQ-9  | 0.98        | 0.80        | 4.9 (4.1-6.0) | 0.02 (0-0.14)    | 193 (27-1408)
HADS   | 0.88        | 0.69        | 2.8 (2.4-3.3) | 0.18 (0.09-0.34) | 16 (7.5-35)
WHO-5ᵃ | 1.00        | 0.54        | 2.2 (2.0-2.4) | 0 (0-0.22)       | 156 (9.6-2540)

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; HADS, Hospital 
Anxiety and Depression Scale; LR+, positive likelihood ratio; LR-, negative likelihood 
ratio; PHQ-9, Patient Health Questionnaire-9; WHO-5, World Health Organization-5 
Well-Being Scale. 

ᵃData for the WHO-5, with a threshold of 12 or lower, were supplied by the author. 

One caution when interpreting these results for English- 
speaking populations is that the instruments were given in 
German. Even when careful cultural adaptations are made, 
questionnaires may perform differently across language 
groups and cultures. 

Reviewed by John W. Williams Jr, MD 


REFERENCES FOR THE EVIDENCE 

1. Zigmond AS, Snaith RP. The Hospital Anxiety and Depression Scale. 
Acta Psychiatr Scand. 1983;67(6):361-370. 

2. Psychiatric Research Unit, Frederiksborg General Hospital, Hillerød, 
Denmark. WHO-5 questionnaires. http://www.who-5.org. Accessed 
May 28, 2008. 
Does This Patient Have a Family History of Cancer? 

Harvey J. Murff, MD, MPH 
David R. Spigel, MD 
Sapna Syngal, MD, MPH 

CLINICAL SCENARIO 

A 35-year-old woman presents for an initial visit and 
during the medical interview mentions that her mother 
and grandmother had breast cancer. She reports that 
her mother was diagnosed with cancer at age 42 years, 
and she believes that her grandmother, on her mother’s 
side, was diagnosed in her 30s. Because of her family 
history, she is concerned about her risk of developing 
breast cancer. Despite having no symptoms of breast 
cancer, she wonders at what age she should start having 
mammograms and whether she should have genetic 
testing. 


WHY IS IT IMPORTANT TO RECORD AN 
ACCURATE FAMILY HISTORY OF CANCER? 


Individuals with a family history of certain kinds of cancer 
can have an increased risk of developing cancer themselves. 1,2 
Two meta-analyses found relative risks of 2.1 (95% 
confidence interval [CI], 2.0-2.2) for breast cancer 3 and 3.1 
(95% CI, 2.6-3.7) for ovarian cancer 4 in women with 
affected first-degree relatives. Similarly elevated risks have 
been observed for endometrial cancer, 5-7 colon cancer, 7-10 and 
prostate cancer. 11-13 

Accurate reporting of family history helps risk-stratify 
patients, which in turn determines screening and prevention 
interventions. Individuals with family histories that are 
suggestive of a hereditary cancer syndrome (Box 20-1) 14-17 
are typically considered at high or very high risk of developing 
cancer. Individuals with family histories for cancers 
not recognized as hereditary are generally at a moderately 
increased risk of developing cancer compared with the general 
population. Several organizations have recommended 
initiating screening earlier, more frequently, or both in 
patients at moderately increased risk of developing cancer 
according to their family history. 18-27 Guidelines have also 
been published regarding the management of individuals 
who are at high cancer risk. 26,28-30 A family history of 
malignancy can not only influence cancer screening initiation 
and frequency but also affect treatment strategies. Family 
histories affect decisions about cancer chemoprevention, 31,32 
and those individuals identified as being at very high risk 
may also be considered for risk-reducing surgeries. 33 Likewise, 
algorithms that predict individuals who might be candidates 
for genetic testing rely almost exclusively on family 
history information. 14,16 
Box 20-1 Examples of Clinical Diagnostic Criteria for 2 Familial 
Cancer Syndromes and Recommendations Regarding Genetic 
Testing for Cancer Susceptibility 

HEREDITARY NONPOLYPOSIS COLON CANCER (HNPCC) 14,15 

All of the following criteria should be present: 

At least 3 relatives must have a cancer associated with 
HNPCC (colon, endometrial, ovarian, stomach, small 
bowel, hepatobiliary, ureter, renal-pelvis, brain). 

One should be a first-degree relative of the other 2. 

At least 2 successive generations should be affected. 

At least 1 of the relatives with cancer associated with 
HNPCC should have received the diagnosis before age 
50 years. 

HEREDITARY BREAST/OVARIAN CANCER 16 ᵃ 

Any of the following criteria should be present: 

Two breast cancers in a first- or second-degree relative 
and mean age at diagnosis of 40 years. 

One breast cancer and 1 ovarian cancer in a first- or 
second-degree relative and mean age at diagnosis of 41 
to 50 years. ᵃ 

Two or more breast cancers and 1 ovarian cancer in a 
first- or second-degree relative. 

Ovarian cancer in 2 relatives. 

AMERICAN SOCIETY OF CLINICAL ONCOLOGY POLICY 
STATEMENT ON GENETIC TESTING 17 

Indications for genetic testing: 

The individual has personal or family history features 
suggestive of a genetic cancer susceptibility condition. 

The test can be adequately interpreted. 

The results will aid in diagnosis or influence the medical 
or surgical treatment of the patient or family members 
at hereditary risk of cancer. 

ᵃIdentified relatives must be on the same side of the family (either maternal or 
paternal relatives). 


Because many screening and prevention strategies for cancer 
rely on self-reported family history information, inaccurate 
information could result in inappropriate care. A false-negative 
family cancer history results in an underestimation 
of cancer risk and missed opportunities for cancer screening. 
Failure to collect adequate family history information and 
appropriately manage a patient’s cancer risk may result in 
substandard care and in some cases has resulted in malpractice 
litigation. 34 

Conversely, a patient’s false belief in a positive family cancer 
history can cause stress 35 and, when compounded by the 
physician’s overestimation of risk, may lead to unnecessary 
procedures or surgeries. 36 The overestimation of cancer risk 
based on pedigree data creates unneeded referrals for genetic 
testing or cancer risk counseling. 37,38 The increased availability 
of and demand for genetic services give primary care 
physicians an even more important role in recording an 
accurate family cancer history 39,40; however, many physicians 
lack adequate training in genetics to accurately identify and 
refer appropriate candidates for genetic services. 41,42 In 
addition, with direct-to-consumer advertising for genetic testing 
now a reality, 43 accurate family history collection and cancer 
risk assessment by primary care physicians might help 
decrease the likelihood of inappropriate referrals for genetic 
counseling and testing. 

Few data exist describing how often inaccurate risk assessments 
are made according to faulty pedigree data. In one 
retrospective study 35 that examined patients referred to 2 cancer 
genetic clinics, patient treatment was changed in 23 (11%) of 
213 patients after their previously reported family history 
information was found to be inaccurate. In 15 of these 
patients, screening was thought to be unnecessary, whereas 
in 8 patients cancer risk was determined to be greater than 
initially believed. Further studies have supported these findings, 
with one study 44 determining that 6 (5%) of 120 
patients referred to a cancer clinic had changes in treatment 
after confirmation of the family cancer history revealed 
discrepancies. In most of these patients, the cancer risk had 
been overestimated. 

Prevalence of a Positive Family History 
of Specific Familial Cancers 

The prevalence of a family history of cancer varies, depending 
on the cancer type. The prevalence of a family history of 
breast cancer has been estimated to range from 5% to 22% 45-48; 
colon cancer, 2.0% to 9.4% 8,45,46; ovarian cancer, 1.1% to 
3.5% 45,46,48; endometrial cancer, 0.5% to 1.4% 45,46; and prostate 
cancer, 4.6% to 9.5%. 11,13,46,49 Most of this variation reflects 
differences in methodology and study population. Some studies 
included distant relatives in the definition of a positive family 
cancer history, whereas other studies have focused only on 
first-degree relatives. Variability in rates also occurs when the 
results are derived from the general population as opposed to 
patients referred to cancer or genetic centers, who have 
higher prevalence rates. 

How to Elicit a Family Cancer History 

Family medical history information is important for risk 
assessment in numerous chronic medical conditions in addition 
to cancer, such as diabetes mellitus and cardiovascular 
disease; therefore, eliciting a family cancer history can serve 
as a model for collecting family history information for other 
disorders. Typically, family history information is collected 
directly from the patient or from screening questionnaires 
filled out by the patient. Alternatively, the patient’s parent or 
another family member may provide the information. 

Screening questionnaires are often either a list of relatives, 
with space to provide information on overall health, age, and 
cause of death, or a list of adult-onset diseases with space to 
list the affected relatives. Disease history should be collected 
on first-degree relatives (mother, father, sisters, brothers, and 
children) and second-degree relatives (maternal and paternal 
grandparents, aunts, uncles, nieces, and nephews). It is 
important to inquire about various types of cancers because 
certain hereditary cancer syndromes can be identified by specific 
cancers that cluster within families (Box 20-1), such as 
endometrial with colon 50 and breast with ovarian. 51,52 If the 
initial screening interview or questionnaire reveals a potential 
familial predisposition to a particular disease, the family 
history should be expanded. 

Although establishing the numbers of both affected and 
unaffected relatives is important for determining penetrance 
and predicting the likelihood of gene mutations, this information 
would seldom influence cancer screening decisions made 
by primary care physicians. For affected relatives, documenting 
the age at cancer diagnosis is important because a diagnosis 
at an age significantly earlier than typically expected 
increases the possibility of a hereditary cancer syndrome. 
Inaccurate reporting of ages at diagnosis for breast 
cancer can have a considerable influence on risk prediction in 
families with fewer than 4 affected relatives. 53 A 3-generation 
pedigree, displayed graphically in Figure 20-1, offers a convenient 
symbolic method of summarizing information. Because 
of previous inconsistencies in pedigree symbol usage, the 
Pedigree Standardization Task Force, organized through the 
National Society of Genetic Counselors, has proposed 
recommendations for a standardized pedigree nomenclature. 54 

When recording a pedigree, particularly for breast and 
gynecologic cancers, it is important to inquire about disease in both 
maternal and paternal lineages because mutations can be 
transmitted through either parent. When collecting information on 
second-degree relatives, it is important to note the lineage to 
which the relative belongs (such as paternal vs maternal 
grandparents) because the degree of risk might vary if affected 
relatives do not belong to the same lineage. A brief reference for 
physicians on the family medical history has been prepared by 
the American Medical Association 
(http://www.ama-assn.org/ama/pub/category/2380.html; accessed 
May 29, 2008). Many other electronic sources and texts 55 are 
available. 

Family medical history information can be collected during 
a patient care visit or outside of the clinical encounter. 
Methods of collecting family history information outside of 
the clinical encounter can include paper questionnaires, 56 
computer questionnaires in kiosks within a clinic waiting 
area, 37 Web-based electronic collection, and interviews by 
health care professionals. The optimal means of collection 
has not been determined. 


METHODS 

Two of the authors (H.J.M. and D.R.S.) performed independent 
searches of the MEDLINE database for English-language 
articles dated 1966 to June 2004 using the PubMed 
search engine. The following Medical Subject Headings were 
used: “family,” “genetic predisposition to disease,” “medical 
history taking,” “neoplasm,” and “reproducibility of results.” 
We also searched using the following textwords: “accuracy,” 
“sensitivity,” “specificity,” and “family history,” combined 
with the conditions “breast cancer,” “colon cancer,” “ovarian 
cancer,” “prostate cancer,” “endometrial cancer,” or “uterine 
cancer.” We specifically included cancers that were likely to be 
commonly encountered by primary care physicians and 
whose management might be altered according to family 
history information. The reviewers evaluated article abstracts 
and chose studies for full-text review according to the 
abstract. We searched the bibliographies of all retrieved 
articles to identify additional sources. 

Articles were included if they were original articles describing 
the accuracy of the site-specific family history for the 
prespecified cancers and contained a criterion standard. Studies 
presenting aggregate data (all cancer types combined into a 
single measure) for self-reported family cancer history 
information were excluded. For purposes of this study, the 
criterion standard for a positive family history of cancer required 
verification from the identified relative’s medical record, 
physician, or death certificate or verification within a population 
cancer registry. For studies to be included in our analysis, 
verification of a negative family history for cancer had to 
have been performed. Thus, if a study participant reported 
that a relative had no history of breast cancer, the relative’s 
medical records, death certificate if applicable, or local cancer 
registry was examined for verification of this report. 

The completeness of case findings within tumor registries 
varied, with 83% to 99% of cancers identified through medical 
record reviews and patient interviews also being present 
within the registry. 57-61 Specific cancer sites are correctly 
recorded within the registry in 93% to 97% of cases. 
Forty-nine percent of discrepancies within tumor registries 
result from changes in an initial diagnosis with a failure to 
update registry information. 61 For breast cancer, the sensitivity 
and specificity of tumor registries are high. 62 Other tumors 
listed within registries have similarly high sensitivities. 63,64 The 
National Program of Cancer Registries of the Centers for Disease 
Control and Prevention has created a system that provides 
the rationale for accepting these data in studies that attempt 
validation of the patient’s family cancer history. 65 Although 
death certificates probably lack the accuracy of tumor registries, 
the poorer performance of death certificates is more 
likely attributed to poor sensitivity (the death certificates do 
not record the information when in fact the decedent had 
cancer). 66 According to autopsy studies, the death certificate is 
estimated to have a sensitivity of 87% for identifying cancer. 67 


Figure 20-1 Hypothetical Pedigree for a Consultand With a Family 
History Suggestive of Hereditary Nonpolyposis Colon Cancer 

[Pedigree diagram not reproduced. Key: affected individual; d., age at 
death. Cancer type: Co, colon; Br, breast; St, stomach; En, endometrial.] 

A consultand is an individual under evaluation for predicting his or her 
own future risk or the risk of his or her offspring. The arrow identifies the 
consultand. The 2-letter abbreviation with number (eg, Co 53) represents 
the diagnosis an individual received, followed by the age at the 
diagnosis. This pedigree meets Amsterdam II criteria, 14 which include 3 
relatives with a hereditary nonpolyposis colorectal cancer-associated 
tumor, such as colon cancer, endometrial cancer, ureteral cancer, cancer 
of the renal pelvis, ovarian cancer, stomach cancer, or small bowel 
cancer; 1 relative who is a first-degree relative of the other 2; cancers 
affecting at least 2 generations; and 1 or more cases diagnosed before 
age 50 years. 

We identified 22 studies from our search, using the listed 
criteria. 44,68-88 Of these, only 7 provided information on the 
test characteristics of both a positive and a negative report of a 
family cancer history. 71,73,74,76,78-80 One study specifically assessed 
pedigrees suggestive of hereditary nonpolyposis colorectal 
cancer (HNPCC) 76 and was included within the analysis. 
Sensitivity and specificity were determined for the family history 
interview for HNPCC, but this information was not combined 
with other colon cancer studies. We used techniques described 
in previous Rational Clinical Examination articles to determine 
study quality, and all 7 evaluated studies were assigned a 
quality score of C, 89 which reflects a study with an independent 
blind comparison of sign or symptom and a criterion standard 
of diagnosis among nonconsecutive patients suspected of having 
the target condition. 

Because the population studied could influence reporting 
accuracy, test characteristics were calculated separately for 
individuals with a personal history of cancer and for individuals 
without a personal history of cancer. Sensitivity and 
specificity of patient self-report of a family history of cancer 
and likelihood ratios (LRs) of a positive or negative report 
were calculated according to raw data supplied by the original 
articles that met our search criteria. CIs for LRs were 
computed with previously described methods. 90 We used 
random-effects summary measures for combining the data 
because this provided broader CIs that display the uncertainty 
around the point estimates. The summary measures 
described this uncertainty better than the simple range of 
possible data from the original studies. For colon cancer, one 
study 76 was not included within the summary LRs because it 
specifically evaluated the family history for HNPCC rather 
than colon cancer in general. 
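
The calculation of an LR and its CI from raw counts can be sketched in a few lines of Python. The sketch below assumes the common log-transformation method for the CI of a ratio of proportions; the chapter cites its method only by reference, so this choice and the function name `lr_with_ci` are assumptions made for illustration. The example counts are those reported for colon cancer by Aitken et al in Table 20-3 (sensitivity 70/81, specificity 219/239).

```python
import math


def lr_with_ci(tp: int, fn: int, fp: int, tn: int, z: float = 1.96):
    """Return LR+ and LR- with approximate 95% CIs from a 2 x 2 table.

    tp/fn: relatives with verified cancer, reported / not reported by the patient.
    fp/tn: relatives without verified cancer, reported / not reported.
    The CI uses the log method (an assumption of this sketch) and requires
    every cell to be nonzero.
    """
    if min(tp, fn, fp, tn) <= 0:
        raise ValueError("all four cells must be positive for the log method")
    n_pos, n_neg = tp + fn, fp + tn
    lr_pos = (tp / n_pos) / (fp / n_neg)
    lr_neg = (fn / n_pos) / (tn / n_neg)
    # Standard errors of ln(LR+) and ln(LR-).
    se_pos = math.sqrt(1 / tp - 1 / n_pos + 1 / fp - 1 / n_neg)
    se_neg = math.sqrt(1 / fn - 1 / n_pos + 1 / tn - 1 / n_neg)

    def interval(lr: float, se: float):
        return lr * math.exp(-z * se), lr * math.exp(z * se)

    return {
        "lr_pos": lr_pos,
        "lr_pos_ci": interval(lr_pos, se_pos),
        "lr_neg": lr_neg,
        "lr_neg_ci": interval(lr_neg, se_neg),
    }


# Aitken et al (Table 20-3): 70 of 81 true positives, 219 of 239 true negatives.
result = lr_with_ci(tp=70, fn=11, fp=20, tn=219)
print(result)
```

Under these assumptions the output agrees, after rounding, with the tabulated values for that study: LR+ 10 (6.7-16) and LR- 0.15 (0.09-0.26).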

RESULTS 

Precision 

Precision reflects the reproducibility of a measurement. 
Assessing the precision of the family history interview is difficult 
because it can be influenced by both patient and physician 
factors. Although we were unable to identify any studies 
assessing the reliability of the physician’s family cancer history 
assessment, one study 91 in breast cancer examined the 
reliability of patient self-report. In this nested case-control 
study, 91 comparisons were made between self-reported family 
history information in women before the development of the 
disease and after the development of the disease. Follow-up 
surveys were completed 2 years after the initial survey. 
Women who had developed breast cancer, as well as those 
who had not developed breast cancer, were surveyed. The 
agreement for maternal history of breast cancer was κ = 0.92 
and κ = 1.0 for cases and controls, respectively; and for a history 
of breast cancer in a sister, κ = 0.65 and κ = 0.88, respectively. 
Although the study did not assess whether a real 
change in family history might have occurred during the 
study period, these results suggest that self-reported family 
breast cancer history is probably only slightly influenced by 
recall bias. Patient precision regarding the family history 
interview for other cancers has not been reported. 
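
The κ statistic quoted above measures agreement beyond what chance alone would produce between 2 sets of yes/no answers (here, the initial and follow-up surveys). As a minimal sketch, the Python function below computes Cohen κ from the 4 cells of a 2 × 2 agreement table. The function name `cohen_kappa` and the counts in the example are hypothetical; the raw counts of the cited study are not given in this chapter.

```python
def cohen_kappa(both_yes: int, first_only: int, second_only: int, both_no: int) -> float:
    """Cohen kappa for two yes/no reports on the same subjects.

    both_yes:    "yes" on both surveys
    first_only:  "yes" on the first survey only
    second_only: "yes" on the second survey only
    both_no:     "no" on both surveys
    """
    n = both_yes + first_only + second_only + both_no
    if n == 0:
        raise ValueError("the agreement table is empty")
    observed = (both_yes + both_no) / n
    p_yes_first = (both_yes + first_only) / n
    p_yes_second = (both_yes + second_only) / n
    # Agreement expected by chance if the two reports were independent.
    expected = p_yes_first * p_yes_second + (1 - p_yes_first) * (1 - p_yes_second)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)


# Hypothetical example: 200 women surveyed twice about a maternal history.
kappa = cohen_kappa(both_yes=18, first_only=2, second_only=1, both_no=179)
print(f"kappa = {kappa:.2f}")
```

In this hypothetical table, raw agreement is 98.5% but chance agreement is already 82.4% because most women answer "no" both times, so κ is about 0.91.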

Accuracy 

Accuracy represents how well a particular test measures the 
value it is intended to measure. Seven studies concerning family 
cancer history were ultimately included in this analysis 
(Table 20-1). 71,73,74,76,78-80 Three studies collected family history 
from personal patient interviews, 73,74,79 whereas the other 4 
relied on a self-completed survey. 71,76,78,80 Four studies solely 
relied on cancer registry data as their criterion standard 73,74,79,80; 
2 studies used a combination of medical records and death 
certificates, 71,78 whereas the remaining study used all 3 sources 
as its criterion standard. 76 Only information for first-degree 
relatives was extracted. 

For individuals with cancer (Table 20-2), the positive likelihood 
ratio (LR+) and negative likelihood ratio (LR-) of a self-reported 
family cancer history in a first-degree relative were 23 
(95% CI, 8.1-64) and 0.29 (95% CI, 0.13-0.67) for colon, 41 
(95% CI, 23-75) and 0.07 (95% CI, 0.03-0.13) for breast, 20 
(95% CI, 4.3-89) and 0.55 (95% CI, 0.35-0.86) for endometrial, 
44 (95% CI, 15-132) and 0.21 (95% CI, 0.12-0.37) for 
ovarian, and 24 (95% CI, 2.3-262) and 0.25 (95% CI, 
0.16-0.39) for prostate cancers, respectively. For patients without 
a personal history of cancer (Table 20-3), the LR+ and LR- 
of a family history for the following cancers in a first-degree 
relative were 23 (95% CI, 6.4-81) and 0.25 (95% CI, 0.10-0.63) 
for colon, 8.9 (95% CI, 5.4-15) and 0.20 (95% CI, 0.08-0.49) 
for breast, 14 (95% CI, 2.2-83) and 0.68 (95% CI, 0.31-1.5) for 
endometrial, 34 (95% CI, 5.7-202) and 0.51 (95% CI, 
0.13-2.1) for ovarian, and 12 (95% CI, 6.5-24) and 0.32 (95% 
CI, 0.18-0.55) for prostate cancers, respectively. The estimates 
for sensitivity, specificity, and LRs for unaffected individuals 
for prostate, breast, endometrial, and ovarian cancers are 
based on data from a single study by Kerber and Slattery. 74 
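
The summary LRs above come from random-effects pooling, but the chapter does not state which estimator was used. Purely as an illustration, the Python sketch below assumes the DerSimonian-Laird estimator applied to the natural logarithm of each study's LR+; the function names are hypothetical, and the result is not expected to reproduce the published summary exactly. The counts are the 3 colon cancer studies in healthy individuals from Table 20-3.

```python
import math


def log_lr_pos(tp: int, fn: int, fp: int, tn: int):
    """Return ln(LR+) and its approximate variance for one study."""
    n_pos, n_neg = tp + fn, fp + tn
    lr = (tp / n_pos) / (fp / n_neg)
    variance = 1 / tp - 1 / n_pos + 1 / fp - 1 / n_neg
    return math.log(lr), variance


def pool_random_effects(studies, z: float = 1.96):
    """DerSimonian-Laird pooling of (log effect, variance) pairs.

    Returns the pooled effect and its CI on the original (ratio) scale.
    """
    ys = [y for y, _ in studies]
    vs = [v for _, v in studies]
    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    # Cochran Q and the between-study variance tau^2.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in vs]
    y_pooled = sum(wi * yi for wi, yi in zip(w_star, ys)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return (
        math.exp(y_pooled),
        math.exp(y_pooled - z * se),
        math.exp(y_pooled + z * se),
    )


# (tp, fn, fp, tn) for colon cancer in healthy individuals, Table 20-3.
colon_unaffected = [
    (70, 11, 20, 219),   # Aitken et al, 1995
    (13, 3, 12, 178),    # Kerber and Slattery, 1997
    (9, 8, 5, 1015),     # Mitchell et al, 2004
]
pooled, low, high = pool_random_effects([log_lr_pos(*s) for s in colon_unaffected])
print(f"pooled LR+ {pooled:.0f} (95% CI, {low:.1f}-{high:.0f})")
```

Because the 3 study estimates differ widely (LR+ of roughly 10, 13, and 108), the between-study variance dominates the weights and the pooled interval is much wider than any single-study interval, which is the behavior the authors describe as their reason for choosing random-effects summaries.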

Of the remaining 15 studies, we excluded 8 studies 81-88 
because the tumor data were presented in aggregate (ie, 
family history of any cancer) or were unclear; therefore, we 
were unable to extrapolate site-specific numbers. Seven 
studies evaluated only the positive predictive value of self- 
reported family history information for breast, colon, ovarian, 
Table 20-1 Characteristics of Included Studies of Patient Report of a Family History of Cancer in a First-Degree Relative ᵃ 

Source, y                        | Cancer Site, Affected Individuals             | Cancer Site, Unaffected Individuals           | Method of Family History Information Collection | Criterion Standard
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Love et al, 68 1985              | Colon, breast                                 |                                               | Personal interview                              | Medical records, death certificate
Breuer et al, 69 1993            | Breast                                        |                                               | Self-completed survey                           | Medical records
Theis et al, 70 1994             | Colon, prostate, breast, ovarian              |                                               | Personal interview and self-completed survey    | Medical records, death certificate, cancer registry
Aitken et al, 71 1995            |                                               | Colon                                         | Self-completed survey                           | Medical records, death certificate
Parent et al, 72 1995            | Breast                                        | Breast                                        | Personal interview                              | Medical records
Anton-Culver et al, 73 1996      | Breast                                        |                                               | Personal interview                              | Cancer registry
Kerber and Slattery, 74 1997     | Colon, prostate, breast, endometrial, ovarian | Colon, prostate, breast, endometrial, ovarian | Personal interview                              | Cancer registry
Sijmons et al, 44 2000           | Colon, breast, ovarian                        |                                               | Personal interview and self-completed survey    | Medical records, death certificate
Eerola et al, 75 2000            | Breast                                        |                                               | Personal interview and self-completed survey    | Medical records, cancer registry
Katballe et al, 76 2001          | Colon                                         |                                               | Self-completed survey                           | Medical records, death certificate, cancer registry
King et al, 77 2002              | Colon, prostate, breast, endometrial, ovarian |                                               | Personal interview                              | Medical records, death certificate
Ziogas and Anton-Culver, 78 2003 | Colon, prostate, breast, endometrial, ovarian |                                               | Self-completed survey                           | Medical records, death certificate
Mitchell et al, 79 2004          | Colon                                         | Colon                                         | Personal interview                              | Cancer registry
Verkooijen et al, 80 2004        | Breast, ovarian                               |                                               | Self-completed survey                           | Cancer registry

ᵃAffected individuals are patients who have a personal diagnosis of cancer and unaffected individuals are patients with no personal diagnosis of cancer. 


Table 20-2 Studies Evaluating Both Sensitivity and Specificity of Patient Report of a Family History of Cancer in a First-Degree Relative in 
Individuals With Cancer 

Source, y                        | Cancer Type | Sensitivity, No./Total (%) | Specificity, No./Total (%) | LR+ (95% CI) | LR- (95% CI)
----------------------------------------------------------------------------------------------------------------------------------
Kerber and Slattery, 74 1997     | Colon       | 11/17 (65)                 | 98/108 (91)                | 6.9 (3.5-13) | 0.39 (0.20-0.74)
Katballe et al, 76 2001ᵃ         | Colon       | 11/18 (61)                 | 66/69 (96)                 | 14 (4.4-45)  | 0.41 (0.23-0.73)
Ziogas and Anton-Culver, 78 2003 | Colon       | 174/194 (90)               | 1454/1498 (97)             | 31 (23-41)   | 0.11 (0.07-0.16)
Mitchell et al, 79 2004          | Colon       | 30/53 (57)                 | 1256/1269 (99)             | 55 (31-100)  | 0.44 (0.32-0.60)
Summaryᵇ                         |             |                            |                            | 23 (8.1-64)  | 0.29 (0.13-0.67)
Kerber and Slattery, 74 1997     | Prostate    | 11/16 (69)                 | 101/109 (93)               | 9.4 (4.5-20) | 0.34 (0.16-0.70)
Ziogas and Anton-Culver, 78 2003 | Prostate    | 46/58 (79)                 | 557/564 (99)               | 64 (30-135)  | 0.21 (0.13-0.35)
Summary                          |             |                            |                            | 24 (2.3-262) | 0.25 (0.16-0.39)
Anton-Culver et al, 73 1996      | Breast      | 54/59 (92)                 | 364/370 (98)               | 56 (25-125)  | 0.09 (0.04-0.20)
Kerber and Slattery, 74 1997     | Breast      | 11/13 (85)                 | 107/112 (96)               | 19 (7.8-46)  | 0.16 (0.05-0.58)
Verkooijen et al, 80 2004        | Breast      | 60/61 (98)                 | 247/249 (99)               | 122 (31-487) | 0.02 (0-0.12)
Ziogas and Anton-Culver, 78 2003 | Breast      | 188/197 (95)               | 850/873 (97)               | 36 (24-54)   | 0.05 (0.03-0.09)
Summary                          |             |                            |                            | 41 (23-75)   | 0.07 (0.03-0.13)
Kerber and Slattery, 74 1997     | Endometrial | 2/7 (29)                   | 114/118 (97)               | 8.4 (1.9-38) | 0.74 (0.46-1.2)
Ziogas and Anton-Culver, 78 2003 | Endometrial | 10/18 (56)                 | 1035/1052 (98)             | 34 (18-64)   | 0.45 (0.27-0.76)
Summary                          |             |                            |                            | 20 (4.3-89)  | 0.55 (0.35-0.86)
Kerber and Slattery, 74 1997     | Ovarian     | 2/3 (67)                   | 117/122 (96)               | 16 (5.0-53)  | 0.35 (0.07-1.7)
Verkooijen et al, 80 2004        | Ovarian     | 4/6 (67)                   | 168/170 (99)               | 57 (13-251)  | 0.34 (0.11-1.0)
Ziogas and Anton-Culver, 78 2003 | Ovarian     | 35/42 (83)                 | 1017/1028 (99)             | 78 (43-142)  | 0.17 (0.09-0.33)
Summary                          |             |                            |                            | 44 (15-132)  | 0.21 (0.12-0.37)

Abbreviations: CI, confidence interval; LR, likelihood ratio. 

ᵃSensitivity and specificity of a patient having a high-risk colon cancer pedigree according to Amsterdam II criteria. 14 
ᵇComposite does not include data from Katballe et al 76 (see “Methods”). 
Table 20-3 Studies Evaluating Both Sensitivity and Specificity of Patient Report of a Family History of Cancer in a First-Degree 
Relative in Healthy Individuals 

Source, y                    | Cancer Type | Sensitivity, No./Total (%) | Specificity, No./Total (%) | LR+ (95% CI) | LR- (95% CI)
------------------------------------------------------------------------------------------------------------------------------
Aitken et al, 71 1995        | Colon       | 70/81 (86)                 | 219/239 (92)               | 10 (6.7-16)  | 0.15 (0.09-0.26)
Kerber and Slattery, 74 1997 | Colon       | 13/16 (81)                 | 178/190 (94)               | 13 (7.1-23)  | 0.20 (0.07-0.56)
Mitchell et al, 79 2004      | Colon       | 9/17 (53)                  | 1015/1020 (99)             | 108 (40-288) | 0.47 (0.29-0.78)
Summary                      |             |                            |                            | 23 (6.4-81)  | 0.25 (0.10-0.63)
Kerber and Slattery, 74 1997 | Prostate    | 21/30 (70)                 | 166/176 (94)               | 12 (6.5-24)  | 0.32 (0.18-0.55)
Kerber and Slattery, 74 1997 | Breast      | 18/22 (82)                 | 167/184 (91)               | 8.9 (5.4-15) | 0.20 (0.08-0.49)
Kerber and Slattery, 74 1997 | Endometrial | 1/3 (33)                   | 198/203 (98)               | 14 (2.2-83)  | 0.68 (0.31-1.5)
Kerber and Slattery, 74 1997 | Ovarian     | 1/2 (50)                   | 201/204 (99)               | 34 (5.7-202) | 0.51 (0.13-2.1)

Abbreviations: CI, confidence interval; LR, likelihood ratio. 


Box 20-2 Selected Web Sites for Cancer Risk Calculators ᵃ 

METHODS FOR ESTIMATING CANCER RISK 

Various cancer sites: 
http://www.yourdiseaserisk.wustl.edu/ 

Breast Cancer Risk Assessment tools: 
http://www.cancer.gov/bcrisktool/; http://www.halls.md/breast/risk.htm 

METHODS FOR ESTIMATING THE LIKELIHOOD OF A BRCA MUTATION 

BRCAPRO statistical model: 
http://astor.som.jhmi.edu/BayesMendel/ 

Mutation prevalence tables: 
http://www.myriadtests.com/provider/brca-mutation-prevalence.htm 

ᵃAccessed May 29, 2008. 


prostate, and endometrial cancers (Table 20-4). Positive 
predictive values tended to be better in articles concerning 
first-degree relatives compared with second-degree relatives. 
Individuals with personal histories of cancer tended to report 
family histories with a greater positive predictive value, 
although the number of studies evaluating unaffected 
individuals was limited. 
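
Unlike sensitivity and specificity, a positive predictive value depends on how common a true positive family history is in the group studied, which is one reason these values differ between populations. The Python sketch below illustrates the dependence. It combines the sensitivity and specificity reported for breast cancer in healthy individuals by Kerber and Slattery (Table 20-3) with the 5% to 22% prevalence range quoted earlier in this chapter; pairing these particular figures is an assumption made for illustration, and the function name is hypothetical.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a reported family history is true, by Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive reports are possible with these inputs")
    return true_pos / (true_pos + false_pos)


# Breast cancer, healthy individuals (Table 20-3): 18/22 and 167/184.
sens, spec = 18 / 22, 167 / 184
for prevalence in (0.05, 0.22):
    ppv = positive_predictive_value(sens, spec, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}")
```

Under these assumptions, roughly 1 in 3 positive reports would be correct at a prevalence of 5%, compared with about 7 in 10 at a prevalence of 22%, even though the accuracy of the report itself is unchanged.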

Common Reasons for False-Positive 
or False-Negative Reports 

In cancers in which patients are likely to be accurate in their 
report, such as breast cancer, case reports have indicated that 
false-positive reports are associated with malingering, prob¬ 
lems with patient-physician communication, or history of 
benign breast disease being reported as malignant. 36 Other 
common reasons for false-positive reports of family cancer 
history result from confusion based on primary vs metastatic 
disease. 68,92 This confusion has been described with false 
reports of primary liver cancer, as well as central nervous system 


cancers. Cancers that are frequently overreported include mel¬ 
anoma, which is incorrectly reported in almost half the 
reports, 93 and noncolonic gastrointestinal malignancies. 35 

Several factors relate to a false-negative report of a family 
history of cancer. In one study, 94 older patients and nonwhite 
respondents were more likely to underreport a family history 
of cancer. Another study 74 demonstrated that older patients 
were more likely to falsely report a negative family history of 
cancer, whereas patient sex and education level have little 
effect on the accuracy of reporting. Specific cancers with high 
rates of false-negative reporting include central nervous sys¬ 
tem tumors and hematologic malignancies. 94 

Other Means for Collecting Family History Information 
and Ways to Improve Family History Data Collection 

Several barriers exist for the collection of family history
information. Patient-specific factors that might result in poor
pedigree collection include poor family communication, family myths,
or individual spiritual beliefs. For physicians, probably the most
significant barrier is time. Although a comprehensive family history
assessment can take 15 to 30 minutes, 95 the average primary care
visit lasts only 16 minutes. 96 Several alternative methods that
involve collecting this information outside the context of the
clinical visit may facilitate the collection of family history
information. These other methods include self-completed patient paper
surveys, computer-based tools, and personal visits arranged solely for
pedigree collection.

Family history questionnaires offered outside of a clinical visit
confer several theoretic advantages over visit-based pedigree
assessment. 97 Besides saving clinic time, patients can consult with
family members to check the accuracy of the information, which can
then be reviewed and integrated into a clinic appointment when
relevant. The data from a questionnaire developed in Switzerland,
compared with information found within 2 population-based cancer
registries, exhibited sensitivities of 74% and 85% and specificities
of 97%. 56 Family history assessment tools (Box 20-2) have also been
developed to assist physicians in determining which individuals might
be candidates for genetic testing. 98
Table 20-4 Predictive Value of a Positive Report of a Family History of Cancer in a First-Degree or Second-Degree Relative

Positive Predictive Value, No. of Patients/Total (%)

Source, y | Cancer Type | Cancer Cases, First-Degree Relative | Cancer Cases, Second-Degree Relative | Healthy Controls, First-Degree Relative | Healthy Controls, Second-Degree Relative
Kerber and Slattery, 74 1997 | Colon | 11/21 (52) | - | 13/25 (52) | -
Mitchell et al, 79 2004 | Colon | 33/43 (70) | 13/22 (62) | 9/14 (63) | 10/16 (63)
King et al, 77 2002 | Colon | 22/24 (92) | - | - | -
Love et al, 68 1985 | Colon | 39/42 (93) | 31/37 (84) | - | -
Sijmons et al, 44 2000 | Colon | 30/33 (91) | 15/15 (100) | - | -
Theis et al, 70 1994 | Colon | 13/14 (93) | 21/29 (72) | - | -
Ziogas and Anton-Culver, 78 2003 | Colon | 174/218 (80) | 52/70 (74) | - | -
Aitken et al, 71 1995 | Colon | - | - | 70/90 (78) | -
Summary a | Colon | 81 (77-85) | 77 (70-83) | 71 (63-78) | 63 (39-82)
Kerber and Slattery, 74 1997 | Prostate | 11/19 (58) | - | 21/31 (68) | -
Ziogas and Anton-Culver, 78 2003 | Prostate | 46/53 (87) | 30/40 (75) | - | -
Theis et al, 70 1994 | Prostate | 11/13 (85) | 11/11 (100) | - | -
King et al, 77 2002 | Prostate | 25/29 (86) | - | - | -
Summary a | Prostate | 85 (78-90) | 80 (67-89) | 68 (50-82) | -
Kerber and Slattery, 74 1997 | Breast | 11/16 (69) | - | 18/35 (51) | -
Parent et al, 72 1995 | Breast | 67/74 (91) | - | 33/34 (97) | -
Ziogas and Anton-Culver, 78 2003 | Breast | 188/211 (89) | 103/115 (90) | - | -
Eerola et al, 75 2000 | Breast | 94/99 (95) | 109/114 (96) | - | -
Sijmons et al, 44 2000 | Breast | 65/69 (94) | 28/31 (90) | - | -
Theis et al, 70 1994 | Breast | 166/167 (99) | 33/39 (85) | - | -
Love et al, 68 1985 | Breast | 78/83 (94) | 65/74 (88) | - | -
Anton-Culver et al, 73 1996 | Breast | 54/60 (90) | - | - | -
Verkooijen et al, 80 2004 | Breast | 60/62 (97) | - | - | -
King et al, 77 2002 | Breast | 38/40 (95) | - | - | -
Breuer et al, 69 1993 | Breast | 84/94 (89) | - | - | -
Summary a | Breast | 93 (91-94) | 91 (88-94) | 74 (63-83) | -
Kerber and Slattery, 74 1997 | Endometrial | 2/6 (33) | - | 1/6 (17) | -
Ziogas and Anton-Culver, 78 2003 | Endometrial | 10/27 (37) | 3/14 (21) | - | -
King et al, 77 2002 | Endometrial | 2/5 (40) | - | - | -
Summary a | Endometrial | 37 (24-53) | 21 (7-47) | 17 (3-57) | -
Kerber and Slattery, 74 1997 | Ovarian | 2/7 (28) | - | 1/4 (25) | -
Ziogas and Anton-Culver, 78 2003 | Ovarian | 35/46 (76) | 15/24 (63) | - | -
Sijmons et al, 44 2000 | Ovarian | 10/15 (67) | - | - | -
Verkooijen et al, 80 2004 | Ovarian | 4/6 (67) | - | - | -
Theis et al, 70 1994 | Ovarian | 2/2 (100) | - | - | -
King et al, 77 2002 | Ovarian | 2/4 (50) | - | - | -
Summary a | Ovarian | 69 (58-78) | 63 (43-79) | 25 (5-70) | -

a Summary data are presented as percentage (95% confidence interval).
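
As a rough check on the summary rows of Table 20-4, the sketch below pools the colon cancer counts reported by cancer cases for first-degree relatives by summing confirmed reports and total reports. Simple pooling is an assumption made here for illustration; the source does not state how the summary values were calculated, and the name `pooled_ppv` is hypothetical.

```python
# Counts (confirmed reports, total positive reports) copied from the
# first-degree, cancer-case column of Table 20-4 for colon cancer.
COLON_CASES_FIRST_DEGREE = [
    (11, 21), (33, 43), (22, 24), (39, 42), (30, 33), (13, 14), (174, 218),
]


def pooled_ppv(counts):
    """Pool studies by summing confirmed reports and total reports.

    This is plain pooling of raw counts, assumed for illustration only.
    """
    confirmed = sum(c for c, _ in counts)
    total = sum(t for _, t in counts)
    if total == 0:
        raise ValueError("no reports to pool")
    return confirmed / total


if __name__ == "__main__":
    ppv = pooled_ppv(COLON_CASES_FIRST_DEGREE)
    print(f"Pooled PPV: {ppv:.1%}")
```

Under this assumption the pooled value is 322/395, or about 81.5%, which is close to the 81% shown in the summary row.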


Computerized genograms can also be effective and convenient tools for
both patients and physicians. 99-101 These tools offer the benefits of
paper-based systems and, through clinical decision support, educate
patients and offer guidance to physicians. 102-104 Sweet et al 37
compared family history information obtained by physicians at a
comprehensive cancer clinic with that directly entered by patients
into a computer program. Patients were then determined to be "high
risk" for cancer according to pedigree information collected from
either the computer program or information recorded within the medical
record. Of 362 computer entries, 69% had some form of family history
information recorded within their medical record. A total of 101
patients were considered high risk according to their pedigree
information collected from the computer program, but only 69 of these
patients had information recorded within their medical record to
confirm this high risk.
Table 20-5 Posttest Probabilities of Having a Family History of Cancer a

Estimated Posttest Probability of Having a Family Cancer History in a First-Degree Relative, % (95% CI)

Cancer Type | Family History Prevalence (Pretest Probability), % | Personal History of Cancer | Positive | Negative
Colon 8 | 9.4 | Yes | 70 (46-87) | 3 (1-7)
 | | No | 70 (40-89) | 2.5 (1.0-6.0)
Breast 48 | 22 | Yes | 92 (90-94) | 1.0 (0.8-3.0)
 | | No | 71 (60-81) | 5 (2-12)
Ovarian 48 | 3.5 | Yes | 68 (57-77) | 0.7 (0.3-1.0)
 | | No | 55 (17-88) | 2.0 (0.5-7.0)
Endometrial 46 | 7.8 | Yes | 70 (56-80) | 4 (3-6)
 | | No | 54 (16-88) | 5 (3-11)
Prostate 11 | 9.5 | Yes | 72 (24-97) | 3 (2-4)
 | | No | 56 (41-72) | 3 (2-5)

Abbreviations: CI, confidence interval; LR, likelihood ratio.

a Posttest probability = (pretest odds × LR)/(1 + pretest odds × LR),
where pretest odds = pretest probability/(1 - pretest probability).


Special visits outside of the clinical encounter have also been
evaluated as a means to obtain family history information. In one
study, 105 patients observed at a single primary care practice were
invited to a special visit designed to collect detailed family history
information. Ten percent of patients observed in the pedigree clinic
had a family history of cancer (breast, colon, melanoma, or thyroid),
and some patients were referred for further care according to their
pedigree. Patients were less anxious about their family history after
the special visit, but this effect was not sustained beyond 12 weeks.
A major limitation of the study was the poor attendance at the special
clinic; only 16% of invited patients attended.


CLINICAL SCENARIO—RESOLUTION 


This patient's pretest probability of having a positive family history
of breast cancer before her clinical interview can be estimated at
22%. 48 After consideration of her self-reported pedigree, her
posttest probability for a family history of breast cancer increased
to 71% (95% CI, 60%-81%) (Table 20-5). According to the family history
information presented, it is likely that this woman does in fact have
a positive family history for breast cancer and that seeking
confirmatory evidence is unlikely to offer any additional gain. This
patient has 2 relatives affected with breast cancer, both diagnosed
before age 50 years and one a first-degree relative. According to
online mutation prevalence tables (Box 20-2), this woman's risk
estimate of harboring a deleterious mutation for either BRCA1 or BRCA2
was 10%, and a referral was recommended to a specialized cancer risk
assessment program for counseling on genetic testing. 106 During
pretest counseling, she learned of the potential limitations of
testing on a presymptomatic individual without a known deleterious
mutation. The genetic counselor suggested testing one of her affected
relatives first to produce a more informative test. The patient said
that her maternal grandmother is no longer alive, and she is not
particularly close to her mother and unsure whether her mother would
be willing to undergo testing. After pretest genetic counseling, the
patient decided not to be tested for a BRCA1 or BRCA2 mutation and
planned to broach the subject of testing with her mother. According to
this patient's family history, clinical breast examinations every 6
months are prescribed and the patient is shown how to perform monthly
breast self-examinations. Mammography screening is initiated and an
appointment is scheduled.

THE BOTTOM LINE 

Family history assessment is taking on greater importance as high-risk
individuals are being offered earlier screening interventions and
risk-reducing therapies. Cancer family histories acquired on
first-degree relatives for breast and colon cancer are likely to
represent true positives and true negatives for the disease and may
not require further evaluation to substantiate. However, other cancers
with a familial disposition are less accurately reported.
CLINICAL SCENARIO


A 48-year-old woman makes an urgent appointment to see you. She is
distressed because her 52-year-old sister just returned home from an
outpatient colonoscopy procedure and called to tell her that she has
cancer. Your patient is healthy, has a normal well-balanced diet, and
has no abnormal bowel symptoms. She wants to know what she should do.

UPDATED SUMMARY ON FAMILY HISTORY OF CANCER 

Original Review 

Murff HJ, Spigel DR, Syngal S. Does this patient have a family
history of cancer? An evidence-based analysis of the accuracy
of family cancer history. JAMA. 2004;292(12):1480-1489.

UPDATED LITERATURE SEARCH 

Our literature search replicated the search strategy reported in the 
original article, limited to 2004-2006. The results yielded 32 titles, 
for which we reviewed the abstracts. None of these studies 
reported both the sensitivity and specificity of the family history 
for cancer as obtained from healthy individuals in the clinic office 
setting. One study evaluated the sensitivity of the family history 
among patients who had a first-degree relative with either the
Li-Fraumeni syndrome or the breast-ovarian cancer syndrome. 1

NEW FINDINGS 

Details of the Update 

No new studies assessed the accuracy of the family medical history
in an unselected general medical population.

CHANGES IN THE REFERENCE STANDARD 

None. 

RESULTS OF LITERATURE REVIEW 

A study of patients in a genetic screening clinic because they have a
first-degree relative with a breast cancer syndrome provides some
insight into the factors that might affect the accuracy of a family
history of carcinoma. 1 The accuracy of the patient's report depended
on the actual genetic syndrome. Perhaps not surprisingly, for 2 breast
cancer syndromes the history from female first-degree relatives was
more accurate than the family history elicited from male first-degree
relatives. Those with a college education were more accurate than
less-educated persons; first-degree relatives of the affected
individual were more accurate than second-degree relatives; however,
age did not affect the accuracy. Given the select population for this
study, we do not know whether these factors generalize to other
populations. The higher specificity of the family history reported by
women was validated in a population-based sample of patients. 2
However, the same population-based study found higher specificity for
family histories reported by younger (<50 years) patients and no
difference as a function of the consultand or maternal or paternal
level of education.


EVIDENCE FROM GUIDELINES 

All standard physical examination and clinical textbooks recommend
that clinicians elicit a family history. Guidelines for specific
cancers depend on accurate family histories. 3


CLINICAL SCENARIO—RESOLUTION 


Typically, screening for colon cancer begins at age 50 years. However,
you might obtain a colonoscopy now, depending on the family history.
This case scenario highlights the importance of confirming the medical
history. Had the situation been different in that the patient's sister
called a week after the colonoscopy with the report of cancer, the
likelihood is high (likelihood ratio, 23) that your patient's report
of a family history of colon cancer would be accurate. Although it is
certainly possible that the sister's physician told her she had
carcinoma from the colonoscopic findings, it would be prudent to wait
for confirmation. You should explain to your patient that it is
important for you both to understand the exact colonoscopy results
(eg, the presence of multiple polyps) and the biopsy results (to
confirm the presence of cancer). Once you have those findings, you can
discuss with the patient the appropriate timing and approach to
screening for colon cancer.
FAMILY HISTORY OF CANCER—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

The prior probability of a family history of any carcinoma depends on
the specific cancer. The general rates are as shown in Table 20-6.


Table 20-6 Prevalence of Family History of Some Common Cancers

Cancer | Family History Prevalence, %
Breast | 5-22
Colon | 2-9.4
Ovarian | 1.1-3.5
Endometrial | 0.5-1.4
Prostate | 4.6-9.5


POPULATION FOR WHOM A FAMILY HISTORY 
OF CANCER SHOULD BE CONSIDERED 

A family history that addresses cancer should be obtained from all
patients. However, the field of genetics and personal risk assessment
is changing rapidly, and physicians will need to get further education
based on new data that describe a myriad of genetic associations with
cancer. Online assessment tools can help patients assess their
individual risk (http://www.yourdiseaserisk.wustl.edu; accessed May
29, 2008). The BRCA mutation, a particularly strong risk factor for
breast or ovarian cancer, has specific online resources for assessing
risk, although all risk assessments depend on accurate information
from the patient (http://astor.som.jhmi.edu/BayesMendel/ or
http://www.myriadtests.com/provider/brca-mutation-prevalence.htm;
accessed May 29, 2008).

DETECTING THE LIKELIHOOD OF A 
FIRST-DEGREE RELATIVE WITH CANCER 

A healthy patient who reports no family history of cancer will most
likely be correct. However, even among patients with a personal
history of cancer, the accuracy of a positive report of cancer in
first-degree relatives may sometimes require confirmation, depending
on the specific surveillance or genetic screening plan (see Tables
20-7 and 20-8).


Table 20-7 Likelihood Ratio of a Healthy Patient's Reported Family History for Cancer

Healthy Patients

Family History of Carcinoma | LR+ (95% CI) | LR- (95% CI)
Ovarian | 34 (5.7-202) | 0.51 (0.13-2.1)
Colon | 23 (6.4-81) | 0.25 (0.10-0.63)
Endometrial | 14 (2.2-83) | 0.68 (0.31-1.5)
Prostate | 12 (6.5-24) | 0.32 (0.18-0.63)
Breast | 8.9 (5.4-15) | 0.20 (0.08-0.49)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

Table 20-8 Likelihood Ratio of an Affected Patient's Reported Family History for Cancer

Patient With Personal History of Cancer

Family History of Carcinoma | LR+ (95% CI) | LR- (95% CI)
Ovarian | 44 (15-132) | 0.21 (0.12-0.37)
Colon | 23 (8.1-64) | 0.29 (0.13-0.67)
Endometrial | 20 (4.3-89) | 0.55 (0.35-0.86)
Prostate | 24 (2.3-262) | 0.25 (0.16-0.39)
Breast | 41 (23-75) | 0.07 (0.03-0.13)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.


REFERENCE STANDARD TESTS 

Verification of cancer from the first-degree relative's medical
record, physician, population cancer registry, or autopsy.
CHAPTER


Does This Patient Have a Goiter?

Kerry Siminoski, MD


CLINICAL SCENARIOS

How Large Are These Thyroid Glands?

For each of the following patients, assessment of thyroid size is an
important part of the clinical examination. In case 1, a 32-year-old
woman presents with symptoms and findings consistent with
hyperthyroidism, but she has no exophthalmos and has always been
anxious. In case 2, a 55-year-old man has a diagnosis of Graves
disease, and the choice is made for radioactive iodine ablation
therapy. In case 3, a 64-year-old man has a goiter that causes
discomfort on swallowing, and thyroxine is administered in an attempt
to shrink the thyroid gland.


WHY ASSESS THE THYROID GLAND FOR SIZE? 


A goiter is simply an enlargement of the thyroid gland and may result
from hormonal or immunologic stimulation of gland growth or the
presence of inflammatory, proliferative, infiltrative, or metabolic
disorders (Table 21-1). A common error among clinicians first learning
about the thyroid is to associate thyroid size with function; a
goiter, however, can be present in hyperthyroidism, hypothyroidism, or
a euthyroid state. Determining whether a thyroid is enlarged can aid
in diagnosis, differential diagnosis, and decisions about laboratory
testing; in determining specific therapy and therapeutic dosing; and
subsequently in monitoring of the clinical course. For example, when a
patient presents with symptoms that could be caused by
hyperthyroidism, the detection of a goiter increases the likelihood
that thyrotoxicosis is present. 2 If the patient described in the
first case had an enlarged thyroid, hyperthyroidism would be a likely
diagnosis. 2 On the other hand, if her gland were of normal size,
anxiety might be the explanation for her symptoms. Determination of
thyroid size also is useful once a specific disease is diagnosed. In
patients with Graves disease, for example, thyroid size may be a
factor in determining choice of treatment because patients with
smaller glands are more likely to go into immunologic remission during
antithyroid drug therapy. 3 If radioiodine is the chosen treatment, as
in the second case, the size of the gland is often used in calculating
the dose to be administered. 4 Finally, responses to various therapies
can be monitored clinically by assessing thyroid size, such as the
attempt to shrink a large goiter with thyroid hormone administration
in the third case. 5


THE ANATOMIC BASIS OF THYROID EXAMINATION 

Landmarks and Relation to Other Structures 

The thyroid gland is located in the anterior neck and usually consists
of 2 lobes connected at their lower midregions by a transverse isthmus
(Figure 21-1). The most prominent structure in the anterior neck is
the thyroid cartilage. Inferior to the thyroid cartilage lies the
cricoid cartilage, and inferior to this lies the isthmus of the
thyroid gland, which can be as low as the level of the fourth tracheal
ring. Each thyroid lobe lies against the sides of the trachea,
extending up from the isthmus to the region of the cricoid and thyroid
cartilages and downward toward the clavicles. The posterior portion of
each lobe lies beneath the belly of the ipsilateral
sternocleidomastoid muscle. Because the fascial envelope of the
thyroid gland is continuous with the pretracheal fascia of the cricoid
cartilage and hyoid bone, the thyroid ascends and descends with the
laryngeal structures during swallowing.


Table 21-1 Conditions That May Present With an Enlarged Thyroid Gland a

Endemic/iodine deficiency goiter
Multinodular goiter
Graves disease
Hashimoto thyroiditis
Subacute thyroiditis
Painless/postpartum thyroiditis
Familial goiter
Malignancy
Goitrogens
Iodine excess

a Adapted from Eastham. 1


Figure 21-1 The Location of the Thyroid Gland in Relationship to
Nearby Structures

How Large Is the Normal Thyroid? 

The normal thyroid size for a population is largely determined by the
supply of iodine in the diet, with a tendency to larger glands in
iodine-deficient areas. 6-8 Consequently, studies of clinically normal
thyroid glands have demonstrated sizes that span an extreme range in
euthyroid individuals, differing by geographic location and varying
through time within a given region as iodine supplementation has been
instituted. Until the middle of the 20th century, most authors
considered a typical thyroid gland to be about 20-25 g, and a commonly
accepted upper normal size was 35 g. 8-11 More recent studies in
iodine-supplemented populations have reported mean weights of 10 g or
less and an upper normal size of 20 g. 12,13 Although a value of 35 g
may still apply in iodine-deficient areas, an upper normal weight of
20 g is probably appropriate for most parts of the western world and
will be used for this analysis. With this definition, the prevalence
of goiter is typically 2% to 5% in iodine-replete regions. 13,14

HOW TO EXAMINE THE THYROID 
GLAND TO DETERMINE SIZE 

The normal thyroid is rarely visible because of its relatively small
size, partial concealment by the sternocleidomastoids, and soft
texture, and it may be marginally palpable. 5,9,15 Enlargement is
initially observed as an increase in the size of the lateral lobes to
palpation. 5,8 Further growth results in a gland visible in the
anterior side of the neck that can be seen when inspecting from the
side 16,17 and from the front with the patient's neck extended.
5,7,15,18 With increasing size, the gland becomes even more prominent
on inspection from the side and it becomes visible from the front with
the patient's head in a normal position. Ultimately, a large goiter is
easily palpable, has prominence from the side of greater than 1 cm,
and is visible from the front at a distance. 5,17,18

As a result of observations on these patterns of enlargement, various
systems have been described to size a thyroid gland according to (1)
the estimated weight 19-21 ; (2) the volume relative to the size of
normal glands 5,8 ; (3) the presence or absence of palpable or visible
enlargement 8,18,22 ; (4) the degree of visible prominence when the
neck is viewed laterally 17 ; (5) neck circumference determined by
tape measure 23,24 ; (6) the surface area of the gland projected onto
the skin 22,25 ; and (7) the maximum width of the lower poles,
measured with a ruler or calipers. 26 Many of these rating scales were
developed for epidemiologic studies of goiter in endemic areas and
were intended to classify significant goiters rapidly (with
examination time in some studies averaging only 18 seconds per
subject). 5 As a result, many are of little use for the smaller
thyroid glands observed in regions without significant levels of
endemic goiter. Most studies from which data for accuracy and
precision of goiter determination can be derived do not report
specifics of thyroid examination technique. Consequently, there is no
objective evidence to support the use of one examination method over
another. 23-25,27,28 Many of the variations are minor, so shared
features will be described.

The patient should be comfortably positioned, either standing or
seated, with the neck in a neutral position or slightly extended. The
region of the neck below the thyroid or cricoid cartilage should be
observed from the front, with good cross-lighting to accentuate
shadows and highlight masses. If an abnormality is suspected, the neck
should be moved as appropriate to alter the prominence of the area
under suspicion. A particularly useful maneuver is inspection during
full extension of the patient's neck. This position stretches
superficial tissues over the thyroid gland, which is pressed against
the relatively unyielding trachea, and enhances visibility of the
gland. Inspection of the neck from the side, looking for a prominence
protruding from the normally smooth and straight contour between the
cricoid cartilage and the suprasternal notch, can reveal enlargement.
17 The amount of prominence should be measured with a ruler (Figure
21-2). This method requires a certain degree of guesswork in deducing
where the normal neck contour would lie, but the measurement can
provide information useful for ruling in the presence of a goiter.
There is no particular spot to place the ruler; it merely serves as a
visual guide in estimating the degree of protrusion.

After inspection, the gland is palpated, and this is where the
greatest differences in methods arise. Clinician preference varies
about palpation with fingers or thumbs, an approach from the front or
from behind the patient, and whether each lobe is palpated by the
ipsilateral hand or the opposite hand. In the absence of data to
support a specific method, though, examiners should use the approach
with which they are most comfortable. Regardless of the technique
used, it is often useful to first attempt to locate the thyroid
isthmus by palpating between the cricoid cartilage and suprasternal
notch. An isthmus may not be felt, but if it is, this can help locate
the gland. When palpating the lobes, it is beneficial to relax the
sternocleidomastoids. To better feel the left lobe, for example, the
neck can be slightly flexed and rotated to the left to relax the left
sternocleidomastoid and to make space for the palpating fingers or
thumb between the sternocleidomastoid and trachea. There are certain
additional maneuvers that may be useful, such as measuring neck
circumference or the dimensions of a lobe with calipers, but no
information is available to assess accuracy or precision of these
techniques. Other elements of the thyroid examination that are carried
out concomitantly with size assessment include determining gland
texture, gland mobility, tenderness, and the presence of nodularity.
Auscultation also may be performed for the presence of bruits. These
features have their own implications but are not central to
determining the presence of a goiter and so are beyond the scope of
this discussion. If no thyroid is detected in the neck, it may be
maldescended or intrathoracic. Methods of examining for these variants
will not be discussed here, because, again, no information is
available to analyze the reported techniques.


Dogma holds that the thyroid examination is improved by having the
patient swallow during both inspection and palpation. Indeed, it has
been stated that swallowing increases sensitivity of inspection alone
to that of inspection combined with palpation. 28 No study, however,
has actually analyzed whether a swallowing maneuver is of benefit,
although most examiners believe it is. The movement resulting from
swallowing accomplishes several things. First, it changes the
shadowing of any mass, enhancing visual detection of a bulge in the
neck contour that may be too subtle to be detected otherwise. Second,
movement of the thyroid raises a low-placed gland up from below the
sternal notch or lower sternocleidomastoid, making it accessible when
it may not have been so previously. Third, as in any palpation
technique, movement of the object against the palpating hand increases
definition. Finally, because only the larynx, upper trachea, and
thyroid gland move with swallowing, this maneuver can aid in anatomic
localization. 29 The degree of excursion of the thyroid on swallowing
is proportional to the size of the bolus swallowed, so the patient
should be given a sip of water. 30

When the thyroid is examined to determine the presence of a goiter,
the goal is to estimate gland size. Most endocrinologists express
findings in absolute mass or as relative to an upper limit of a
normal-sized gland, such as "normal" or "2 to 3 times normal size."
Many nonendocrinologists have some difficulty quantifying thyroid
mass, but this ability is crucial in accurately classifying a gland,
as will be discussed in the analysis of accuracy.


FALSE-POSITIVE AND FALSE-NEGATIVE 
GOITER RESULTS 

Finding a goiter when one is not present may simply be an 
error in detection. There are, however, several common 
causes of a false-positive goiter or pseudogoiter finding. 
One is simply an easily palpable gland in a thin individual. 5 



Figure 21-2 Estimating Lateral Thyroid Prominence

When viewed from the side, the normal contour of the thyroid gland is
invisible. Enlargement up and out leads to a prominent and visible
gland.
Table 21-2 Interobserver Precision in Assessment of Thyroid Size or Presence of Goiter

Reference | Agreement, All Categories a | Agreement, Goiter Only b | κ, All Categories a | κ, Goiter Only b
Trotter et al 36,c | 0.67 | 0.83 | 0.48 | 0.50
Kilpatrick et al 8,d | 0.86 | 0.95 | 0.74 | 0.77
Dingle et al 37,e | 0.85 | 0.87 | 0.47 | 0.38
Trowbridge et al 7,d | Not available | 0.96 | Not available | 0.58
Combined (95% CI) | 0.86 f (0.82-0.90) | 0.92 (0.90-0.94) | 0.70 f (0.68-0.72) | 0.77 (0.76-0.79)

Abbreviation: CI, confidence interval.

a Analysis of all size categories used by authors.
b Analysis of presence or absence of goiter only, according to authors' definitions.
c Agreement between 2 observers, in 3 categories of staging, after an unspecified time.
d Agreement between one observer and one or two others, in 4 categories of staging,
after an unspecified time.
e Agreement between one observer and 2 others, in 4 categories of staging, after 2 years.
f Raw data combined from Kilpatrick et al 8 and Dingle et al 37 only because they are
the only 2 with the same rating scales.
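
Raw agreement and κ can diverge sharply when goiter is uncommon in the sample, because most agreement is then expected by chance. The sketch below illustrates this with Cohen's κ for two examiners rating goiter as present or absent. The counts are hypothetical and are not taken from any of the cited studies; the function name `agreement_and_kappa` is likewise an illustrative choice.

```python
def agreement_and_kappa(both_yes, a_only, b_only, both_no):
    """Return (observed agreement, Cohen's kappa) for two raters
    and a present/absent rating, given the 4 cells of a 2x2 table."""
    n = both_yes + a_only + b_only + both_no
    if n == 0:
        raise ValueError("table is empty")
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + a_only) / n   # proportion rated "goiter" by examiner A
    b_yes = (both_yes + b_only) / n   # proportion rated "goiter" by examiner B
    # Agreement expected by chance if the two examiners rated independently.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical sample of 100 patients in whom goiter is rarely diagnosed.
    observed, kappa = agreement_and_kappa(both_yes=5, a_only=6, b_only=7, both_no=82)
    print(f"agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

With these made-up counts the examiners agree on 87% of patients, yet κ is only about 0.36, a pattern similar in kind to the gap between agreement and κ seen in some rows of the table.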


Table 21-3 Comparison of Interobserver Precision for Thyroid Inspection and Palpation

Reference | Agreement, Inspection | Agreement, Palpation | κ, Inspection | κ, Palpation
Kilpatrick et al 8 | 0.95 | 0.89 | 0.77 | 0.76
Dingle et al 37 | 0.87 | 0.89 | 0.38 | 0.60
Combined (95% CI) | 0.93 (0.90-0.96) | 0.89 (0.85-0.92) | 0.65 (0.62-0.69) | 0.74 (0.67-0.82)

Abbreviation: CI, confidence interval.


Table 21-4 Intraobserver Precision in Assessment of Thyroid Size or Presence of Goiter

Reference | Agreement, All Categories a | Agreement, Goiter Only b | κ, All Categories a | κ, Goiter Only b
Hennessy 6,c | 0.83 | 0.90 | 0.70 | 0.79
MacLennan et al 22,d | 0.79 | 0.82 | 0.41 | 0.47
Combined (95% CI) | 0.81 (0.77-0.84) | 0.85 (0.82-0.88) | 0.59 (0.52-0.65) | 0.65 (0.63-0.67)

Abbreviation: CI, confidence interval.

a Analysis of all size categories used by authors.
b Presence or absence of goiter only, according to authors' definitions.
c Four categories of staging, reexamined within 44 days.
d Three categories of staging, examined 12 days later.

Because the entire thyroid is so accessible, the tendency is to
interpret this accessibility as being due to an enlarged gland rather
than the true reason, a decrease in interfering tissues that normally
block access to the gland. A second cause is a variant of the normal
placement of the thyroid gland in the neck. In some individuals, the
gland is higher than usual, and this prominence is again attributed to
enlargement. 31 A third anatomic variant has been termed Modigliani
syndrome. 32 In Modigliani syndrome, the thyroid actually lies in a
normal position below the cricoid cartilage, but such individuals
possess long, curving necks that enhance the prominence and
palpability of the gland. A fourth condition producing pseudogoiter is
a fat pad in the anterior and lateral portion of the neck. 24 Although
this condition may be more common in obese individuals, it can also be
found in those of normal weight, particularly young women. With
experience, examiners can learn to differentiate this from true
thyroid tissue by the differing textures and shapes and the lack of
movement of a fat pad with swallowing. Another cause involves the
thyroid being pushed forward by lesions behind it, making it more
easily palpable. 5,33 Finally, any enlargement in the vicinity of the
thyroid gland may be mistaken for an enlarged thyroid gland,
particularly if it is adherent to the thyroid or larynx and so moves
with swallowing. 29

There are 3 principal causes of false-negative goiter 
detection in addition to true misclassification. The first and 
probably most common cause, of course, is an inadequate 
physical examination. In some circumstances, an imperfect 
examination is unavoidable, as when a patient is intubated. 
In most cases, however, with a little effort, a good examination
can be performed on virtually all patients. Second,
some individuals, particularly the obese, the elderly, or 
those with chronic pulmonary disease, have short and thick 
necks, obscuring the thyroid. 5,24,34 Some patients also have
an atypical thyroid placement, such as a retrosternal location,
or lobes that are lateral and obscured by the sternocleidomastoids,
making palpation difficult. 35

PRECISION OF ESTIMATING THYROID SIZE 

Interobserver Variability 

Data on interobserver precision in estimating thyroid size
are available both for rating scales that attempted to place
glands in one of 3 or 4 categories according to palpability
and visibility and for simple estimation of the presence or
absence of a goiter (Table 21-2). Agreements were good to
very good in both cases. When glands were placed in categories,
κ ranged from 0.47 to 0.74, with a value from combined
data of 0.70 (95% confidence interval [CI], 0.68-0.72). 7,8,36,37
(The κ statistic and other statistical measures
are defined in the introductory article to this series. 38 ) For
determination of goiter, κ ranged in these 4 studies from
0.38 to 0.77, with a value for combined data of 0.77 (95%
CI, 0.76-0.79). Similar results were reported in another
study, 39 in which observers determined whether individual
lobes were enlarged, with κ from 0.32 to 0.62, and in yet
another report 40 that determined the presence of a goiter,
with κ from 0.10 to 0.54. Because of the nature of the rating
scales used in 2 of these studies, 8,37 we can specifically
compare interobserver variability for the techniques of
inspection (κ = 0.65; 95% CI, 0.62-0.69) and palpation (κ =
0.74; 95% CI, 0.67-0.82). These techniques did not differ
significantly in the level of agreement, and both were very
good (Table 21-3).

As might be expected, most disagreements between 
observers involved smaller glands and those near the cutoff 
for goiter determination, and most disagreed by only 1 stage 
in classifications. 7,8,36,37 Agreement may be better between 
examiners with greater experience than between those with 
differing levels of training. 40 
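As a rough illustration of the κ statistic quoted in this section, the sketch below computes Cohen's κ from a table of paired ratings by 2 examiners. The function name and the example counts are hypothetical and are not taken from any of the cited studies; the pooled values in the text come from the original study data, not from this calculation.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] is the number of patients placed in category i by
    examiner A and in category j by examiner B (hypothetical layout).
    """
    n_categories = len(table)
    total = sum(sum(row) for row in table)

    # Observed agreement: proportion of patients on the diagonal.
    observed = sum(table[i][i] for i in range(n_categories)) / total

    # Agreement expected by chance, from each examiner's marginal rates.
    row_rates = [sum(table[i]) / total for i in range(n_categories)]
    col_rates = [
        sum(table[i][j] for i in range(n_categories)) / total
        for j in range(n_categories)
    ]
    expected = sum(r * c for r, c in zip(row_rates, col_rates))

    return (observed - expected) / (1 - expected)


# Hypothetical counts: rows are examiner A (goiter, no goiter),
# columns are examiner B (goiter, no goiter).
example = [[40, 10],
           [5, 45]]
kappa = cohen_kappa(example)  # raw agreement is 0.85; kappa is lower
```

Because κ discounts agreement expected by chance, it is always lower than the raw agreement figure when the examiners are not in perfect agreement, which is why the tables report both.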

Intraobserver Variability 

In 2 studies, 6,22 examiners placed thyroid size in categories of
enlargement and repeated the examination on a separate occasion
(Table 21-4). These data produced a κ from combined
numbers of 0.59 (95% CI, 0.52-0.65) for placement in all categories
of the rating scales used by the examiners. For simply
determining the presence or absence of goiter, κ ranged from
0.47 to 0.79, with a κ from combined data of 0.65 (95% CI,
0.63-0.67), which is very good. Similar results were reported
in a study of patients with various thyroid diseases, in which
κ ranged from 0.54 to 0.74. 39 Intraobserver agreement was
slightly better for the inspection component of the examination
(κ = 0.73; 95% CI, 0.71-0.76) than for palpation (κ = 0.65; 95%
CI, 0.63-0.67) (Table 21-5).

ACCURACY OF ESTIMATING THYROID SIZE 

Three criterion standards have been used in assessing the accuracy
of thyroid size determination: weight measured after surgical
or postmortem removal, ultrasonographic assessment, and
nuclear scintigraphy. Ultrasonographic assessments of thyroid 
weight correlate well with true gland weight as determined after 
excision (r = 0.88-1.0), although there is lack of agreement as to 
the best formula to use for estimating size. 18,21,41 Nuclear scan 
determination is a little less reliable but acceptable (r = 0.77- 
0.98). 9,42,43 Again, different formulas have been used to translate 
the scintigraphic profile to thyroid volume. 9,42,43 

Combining data from 9 studies of detection of goiter by
physical examination, 12,17,18,21,44-48 the sensitivity from combined
data was 0.70 (95% CI, 0.68-0.73) with a specificity of
0.82 (95% CI, 0.79-0.85) (Table 21-6). If a goiter was clinically
detected, the positive likelihood ratio (LR+) of one
being present was 3.8 (95% CI, 3.3-4.5). Conversely, if a goiter
was not thought to be clinically present, the negative likelihood
ratio was 0.37 (95% CI, 0.33-0.40). These likelihoods
are comparable with or better than those for many other
physical signs 49,50 and were not affected by the presence of single
or multiple nodules. 48 Experienced examiners were somewhat
more accurate in their assessments than more junior
colleagues. 48 
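The relation between sensitivity, specificity, and the likelihood ratios in Table 21-6 can be sketched as follows. The function name is illustrative, and the calculation applies only to a single study's sensitivity and specificity; the combined likelihood ratios in the text were pooled across studies and need not equal the ratio computed from the combined sensitivity and specificity.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous finding.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    A specificity of exactly 1 gives an infinite LR+, which the
    table reports as "Infinity".
    """
    if specificity == 1:
        lr_positive = math.inf
    else:
        lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Silink and Reisenauer row of Table 21-6: sensitivity 0.64, specificity 0.89.
lr_pos, lr_neg = likelihood_ratios(0.64, 0.89)  # approximately 5.8 and 0.40
```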

Some authors have defined specific stages of thyroid 
enlargement according to the usual sequence of changes that 
occur as the thyroid gland increases in size. Because some of 
these staging classifications incorporate observations not 
normally used in simply estimating thyroid mass, they can 
significantly enhance the predictive abilities of the clinician 


Table 21-5 Comparison of Intraobserver Precision for Inspection and Palpation

                        Agreement                               κ
Reference               Inspection          Palpation           Inspection          Palpation
Hennessy 6              0.93                0.90                0.82                0.79
MacLennan et al 22      0.95                0.82                0.18                0.47
Combined (95% CI)       0.94 (0.92-0.96)    0.85 (0.82-0.88)    0.73 (0.71-0.76)    0.65 (0.63-0.67)

Abbreviation: CI, confidence interval.


Table 21-6 Accuracy of the Clinical Assessment for the Presence of a Goiterᵃ

Reference                   Sensitivity         Specificity         LR+              LR-
Silink and Reisenauer 17ᵇ   0.64                0.89                5.8              0.40
Tannahill et al 21ᶜ         0.93                0.75                3.7              0.09
Hegedus et al 44ᵈ           0.43                1.00                Infinity         0.57
Hegedus et al 45ᵈ           0.60                1.00                Infinity         0.40
Hegedus et al 46ᵈ           0.77                0.80                3.9              0.29
Berghout et al 18ᵉ          1.00                0.62                2.6              0.00
Perrild et al 47ᶠ           0.64                1.00                Infinity         0.36
Hintze et al 12ᵍ            0.66                0.74                2.5              0.46
Jarlov et al 48ᶜ            0.80                0.80                4.0              0.25
Combined (95% CI)           0.70 (0.68-0.73)    0.82 (0.79-0.85)    3.8 (3.3-4.5)    0.37 (0.33-0.40)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

ᵃGoiter defined as thyroid gland size greater than 20 g, except in the study by Silink and
Reisenauer, 17 in which goiter was defined as gland size greater than 22 g, and in the
study by Hintze et al, 12 in which male gland size was greater than 25 g and female gland
size was greater than 18 g.

ᵇGraded degree of lateral prominence, goiter being any prominence, with criterion standard
of autopsy weight.

ᶜDirectly estimated weight, with criterion standard of ultrasonography.

ᵈGoiter defined as visible or palpable gland, with criterion standard of ultrasonography.

ᵉGraded 5 stages of thyroid size according to palpability and visibility, with criterion
standard of ultrasonography.

ᶠTwo observers had to agree on the presence of goiter, which was undefined, using
ultrasonography as the criterion standard.

ᵍGraded 5 stages of thyroid size according to palpability, with criterion standard of
ultrasonography.


(Table 21-7). In the combined data from 4 studies, 19-21,48 when
a clinician thought that a thyroid gland was of normal size,
the LR+ of goiter being present was 0.15 (95% CI, 0.10-0.21).
If classified as 1 to 2 times normal size, the LR+ was 1.9 (95%
CI, 1.1-3.0), and for greater than 2 times normal, the LR+
was 25 (95% CI, 3.6-175).

Certain staging methods for thyroid enlargement can help
clarify the true status of some of the patients with glands
thought to be 1 to 2 times normal size after routine inspection
and palpation. 14,17 The amount of prominence of the thyroid
on lateral inspection, for example, resulted in a high
likelihood of goiter if it was greater than 2 mm (Table 21-8).
Of further utility was finding that a gland was not visible
with the neck extended, a result that effectively ruled out a
goiter.


Table 21-7 Accuracy in Assessing Grades of Thyroid Gland Weight

Reference                   LR+

Normal Thyroid Size, 0-20 g
Williams et al 19ᵃ          0.00
Smith and Wilson 20ᵃ        0.00
Tannahill et al 21ᵇ         0.10
Jarlov et al 48ᵇ            0.26
Combined (95% CI)           0.15 (0.10-0.21)

Thyroid Size 1-2 Times Normal, 20-40 g
Williams et al 19ᵃ          Infinity
Smith and Wilson 20ᵃ        0.32
Tannahill et al 21ᵇ         2.2
Jarlov et al 48ᵇ            2.6
Combined (95% CI)           1.9 (1.1-3.0)

Thyroid Size >2 Times Normal, >40 g
Williams et al 19ᵃ          Infinity
Smith and Wilson 20ᵃ        Infinity
Tannahill et al 21ᵇ         Infinity
Jarlov et al 48ᵇ            13
Combined (95% CI)           25 (3.6-175)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio.

ᵃDirectly estimated thyroid weight, with postsurgical weight as the criterion standard.
ᵇDirectly estimated thyroid weight, with ultrasonography as the criterion standard.


Table 21-8 Accuracy in Assessing Thyroid Size by Categories

Stage, Size                 LR+ (95% CI)

Method of Silink and Reisenauer 17ᵃ
0, not visible              0.41 (0.34-0.49)
1, 0-2 mm                   3.4 (1.8-6.3)
2, 2-10 mm                  Infinity
3, >10 mm                   Infinity

Method of Berghout et al 14ᵇ
0A                          0.00
0B                          0.00
1                           1.00 (0.42-2.4)
2                           3.9 (1.8-8.2)
3                           Infinity

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio.

ᵃGraded degree of lateral prominence, with goiter being any prominence, using
autopsy weight as a criterion standard.

ᵇGraded stages 0-3 according to palpability and visibility, with goiter being 1-3,
using ultrasonography as a criterion standard: 0A indicates lobes smaller than the
size of the thumb terminal phalanx, thyroid not visible with neck extended; 0B, lobes
bigger than the size of the thumb terminal phalanx, thyroid not visible with neck
extended; 1, easily palpable, visible with neck extended; 2, visible with neck in normal
position; and 3, easily visible.



Figure 21-3 Error in Estimating Thyroid Mass (x-axis: true thyroid weight, g)

Error in estimating thyroid mass can be described by the following formula:
percentage of error = (-0.656 × mass) + 34.8, where thyroid mass is
expressed in grams (r = 0.41; P < .001). The 95% confidence interval is
indicated by the broken lines.


BIAS IN ESTIMATING THYROID SIZE 

When the results from 4 studies 19-21,48 estimating thyroid
gland weights were combined, a regression line was produced
describing the bias in gland size determination
(Figure 21-3). This clearly shows that sizes of smaller
glands are routinely overestimated, whereas those of larger
glands are underestimated. The size at which this crossover
occurs corresponds to about 2 times normal size. The practical
application of this finding is that glands in the 1- to 2-times-normal-size
category fall in the range in which size is
typically overestimated. 
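The regression line in the Figure 21-3 caption can be evaluated directly. The sketch below is only an illustration of that published formula; the function names are hypothetical, and the output is the average percentage error predicted by the fitted line, not a correction to apply to an individual patient.

```python
# Coefficients from the Figure 21-3 caption:
# percentage of error = (-0.656 x mass) + 34.8, with mass in grams.
SLOPE_PERCENT_PER_GRAM = -0.656
INTERCEPT_PERCENT = 34.8


def expected_percent_error(true_mass_g):
    """Average percentage error of the clinical estimate at a given true mass.

    Positive values mean the gland tends to be overestimated,
    negative values mean it tends to be underestimated.
    """
    return SLOPE_PERCENT_PER_GRAM * true_mass_g + INTERCEPT_PERCENT


def zero_error_mass_g():
    """True mass at which the fitted line crosses zero error."""
    return -INTERCEPT_PERCENT / SLOPE_PERCENT_PER_GRAM


small_gland = expected_percent_error(20)  # positive: overestimation
large_gland = expected_percent_error(80)  # negative: underestimation
```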

THE BOTTOM LINE 

To determine whether a goiter is present, follow these steps: 

1. Examine the thyroid gland by inspection and palpation. 

2. Categorize thyroid size as normal or goiter. Subcategorize 
goiter as small goiter (1-2 times normal) or large goiter 
(greater than 2 times normal). 

3. If you placed the thyroid in the small-goiter category, consider
whether you overestimated the size; determine
whether there is any prominence in the profile of the neck
in the region of the thyroid when viewed laterally (classify
the prominence as ≤2 or >2 mm), and determine whether
the gland is not visible from the front with the neck
extended.

4. Place your patient in one of the following categories: “goiter
ruled out,” normal thyroid size or thyroid considered to
be not visible with neck extended; “goiter ruled in,” large 
goiter present or lateral prominence greater than 2 mm; or 
“inconclusive,” all other findings. 
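The 4 steps above can be sketched as a small decision function. This is one possible reading of the list, not a validated rule: the function and argument names are hypothetical, and the order of the checks within the small-goiter category (lateral prominence first, then visibility with the neck extended) is an assumption, because the text does not say which finding takes precedence.

```python
from typing import Optional


def classify_goiter(size_category: str,
                    lateral_prominence_mm: Optional[float] = None,
                    visible_with_neck_extended: Optional[bool] = None) -> str:
    """Place a patient in one of the 3 categories from step 4.

    size_category: "normal", "small" (1-2 times normal), or
    "large" (greater than 2 times normal), as judged in step 2.
    The other 2 findings are used only for the small-goiter category.
    """
    if size_category == "normal":
        return "goiter ruled out"
    if size_category == "large":
        return "goiter ruled in"
    if size_category != "small":
        raise ValueError("size_category must be 'normal', 'small', or 'large'")

    # Step 3: reassess a gland placed in the small-goiter category.
    if lateral_prominence_mm is not None and lateral_prominence_mm > 2:
        return "goiter ruled in"
    if visible_with_neck_extended is False:
        return "goiter ruled out"
    return "inconclusive"


result = classify_goiter("small", lateral_prominence_mm=1,
                         visible_with_neck_extended=True)  # "inconclusive"
```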
Prepared by David L. Simel, MD, MHS
Reviewed by Adi Cohen, MD 


CLINICAL SCENARIO 


A 34-year-old woman had a child about 14 months ago. 
She had been breast-feeding her newborn but stopped 
about 2 months before her routine visit with you. She 
complains that her weight has not gone back to baseline 
and that her skirts are tight at the waist and her blouses are 
tight at the neck. Does she simply need to lose weight, or 
could she have a goiter? 

UPDATED SUMMARY ON GOITERS 

Original Review 

Siminoski K. Does this patient have a goiter? JAMA. 1995; 
273(10):813-817. 

UPDATED LITERATURE SEARCH 

Our updated literature search used the parent search strategy for 
The Rational Clinical Examination series, combined with the 
subject headings “exp Goiter,” “limited to diagnosis,” “radionuclide
imaging,” “epidemiology,” and “ultrasound studies,” published
in English from 1994 to 2004. We also crossed the clinical
subject headings with “meta-analysis,” “ROC curve,” and the 
textword “systematic review” in both MEDLINE and the 
Cochrane databases. The results yielded 135 titles, for which we 
reviewed the titles and abstracts; 10 were selected for additional 
review. Two additional articles were selected for review from the 
references. We included articles that allowed us to calculate the 
sensitivity and specificity, or the observer variability, of the clinical
examination for a goiter.

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

The actual techniques for palpating the thyroid are 
described well in the original publication. However, 3 
methods for assessing thyroid size from palpation were 
presented: estimates of thyroid volume (in grams), lateral 
prominence of the thyroid (in millimeters), and a 5-level 
ordinal assessment based on palpability and visibility. The
World Health Organization (WHO) proposed a simplified
classification for the presence of a goiter: an individual has
a goiter when each lateral lobe has a volume greater than
the individual’s terminal phalanx of the thumb. 1 A grade 1
goiter will be palpably enlarged but not visible when the
neck is in a normal position. A grade 2 goiter will be palpably
enlarged and visible with the neck in the normal position.
Most of the work on establishing these criteria comes
from epidemiologic studies of endemic iodine deficiency
that used children as the study subjects. The epidemiologic
studies use examiners with considerable thyroid examination
experience.

Changing the threshold for the clinical
screening test (palpation) changes the performance of the
test. The interobserver variability when performed by
experienced examiners is acceptable with both the 1960 
and 1994 criteria. It is important to compare the size of the 
thyroid to the thumb because the case definition is not 
“any” palpable thyroid but one that is larger than the distal 
thumb. One study in a high-prevalence area found that 
defining a palpable thyroid as enlarged, without comparing 
the size to the thumb, increased the clinical goiter rate by 
20%. 2 


CHANGES IN THE REFERENCE STANDARD 

The reference standard for thyroid enlargement remains
ultrasonography. A goiter is defined as a thyroid gland of
increased volume. However, the appropriate threshold for
identifying the patient as having enlargement vs not having
enlargement is evolving. WHO recognizes endemic
iodine deficiency as a global health problem. The prevalence
rate of goiters in school-age children defines regions
as having endemic iodine deficiency vs normal iodine status.
The definitions of normality for children may be different
from those of adults because thyroid volume
depends on body surface area (it also varies by sex). Areas
of endemic iodine deficiency may be severely affected by
malnourishment, and this in turn affects the size of thyroid
glands.

The older 1960 WHO standard for thyromegaly required
that a lobe of the thyroid have greater volume than the terminal
phalanx of the child’s thumb. These criteria, established
before ultrasonography, defined endemic iodine deficiency as
a population with a greater than 10% prevalence of goiter.
Palpation was the only method for assessing thyroid volume,
and the use of the child’s thumb as a comparative standard
would seemingly account for both the child’s sex and body
surface area. In 1994, a newer threshold was proposed for
epidemiologic research that used ultrasonography and normative
thyroid volume adjusted for body surface area. The
newer threshold decreased the prevalence level to more than
5% to define iodine deficiency areas but also simplified the
clinical criteria for a goiter. A key question is whether a universal
normative standard should be used for thyroid volume
(eg, a universal threshold volume above which defines a goiter,
or above a percentile for the universe of patients) or
whether local reference standards should be established (eg,
thresholds developed within a defined geographic region). 3

RESULTS OF LITERATURE REVIEW 

When palpating the thyroid, compare the results of each lobe 
to the subject’s distal thumb. A thyroid with both lobes larger 
than the patient’s distal thumb is considered palpably 
enlarged (see Table 21-9). 


Table 21-9 Likelihood Ratios for a Palpable Thyroid Gland Indicating a Goiter

Finding                                          LR+ (95% CI)     LR- (95% CI)
Palpable thyroid, children (1994 criteria) 4     3.0 (2.5-3.5)    0.30 (0.24-0.37)
Palpable thyroid, pregnancy (1994 criteria) 5    4.7 (3.6-6.0)    0.08 (0.02-0.27)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.


EVIDENCE FROM GUIDELINES 

The US Preventive Services Task Force evaluated
the role of thyroid-stimulating hormone (TSH) screening
in healthy adults and observed that men have a lower
prevalence of unrecognized and unsuspected thyroid disease
compared with women. 6 High-risk patients for thyroid
disorder include the elderly, postpartum women,
those with high levels of radiation exposure (>20 mGy),
and patients with Down syndrome. However, the task
force concluded that the data are inconclusive for recommending
TSH screening. The task force does not address
clinical screening with palpation. Health Canada guidelines
came to similar conclusions as the US Preventive
Services Task Force. 7


CLINICAL SCENARIO—RESOLUTION 


Returning to prenatal weight is a postpartum problem for 
many women. This patient has an unusual complaint of 
clothing feeling tight around the neck, so you feel obligated
to palpate for a thyroid. Despite the enlargement,
many patients do not recognize that they have a goiter. 
Goiters are more common in women, especially during 
pregnancy and in lactating mothers. When you palpate 
her thyroid, you need to use proper technique and make 
sure that any palpable thyroid tissue moves upward when 
she swallows. If you feel thyroid tissue, decide whether the
volume of the palpable tissue in both lobes is greater than
the volume of her distal thumb. Although this approach of
assessing volume has been validated in children and not in
adults, inexperienced examiners may have more difficulty
deciding whether the thyroid volume is normal than
endocrinologists, who assess the size relative to a
normal gland (eg, 1.5 times normal or 2 times normal). If
you are uncertain whether the gland is normal, ultrasonography
would confirm the presence or absence of a
goiter. You should also assess more fully for signs and
symptoms of thyroid dysfunction. Although the most 
common cause for inability to lose weight postpartum 
may be lack of exercise, a sensitive TSH assay would be 
required to make sure she does not have hypothyroidism.
GOITER—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

The prior probability of a goiter is affected by many variables,
including the patient’s body surface area, sex, and regional
variations associated with the endemic iodine deficiency. Two
recent European studies of thyroid volume among community
samples of healthy adults give us insight into the prevalence
of goiter in the non-iodine-deficient area: 4% of
patients in Spain (95% confidence interval [CI], 3%-6%) 8
and 10% of patients in France (95% CI, 9%-11%) 9 had palpable
goiters. Unfortunately, the thyroid volume was not confirmed
for patients with palpable goiters. Nonetheless, we can
make some inferences that give us good starting points. The
WHO defines an iodine-deficient area by the prevalence of
goiter in school-aged children. According to normative population
values, children who live in a non-iodine-deficient area
should have a goiter prevalence of less than 5%. 1 Adults might
have palpable thyroid glands for reasons other than iodine 
deficiency, so prevalence values slightly higher make sense. A 
starting point of 5% to 10% for healthy adults makes sense 
for the prior probability of a palpable thyroid. 
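A prior probability is combined with a likelihood ratio by converting to odds, multiplying, and converting back. The sketch below shows that arithmetic with a 10% prior and the likelihood ratios reported for thyroid palpation in pregnant women (LR+ 4.7, LR- 0.08). The function name is hypothetical, and applying these likelihood ratios to nonpregnant adults is an assumption made only for illustration, because the update reports no palpation data for that group.

```python
def post_test_probability(prior_probability, likelihood_ratio):
    """Convert a prior probability and a likelihood ratio to a post-test probability."""
    prior_odds = prior_probability / (1 - prior_probability)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Prior of 10% (upper end of the suggested 5% to 10% starting range).
after_positive = post_test_probability(0.10, 4.7)   # about 0.34
after_negative = post_test_probability(0.10, 0.08)  # about 0.009
```

With these inputs a palpable gland raises the probability to roughly one-third, which is why confirmation by ultrasonography is still needed, whereas a nonpalpable gland lowers it to under 1%.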

POPULATION FOR WHOM A GOITER DISEASE
SHOULD BE CONSIDERED

• Symptoms of hyperthyroidism or hypothyroidism

• Children, especially those in endemic iodine deficiency
locales

• Pregnant and lactating women

• Elderly patients

• Patients with excessive radiation exposure

• Patients with Down syndrome


DETECTING THE LIKELIHOOD OF A GOITER

Because examining children is different from examining pregnant
women for thyroid disease, we cannot combine the data (see
Table 21-10). The techniques for examination, however, are similar.
We have no data for the results of thyroid palpation in nonpregnant
adults because epidemiologic studies of normal adults’
thyroid volume exclude those with palpable enlargement.


Table 21-10 Likelihood Ratios for a Palpable Thyroid Gland Indicating a Goiter

Palpable Thyroid With Both Lobes
> the Volume of the Subject's Distal
Thumb (1994 criteria) vs Not Palpable     LR+ (95% CI)     LR- (95% CI)
Children                                  3.0 (2.5-3.5)    0.30 (0.24-0.37)
Pregnancy                                 4.7 (3.6-6.0)    0.08 (0.02-0.27)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

Palpating thyroid tissue in both lobes of a volume greater than
the volume of the patient’s distal thumb phalanx increases the
likelihood of a goiter, but there will be false-positive results.

REFERENCE STANDARD TESTS

Ultrasonography.

In epidemiologic research, urinary iodine studies are evaluated
along with thyroid palpation.


EVIDENCE TO SUPPORT THE UPDATE


Goiter 



TITLE Endemic Goiter in Pregnant Women: Utility of 
the Simplified Classification of Thyroid Size by Palpation 
and Urinary Iodine as Screening Tests. 

AUTHORS Castaneda R, Lechuga D, Ramos RI, Magos 
C, Orozco M, Martinez H. 

CITATION BJOG. 2002;109(12):1366-1372. 

QUESTION Do the simplified World Health Organization
(WHO) criteria for goiter work well for pregnant
women?

DESIGN Prospective, cross-sectional survey of patients 
who underwent independent clinical examinations and 
ultrasonography. 

SETTING Three communities in Mexico. One region 
had endemic iodine deficiency, one had a low prevalence 
of goiter, and one was an urban area not expected to have 
a high prevalence of iodine deficiency. 

PATIENTS Pregnant women who showed up for delivery
in each of the 3 referral hospitals for the region.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Thyroid size by the WHO 2001 criteria: grade 0, normal;
grade 1, both lobes larger than the distal phalanx of the
thumb, but the gland is not visible; grade 2, both lobes palpably
enlarged but also visible. The examination techniques are
well described and the examiners had their reliability confirmed.

Thyroid size was confirmed by ultrasonography. Patients 
also provided urine samples for urinary iodine. 

MAIN OUTCOME MEASURE 

Thyroid size. Values below the 90th percentile for the regions 
were considered not enlarged. 


Table 21-11 Likelihood Ratio of a Palpable Thyroid for Thyromegaly

Test                Sensitivity    Specificity    LR+ (95% CI)     LR- (95% CI)
Palpable thyroid    0.94           0.80           4.7 (3.6-6.0)    0.08 (0.02-0.27)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

MAIN RESULTS 

The 2 endocrinologist examiners had a κ of 0.70 for their
agreement on the scoring scheme. The criteria have good discriminative
properties for identifying patients with goiter
(see Table 21- ).

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Study population included sample of low-risk 
and higher-risk patients. The examiners had their reliability 
assessed. The examination techniques are well described. 

LIMITATIONS Examination done among pregnant patients 
only. The reference standard was a threshold that some might 
consider too low; a value greater than the 90th percentile was 
considered as goitrogenous. 

Palpating pregnant women’s thyroid glands may be easier 
than palpating the thyroid gland of nongravid subjects. However,
it is possible that the simpler grading scheme and training
of the examiners led to excellent reliability. The low negative 
likelihood ratio is impressive, suggesting that the finding of a 
nonpalpable gland during pregnancy rules out thyromegaly. 
Of course, patients can have substernal goiters, so we know 
that there will be some false-negative results (but not many). 

Because the lower threshold for defining a goiter was used 
(the 1960 WHO criteria of 10th percentile rather than the 
currently recommended 5th percentile), we would expect the 
specificity and the positive likelihood ratio to be worse. However,
compared with the results for using a 5-level scheme,
the results are promising. 

Reviewed by David L. Simel, MD, MHS

TITLE Classification of Thyroid Size by Palpation and
Ultrasonography in Field Surveys. 

AUTHORS Peterson S, Sanga A, Eklof H, et al.

CITATION Lancet. 2000;355(9198):106-110.

QUESTION What are the effects on observer variability 
of the World Health Organization 1994 palpation system 
vs the 1960 system? 

DESIGN Cross-sectional sample, convenience sample. 
Independence of examiners (3) and radiologist not specified. 

SETTING Area of high prevalence of goiter from 
endemic iodine deficiency in Tanzania. 

PATIENTS Schoolchildren. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Three examiners examined each child. One of the examiners 
was described as “experienced,” 1 was an experienced paramedic,
and 1 was an inexperienced “expatriate physician.”
Each examiner evaluated each child in the morning and then 
again in the afternoon. Although they were not given their 
morning results, the examiners were not blinded to the child. 
The ultrasonographic testing for each child was also 
repeated. 

MAIN OUTCOME MEASURES 

Interobserver and intraobserver variability. 

MAIN RESULTS 

Seventy-five percent of the children had goiter by the reference
standard ultrasonogram.

The inexperienced physician had poor intraobserver agreement
with both the 1960 and 1994 criteria (κ of 0.36 and
0.44, respectively). The intraobserver agreement for the experienced
examiners was similar for both criteria (κ of 0.57-0.58
for the 1960 criteria and 0.53-0.60 for the 1994 criteria).
The performance of the inexperienced observer improved
over time (κ of 0.26 during the first 3 days of the study compared
with 0.56 for the last 3 days, using the 1960 criteria).

The ultrasonographer had an intraobserver variability that
resulted in a reclassification of 14% of patients from morning
to afternoon examinations (κ of 0.63).

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Three examiners of various levels of experience.
Intraobserver variability was assessed.


LIMITATIONS The ultrasonographer had poor precision, 
making the quality of the reference standard doubtful. 1 There 
was a lack of independence in the examination by each clinician.
The study was done in an exceedingly-high-prevalence
area. 

We include a review of this article for several reasons. First,
it seems clear that intraobserver agreement is better for
experienced examiners. Second, this study demonstrated that
the inexperienced observer’s precision improved during the
course of the study, which allows us to infer that practice is
helpful. Third, it is important to apply the clinical criteria as
they are currently specified. “Any” palpable enlargement does
not qualify the patient as having a goiter, because each lobe
must be of greater volume than the distal phalanx of the
thumb. Finally, the reliability of the ultrasonographic reference
standard was probably too low. For this reason, we do
not show the sensitivity and specificity of the individual
examiners.

Reviewed by David L. Simel, MD, MHS 

REFERENCE FOR THE EVIDENCE 

1. Zimmermann M. Assessing goiter prevalence. Lancet. 2000;355(9219):1995-1996; author reply 1996-1997.


TITLE Thyroid Ultrasound Compared With World
Health Organization 1960 and 1994 Palpation Criteria for
Determination of Goiter Prevalence in Regions of Mild
and Severe Iodine Deficiency.

AUTHORS Zimmermann M, Saad A, Hess S, Torresani
T, Chaouki N.

CITATION Eur J Endocrinol. 2000;143(6):727-731.

QUESTION How do the 1960 World Health Organization
(WHO) criteria for goiter compare with the simplified
1994 criteria? 1

DESIGN Prospective, independent, cross-sectional
sample.

SETTING Two mountainous regions of Morocco. One
was an area of WHO-defined mild endemic iodine deficiency
disease; the other had goiter prevalence compatible
with severe iodine deficiency.

PATIENTS Schoolchildren, 200 from each village (n =
400 total).

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD

Two examiners evaluated each child, blinded to each other’s
result. An ultrasonographer evaluated each subject, blinded
to the clinicians’ examinations. The examiners and ultrasonographer
were all experienced in goiter epidemiologic
studies. The clinicians recorded their findings according to
the WHO 1960 criteria for goiter and the 1994 criteria
(Table 21-12). The WHO upper limit of the thyroid volume,
adjusted by the subject’s sex and body surface, was used as
the reference standard.


Table 21-12 WHO 1994 Criteria for Goiter

Grade 0   No palpable or visible goiter.

Grade 1   Palpable but not visible neck mass consistent with the thyroid
          when the neck is in the normal position. The gland moves up
          when the patient swallows.

Grade 2   A swelling in the neck that is visible when the neck is in the
          normal position and is consistent with thyroid when palpated.

MAIN OUTCOME MEASURES 

Sensitivity, specificity, and κ values.

MAIN RESULTS 

In the community with mild iodine deficiency, κ was 0.47
between examiners for the 1960 criteria and 0.53 for
the 1994 criteria. In the severe iodine deficiency site, κ was
0.67 between examiners for both the 1960 and 1994 criteria.

In the high-prevalence village, the accuracy of the clinical 
examination was similar for the 1960 and 1994 criteria (see 
). In the low-prevalence village, the clinicians 
estimated a prevalence of 20% to 21% with the 1960 criteria 
and 25% to 26% with the 1994 criteria; however, the actual 
prevalence was 12%. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Large sample size of a community-based population
for whom it was reasonable to screen for thyroid disease.
The study subjects were not enrolled because of a
suspicion for disease.

LIMITATIONS The sampling frame is not specified and the 
study enrolled no adults. All the clinicians were experienced 
examiners, which limits generalizability. 


Table 21-13 Likelihood Ratio for Thyroid Palpation for 2 Different
Examiners, Based on the 1994 vs 1960 WHO Criteria

Test        Examiner    LR+ (95% CI)     LR- (95% CI)
WHO 1994    A           2.9 (2.3-3.6)    0.32 (0.24-0.43)
            B           3.1 (2.5-3.9)    0.27 (0.19-0.38)
            Combined    3.0 (2.5-3.5)    0.30 (0.24-0.37)
WHO 1960    A           3.3 (2.6-4.2)    0.34 (0.26-0.45)
            B           3.3 (2.6-4.2)    0.31 (0.23-0.42)
            Combined    3.3 (2.8-3.9)    0.33 (0.27-0.40)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.


The 1994 revised criteria for goiter simplified the scale from 5
to 3 levels. Overall, the performance between the 2 criteria
appears similar. With fewer choices, the 1994 criteria ought to be
more reliable, especially when used by less experienced examiners.
The newer criteria required only that the thyroid be palpable
to be considered clinically enlarged. In areas of low prevalence,
this would lead to overestimates of ultrasonographically proven
thyromegaly. A 2001 revision of the criteria clarified that “palpable”
meant an enlargement of both lobes to a volume greater
than the distal phalanx of the subject’s thumb.

It is disappointing that the experienced thyroid examiners 
did not have higher diagnostic accuracy. Other studies have 
found that the clinical examination overestimates the volume 
of the thyroid in schoolchildren. 2 A partial explanation may
relate to problems with using a worldwide WHO fixed cut
point for ultrasonographic size rather than using local
reference values adjusted for sex and age. 3

Reviewed by David L. Simel, MD, MHS 
CHAPTER 22

Does This Patient Have Hepatomegaly?

C. David Naylor, MD, DPhil, FRCPC

CLINICAL SCENARIO


The patient in your examining room is new to the practice.
He is 52 years old, emigrated from Southeast Asia
about 10 years ago, and has no specific complaints except
fatigue. On examination you find little of note except that
his liver edge is firm, is easily felt, and extends about 6 cm
below the costal margin across much of the right upper
quadrant. The span, by light percussion, is 17 or 18 cm.
Should you be concerned? What does the research literature
tell us about the meaning of these findings?


WHY IS THE CLINICAL 
EXAMINATION IMPORTANT? 


Ideally, the clinical meaning of physical examination findings 
should be established in research studies that account for the 
overall context, including other signs and details from the medical
history. This approach is difficult in liver disease because the
physical manifestations of hepatic dysfunction are protean, and
many multisystem diseases affect the liver. Our focus, therefore,
is on physical examination of the liver itself. This means, however,
that we implicitly depend on the clinician’s ability to make
a baseline estimate of the likelihood of liver disease according to 
the medical history or other physical findings. 

Although many maneuvers recommended in liver examination
are unproven, there is reasonable evidence that the presence
or absence of hepatomegaly can be determined with moderate
accuracy on physical examination. Descriptive studies suggest
that other qualitative findings may help in clinical assessment of
patients with possible liver disease. Liver examination, like most
physical diagnosis maneuvers, is not dissimilar to a screening
test; it may support or refute hypotheses generated by the medical
history and generate further hypotheses itself, allowing more
selective use of imaging techniques and laboratory tests as tools
to confirm the suspected diagnoses. 1


TOPOGRAPHY 

Situated intraperitoneally in the right upper quadrant, the 
liver seldom extends more than 5 to 6 cm across the midline 
into the left upper quadrant. The upper surface is convex and 
nestles under the diaphragm, typically at the level of the fifth 
or sixth anterior rib in quiet respiration. The lower surface 
tends to be concave, with the gallbladder in it. Although the 
fundus of the gallbladder may project below and anteriorly to 
the lower liver edge, it is not felt in healthy persons. 

The bulk of the liver sits posteriorly, where it cannot be 
assessed from behind because of intervening retroperitoneal 
contents, ribs, and lumbar musculature. Anteriorly, the liver 
sits partly above the costal margin, with ribs and lung supervening,
and partly below it. The portion extending below or
inferior to the costal margin varies and typically runs parallel
to the costal margin. However, physicians working in modern
imaging departments, like generations of surgeons and
anatomists before them, can attest to the degree of variability
in the shape of the organ, including the extent to which the
lower edge parallels the costal margin and the degree of
extension beyond the midline into the left upper quadrant
(Figure 22-1). To some extent, the vertical liver span (ie, the
linear distance from the top of the liver dome down to the
lower edge) is a function of where in the right upper quadrant
the liver edge is palpated or percussed (Figures 22-1, 22-2,
and 22-3). The falciform ligament joins the midanterior surface
of the liver to the diaphragm and anterior abdominal
wall. With respiration, diaphragmatic contraction drives the 
liver downward, and the anterior surface of the organ rotates 
slightly to the right. In quiet inspiration and expiration, the 
excursion is approximately 2 to 3 cm. 

A SUGGESTED APPROACH TO LIVER EXAMINATION 

We assume that, as part of the general abdominal examination,
you have already inspected the abdomen, including the
right upper quadrant, looking for obvious irregularities or 
deformities. Then, in adults without a history or physical 
findings suggestive of potential liver disease, palpate for the 
lower liver edge. Start with gentle pressure in the right lower 
quadrant; ask the patient to breathe in gently and slowly to 
bring the liver edge down to the examining fingertips. At 
each exhalation, move the fingers up about 2 cm. If the edge 
is not felt, no further examination is suggested. 

If the edge is felt, confirm that you are palpating roughly in
the middle of the right portion of the abdomen, that is, corresponding
to the midthoracic line or so-called midclavicular line
(MCL). Mark the lower edge. Then, in the same approximate
plane, percuss down from about the level of the third rib, with
the pleximeter finger (the finger that you strike with the percussing
finger) laid horizontally. Typical lung field resonance
will be heard. Move one rib space at a time until the tone
changes because of the interposition of the dome of the liver
behind the air-filled lung. There will be a gradation with
increasing dullness as you move caudally and the volume of the
air-filled lung overlying the liver is diminished (Figure 22-3).

To confirm increased dullness, spread 2 or 3 pleximeter fingers
over adjacent rib spaces and percuss quickly a number of
times from greater to lesser resonance. If doubts persist, have
the patient take a deeper breath and hold it; then percuss to
confirm an unequivocal increase in resonance at that rib space.
Determination of a level for the upper edge of liver dullness is
sometimes helped by placing the middle finger over the likely
level for initial tone change and laying the second and ring fingers
on adjacent rib spaces. Again, percuss back and forth. The
percussion tone over the top finger should be resonant; the
lower finger, unequivocally dull; and the middle finger, of a
resonance between that of the other fingers.

Try to ensure that the lower and upper borders are marked 
either in quiet respiration or, if deep breaths are taken, in the 
same phase of respiration. 

When you have other evidence to suggest liver disease but the
liver edge was not palpable, attempt
to locate the lower edge by gentle percussion in the right
lower quadrant, following the plane of the MCL and again
working from resonance to dullness. Tricks similar to those
described above (eg, multiple pleximeter fingers and manipulating
the level of dullness with changes in depth of respiration) may
help confirm the finding. If there is no definite tone change up to
the costal margin, a not uncommon finding, end the attempt
to define liver size.

Determination of vertical liver span in the MCL can be
done in 2 ways. We recommend gentle percussion for locating
the upper liver border and palpation or gentle percussion
to locate the lower border. An alternative is to use firm
percussion, deliberately ignoring whether or not the lower edge
is palpable.

Figure 22-1 Radioisotope Scans of the Liver
Showing Variability in Organ Shape

Note the costal margin markers as white broken lines
and the other 2 dark point markers for research purposes;
respiratory excursion blurs and expands the
point markers, a limitation on the precision of any
study done with reference to scintigraphic standards.
A-D, Variation in alignment with the costal margin. E,
F, Prominence of the left (caudate) and right (pyramidal)
lobes, respectively.

Liver size correlates with body size, and liver shape correlates 
with habitus. Liver span is greater in men than women and in 
tall vs short persons. However, as a rough guide, an MCL span 
of less than 12 to 13 cm with gentle percussion alone or gentle 
percussion combined with palpation makes hepatomegaly 
unlikely. Ranges of normal have been established for firm percussion
(Table 22-1) but will vary among clinicians, depending
on percussion techniques. Enlargement suggested by percussive
span alone is weaker evidence for hepatomegaly than span
based on palpation of the lower liver edge. 

Apart perhaps from the situation of fulminant hepatic failure,
observing reduction in liver span is of limited use because many
other features of chronic liver failure will be present in situations
in which reduction in parenchymal mass has occurred.

When the liver edge is palpable, tracing the edge and defining
its characteristics qualitatively are recommended primarily
in persons who are strongly suspected of having liver disease.
Auscultation is seldom helpful. Once you have a high index of
suspicion about liver disease, biochemical tests and biopsy are
the main events; the more esoteric findings on physical examination
become a sideshow for impressing referring physicians
or trainees.

EVIDENTIARY BASIS FOR THE APPROACH 

Inspection 

Visualization of infracostal extension of the liver is occasionally
possible when malnutrition or cachexia thin out the overlying
tissues or when there is massive hepatomegaly. No studies, to our
knowledge, describe the yield from inspection of the liver outline
in the abdomen, but clear-cut abnormalities should at least be
specific, thereby ruling in hepatomegaly and underlying disease.

Auscultation 

Friction rubs may occur with primary and metastatic malignancies,
after liver biopsies, with infective and inflammatory
conditions, and with or without concomitant hepatomegaly.
Rubs, although always abnormal, are rare and nonspecific;
even with careful examination of patients with liver tumors,
no more than 10% of patients have a rub. 2-4

A detailed review of abdominal auscultation is provided by 
Sapira, 5 including bruits and hums occurring in and around 
the right upper quadrant. Considerable time can be spent on 
auscultation, but there is no evidence that these findings are 
helpful in routine examination. Features reputed to help separate
bruits of arterial and venous sources are described in
Table 22-2. Venous hums occur in portal venous hypertension
of any cause. The hum, a low-pitched murmur with systolic
and diastolic components, arises from communication
between the umbilical or paraumbilical veins and abdominal
wall veins. The responses of venous hums to the Valsalva
maneuver, splenic pressure, or ingestion of meals are inconsistent. 6,7
Other causes of true continuous murmurs, such as
arteriovenous fistula in the splanchnic circulation or hepatic 


Figure 22-2 The Surface Anatomy of the Abdomen Can Be Divided
Into Quadrants or Regions

Panels: abdominal quadrants; abdominal regions. The edge of the liver
will typically be felt in the right upper quadrant.



Figure 22-3 (A) Surface Landmarks for (B) Percussing the Liver 
Creates Resonance Dependent on the Underlying Structures 

A, Variation in liver span according to the vertical plane of examination. 
Because there is variability in where clinicians determine the midclavicular
line to be, the inevitable consequence is that liver span may also vary,
even if multiple observers are perfectly accurate in measuring it. B, Percussive
resonance varies with the thickness of interposed air-filled lung
tissue. The percussion note changes with decreasing resonance caudally 
as less air-filled lung tissue is interposed between the liver and ribs. 
However, the site of a change from obvious resonance over the lung 
(point A) to less resonance (points B and C) may be difficult to judge. 
Table 22-1 Normal Liver Span in the Midclavicular Line and the
Midsternal Line, as Determined With Heavy Percussion Alone 37

                  MCL, cm             MSL, cm
Height, cm      Men      Women      Men     Women
150             8.25     6          6       4
157.5           9        6.75       6.5     4
165             9.75     7.5        7       5
172.5           10.25    8          7.5     ...
180             11       8.75       8       ...
187.5           11.75    9.5        8.5     ...

Abbreviations: MCL, midclavicular line; MSL, midsternal line.
a Ellipses indicate data not available.


Table 22-2 Potential but Unproven Means to Differentiate Venous
Hums and Arterial Bruits

Feature                    Venous Hum                 Arterial Bruit
Pitch                      Lower                      Higher
Volume                     Soft                       May be loud
Timing                     Continuous,                Systolic
                           systolic and diastolic
Systolic accentuation      Yes                        Yes
Localized                  No                         Yes
Change with position       Yes                        Sometimes a
Change with inspiration    Louder                     May decrease
Stethoscope pressure       Diminishes                 Unchanged

a Although positional change in arterial bruits would not be expected in an arterial
bruit caused by tumor vascularity, positional change may occur if a bruit is caused
by pressure on the abdominal aorta from the enlarged left lobe of the liver. 4,5,9


hemangioma, are uncommon, and arterial bruits rarely have 
such lengthy diastolic spillover that they sound continuous. 4,8 

Arterial bruits over the liver or in the epigastrium have
been described with most liver tumors, as well as alcoholic
hepatitis. 2-4,8-11 However, among patients with liver disease in
general (eg, a convenience sample of cirrhosis, alcoholic hepatitis,
and malignancy), the prevalence of bruits has been
reported at less than 3%. 8 Both clinically 9 and with
phonoangiographic enhancement, 8 the murmurs associated with
alcoholic hepatitis and malignancy cannot be distinguished
from one another, although the former resolve if and when
the condition improves. The prevalence of clinically audible
bruits in patients with confirmed liver cancer varies from
10% 11 to as much as 56%. 12 Kingston et al 13 reported a diastolic
component to most bruits heard in their patients with
hepatoma. About 1% to 2% of unselected patients on a general
medical service will have abdominal bruits of some
kind, 4 and the ability of clinicians to distinguish hepatic from
other arterial bruits has never been assessed.

Auscultation over the liver should be considered only when
medical history and other physical findings are suggestive of
hepatic disease; even then, the findings should be interpreted
cautiously.

A PALPABLE LIVER EDGE: WHAT DOES IT MEAN? 

Cirrhosis or infiltrative disorders increase the firmness of the 
liver edge and the likelihood of its being felt independent of 
effect on organ size. 14 Among gastroenterologists, agreement 
on the presence of a palpable liver edge is about 50% greater 
than expected by chance alone. 15 More interobserver
disagreement would be expected in ordinary practice.

There is a paucity of data on the prevalence of palpable livers
in the general population. One study 16 has reported data
on palpability of the liver among 1000 military personnel 
(717 men and 283 women) undergoing routine examination; 
852 subjects were 40 years of age or younger. Palmer, 16 the 
author and sole examiner, excluded any persons in whom 
liver disease was suspected or who were difficult to examine. 
In 57% of subjects, the liver was either not palpable in the 
right upper quadrant or felt just at the costal margin. An 
additional 28% descended only 1 to 2 cm below the costal 
margin. Findings were similar for both sexes. The proportion 
of palpable livers was inflated by 2 factors. First, all subjects 
were examined in deep, held inspiration. Second, as Palmer 16 
himself cautioned, “There is no question but that many of 
the potentially palpable livers would have been overlooked if 
this had not been a specially directed study.” 

Ability to palpate the liver is not closely correlated with
liver size in studies using reference standards such as scintigraphy
or ultrasonography. 17-20 (Although many published
studies use scintigraphy as a reference standard, it does have
the drawback of motion artifact in conventional applications.)
Patients undergoing liver scintiscan are preselected,
and a high proportion of palpable livers might be expected.
However, studies from nuclear medicine departments show
that although the majority of patients scanned have some
infracostal extension of the liver, less than half of these
patients had palpable livers. 14,18-21 In one study, Rosenfield et
al 22 chose 100 scintiscans at random and compared the findings
with the clinical records. Among patients without definite
evidence for liver disease in the medical records, mean
scintigraphic vertical span in the right MCL was similar
among those with palpable (12.9 cm) and nonpalpable
organs (12.5 cm), as were the proportions in each category
(45% vs 55%). Overall, the chance that a patient with a palpable
liver also had liver disease was 63% (36 of 57 patients;
95% confidence interval [CI], 49%-76%), but the chances of
a palpable liver meeting scintigraphic criteria for enlargement
were only 46% (24/52 patients; 95% CI, 32%-61%).
Studies 14,22,23 on palpability and hepatomegaly are summarized
in Table 22-3. This distinction between abnormal and
enlarged livers is a recurrent problem because livers may be
abnormal yet not enlarged.

What of the converse proposition, that is, that a nonpalpable
liver is not enlarged? Because normal livers usually
extend below the costal margin yet may not be palpable,
this proposition rests on an assumption that enlarged livers
will be diseased, abnormally hard, and therefore much
more easily felt. As summarized in Table 22-3, a nonpalpable
liver does reduce the probability of hepatomegaly, even
though a palpable organ has less than a 50% chance of
being enlarged. These figures are influenced by the pooled
prevalence of hepatomegaly, 23% in these studies. As a
prevalence-free characteristic, we can report that the
pooled likelihood ratio 24 (LR) for hepatomegaly, given a
palpable liver (positive likelihood ratio [LR+]), is 2.5. The
LR in the absence of palpable hepatomegaly (negative LR)
for the presence of an enlarged liver detected by scanning is
0.45. However, there will likely be an evaluation bias in
these figures as a result of preferential referral of patients
with palpable livers for scintigraphy. This bias would arguably
lead to a slight overestimate of sensitivity and still
larger underestimate of specificity. If specificity were higher,
the LR+ would be stronger. In any event, an LR approach is
most useful if you know the prior odds of hepatomegaly
for representative cohorts of patients with various diseases,
a set of numbers that are currently unknown and should be
the subject of research in the future.
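
As a worked illustration of the likelihood ratio approach, the Python sketch below recomputes the pooled LRs from the counts in Table 22-3 and converts a pretest probability into a posttest probability through odds. The function names are illustrative, and the 5% pretest probability in the last lines is a hypothetical value chosen to represent a low-suspicion setting, not a figure from the studies.

```python
# Pooled counts from Table 22-3.
PALPABLE_ENLARGED = 231        # palpable liver, hepatomegaly on imaging
PALPABLE_NOT_ENLARGED = 303    # palpable liver, no hepatomegaly
NONPALPABLE_ENLARGED = 112     # nonpalpable liver, hepatomegaly
NONPALPABLE_NOT_ENLARGED = 818 # nonpalpable liver, no hepatomegaly


def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR+, LR-) from the cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


def posttest_probability(pretest_probability, lr):
    """Apply a likelihood ratio to a pretest probability by way of odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


lr_pos, lr_neg = likelihood_ratios(
    PALPABLE_ENLARGED,
    PALPABLE_NOT_ENLARGED,
    NONPALPABLE_ENLARGED,
    NONPALPABLE_NOT_ENLARGED,
)
total = (PALPABLE_ENLARGED + PALPABLE_NOT_ENLARGED
         + NONPALPABLE_ENLARGED + NONPALPABLE_NOT_ENLARGED)
prevalence = (PALPABLE_ENLARGED + NONPALPABLE_ENLARGED) / total

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}, prevalence = {prevalence:.0%}")
print(f"Palpable, at pooled prevalence: {posttest_probability(prevalence, lr_pos):.0%}")
print(f"Nonpalpable, at pooled prevalence: {posttest_probability(prevalence, lr_neg):.0%}")

# Hypothetical low-suspicion patient (5% pretest probability is an assumption).
print(f"Nonpalpable, 5% pretest: {posttest_probability(0.05, lr_neg):.1%}")
```

At the pooled prevalence, the posttest probability after a palpable edge simply reproduces the proportion of palpable livers that were enlarged (231 of 534), which is why a palpable organ has less than a 50% chance of being enlarged in these series.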

In sum, a palpable liver is not necessarily enlarged or diseased
but does increase the likelihood of hepatomegaly. The
vertical liver span and overall clinical context must also be 
considered. Conversely, a nonpalpable liver edge does not 
rule out hepatomegaly but does reduce its likelihood. This is 
particularly relevant in those settings of low prior probability 
of liver disease, in which further examination is likely to have 
little yield if the liver cannot be felt. 

WHAT ELSE CAN BE LEARNED FROM PALPATION? 

Da Costa 25 wrote 93 years ago, “Tactile sense decides the 
questions of hepatic tenderness, pulsation, friction, and 
thrills, and determines the consistence and the contour of its 
anterior and lower surfaces.” However, there are few data on 
the reliability and accuracy of these qualitative judgments 
about liver edge characteristics. 

A pulsatile liver edge is well documented in tricuspid valvular
disease. 26-28 Although this sign may be present clinically in
the majority of cases, 29 no modern studies adequately document
the frequency of the association and its relationship to
differing degrees of tricuspid valvular dysfunction. Unequivocal
pulsatile hepatomegaly is also reported in 35 of 55 consecutive
patients (64%; 95% CI, 50%-76%) with confirmed
constrictive pericarditis accumulated in 2 case series. 30,31 The
low false-negative rates give this sign some potential value in a
setting in which constrictive pericarditis is already suspected.
Unfortunately, as Osler 32 observed more than a century ago,
there is a need to distinguish between an expansile liver edge
and transmitted aortic or right ventricular impulses that are
commonly present. There are no data on examination maneuvers
to make such a distinction, although inspiratory increase
in the magnitude of the pulsation has been reported anecdotally
with tricuspid insufficiency. 26 Detection of differential timing
of hepatic pulsations has been described (eg, A vs V waves)
but is rare and doubtless difficult to pinpoint. 27


Table 22-3 Probability of Hepatomegaly if a Liver Is Palpable or Not
and Related Likelihood Ratios

                         Hepatomegaly
Liver Palpability        Yes     No      LR (95% CI) a

Peternel et al 14 b
  Yes                    12      12      LR+, 1.7 (1.0-2.8)
  No                     4       15      LR-, 0.45 (0.18-1.1)

Rosenfield et al 22 c
  Yes                    24      28      LR+, 1.4 (1.0-2.1)
  No                     13      35      LR-, 0.63 (0.39-1.0)

Walk 23 d
  Yes                    195     263     LR+, 2.6 (2.3-3.0)
  No                     95      768     LR-, 0.44 (0.37-0.52)

Pooled
  Yes                    231     303     LR+, 2.5 (2.2-2.8)
  No                     112     818     LR-, 0.45 (0.38-0.52)

Abbreviations: CI, confidence interval; LR, likelihood ratio; LR+, positive LR; LR-,
negative LR.

a CI values on LRs were determined using the method of Simel et al. 24
b Scintigraphic span of 16.5 cm or more; this reflects an arbitrary interpretation
based on the bigger-than-usual span among clinically normal persons reported by
Peternel et al. 14
c Scintigraphic MCL span of 15.5 cm or more.
d Volume greater than 1100 mL/m2, in which a volume of 900 mL/m2 usually signifies
enlargement.
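
The intervals in Table 22-3 can be approximated with the usual log-scale method for a ratio of two proportions. The Python sketch below assumes that this is the approach referred to as the method of Simel et al 24; the function name and the fixed z value of 1.96 are choices made for this illustration, and the counts are the pooled row of the table.

```python
import math


def lr_with_ci(tp, fp, fn, tn, z=1.96):
    """Return LR+ and LR-, each as (estimate, lower, upper), from a 2x2 table.

    The interval is built on the log scale: ln(LR) +/- z * SE, where SE is
    the standard error of the log of a ratio of two independent proportions.
    """
    diseased = tp + fn
    healthy = fp + tn
    lr_pos = (tp / diseased) / (fp / healthy)
    lr_neg = (fn / diseased) / (tn / healthy)
    se_pos = math.sqrt(1 / tp - 1 / diseased + 1 / fp - 1 / healthy)
    se_neg = math.sqrt(1 / fn - 1 / diseased + 1 / tn - 1 / healthy)

    def interval(lr, se):
        return (lr,
                math.exp(math.log(lr) - z * se),
                math.exp(math.log(lr) + z * se))

    return interval(lr_pos, se_pos), interval(lr_neg, se_neg)


# Pooled row of Table 22-3: palpable (231 enlarged, 303 not),
# nonpalpable (112 enlarged, 818 not).
pos, neg = lr_with_ci(tp=231, fp=303, fn=112, tn=818)
print("LR+ {:.1f} ({:.1f}-{:.1f})".format(*pos))
print("LR- {:.2f} ({:.2f}-{:.2f})".format(*neg))
```

Small studies such as Peternel et al produce wide intervals that cross or touch 1.0, whereas the pooled row is dominated by the largest series.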


Palpation for an expansile liver edge should be limited to cases 
of suspected tricuspid valve disease or constrictive pericarditis. 

The other qualitative characteristics are consistency, nodularity,
and tenderness of a palpable liver edge. Among multiple
expert observers examining variously alcoholic or jaundiced
patients, κ statistics for chance-corrected agreement were 11%
for abnormal consistency of a palpable liver edge 15 and 26% 15
or 29% 33 for presence of nodularity. Only agreement on tenderness
of the liver edge was within a useful range, at 49%. 33

Palpation to describe the liver edge qualitatively or to 
detect isolated enlargement of the left (caudate) lobe of the 
liver should therefore be considered primarily if there is 
other evidence of organ disease or concern about liver tumor 
and even then is optional. 

ASSESSING VERTICAL LIVER SPAN 

Unequivocal reduction in liver size should be detectable in fulminant
hepatic failure. However, no evidence was located to support
the common belief that a substantial proportion of persons
with chronic cirrhosis have detectably small livers by physical
examination. The focus herein is accordingly on hepatomegaly.

Because half of all palpable livers are not enlarged, measurement
of vertical liver span in some plane is required. The usual
reference point is the MCL. However, unless care is taken in 
examination, the MCL can be “a wandering landmark,” with 
documented interobserver variation as much as 10 cm. 34 
Variation in the MCL will inevitably lead to imprecision in 
Table 22-4 Match of Clinically Measured Midclavicular Line Span and
Imaged Span

                                   No. of Patients With MCL
                                   Span Within 2 cm/No. of     No. of
Authors and Procedure              Total Patients (%)          Observers    Imaging Method

Sullivan et al 19
  Scratch test                     15/36 (42)                  1            Scintigraphy, MCL
  Percussion alone                 19/47 (40)                               not matched
  Palpation alone                  17/32 (53)
  (where applicable)

Fuller et al 36
  Palpation or percussion          31/40 (78)                  3 or 4       Ultrasonographic
  and scratch test                                                          MCL matched a
  Palpation or percussion alone    16/36 (44)

Peternel et al 14
  Percussion and palpation         18/43 (42)                  2 or 3 b     Scintiscan, MCL
                                                                            not matched

Naylor et al 35
  Percussion or palpation
    Observer 1                     20/39 (51)                  1            Scintiscan, MCL matched
    Observer 2                     13/34 (38)                  1            Scintiscan, MCL matched

Abbreviation: MCL, midclavicular line.

a Span from costal margin only.
b Mean MCL span used when observers' results differed.


liver span assessments (Figure 22-2). Vertical span could be an 
accurate predictor of liver mass only if the organ were more or 
less cuboid rather than irregular. 

Palpation should, in theory, be the most reliable and accurate
method of locating the lower border of the liver to measure
organ span. Two studies 12,33 report specialists’ ability to
agree on distance from the costal margin to a palpable liver
edge, an approach that overstates accuracy by eliminating the
largest source of error: location of the upper border of the
liver. 14,19,35 Meyhoff et al 12 further controlled interobserver
disagreement by having all measurements made at a predetermined
MCL. Mean maximum interobserver difference of
distance from the costal margin was 6.1 cm (SD, 2.7 cm) in the
MCL. Intraobserver variation was smaller, with differences not
greater than 2 cm in 60% to 80% of MCL measurements.
There was no clear relationship to liver size, a finding that
underscores the need to measure span from the upper border
of the liver, not the costal margin. Theodossi et al 33 performed
a similar experiment without marking the MCL. The intraclass
correlation coefficient was 0.66, analogous to a weighted κ of
more than 60%. However, agreement beyond chance on
whether the liver was truly enlarged was only 30%.

What are the alternatives to localizing the lower liver edge by
palpation? The scratch test is performed by placing the diaphragm
of the stethoscope at the xiphisternum or over the
liver just above the costal margin in the MCL. Starting low in
the abdomen, a finger is moved up the abdomen, scratching
gently. The intensity of the sound becomes greatly enhanced once
the finger is over the lower border of the liver. 36 The other major
alternative is percussion.

Comparative studies are summarized in Table 22-4. 19,36 Two 
caveats are in order. Both studies involved limited numbers of 
observers and patients. Furthermore, the overall accuracy in 
the report by Fuller et al 36 is greatly exaggerated on 2 scores. 
First, the ultrasonographic measurement was made in a plane 
defined by the observers. In actual practice, the MCL of the 
clinical observer varies from that of the scintigrapher or
ultrasonographer, 34,35 a situation that was applicable for the patients
examined by Sullivan et al. 19 Second, Fuller et al 36 took their 
measurements from the costal margin. 

The scratch test may be a useful adjunct to percussion or 
palpation in locating the lower edge of the liver. However, 
more studies are needed before it can be recommended for 
routine use. 

Also shown in Table 22-4 are the results of other studies in 
which the authors used percussion or palpation to locate the 
lower liver edge. Excluded is one outlying study in which 100% 
of measurements were accurate within 2 cm of scintigraphic 
MCL span and exact agreement at the 0.1-cm level is claimed 
for several observations. 17 We also exclude a study using direct 
percussion without a pleximeter finger 37 ; this study related 
mean clinical liver span to ultrasonographic span but lacked 
measures of either case-by-case absolute span discrepancies or 
categorical agreement on organ normalcy. 

Once the span has been determined, clinicians must still
decide whether the liver is enlarged or not. Blendis et al 38
reported that among 28 patients with blood dyscrasias or liver
diseases examined by 4 observers, 3 of 4 observers agreed in 93%
of cases about the presence or absence of hepatic enlargement,
but the data do not permit a κ correction. Theodossi et al, 33 with
5 observers and a structured medical history and physical examination
on 20 jaundiced patients, reported a κ for presence or
absence of hepatomegaly of 30%. Moreover, agreement among
the qualitative judgments of clinicians and an external reference
standard is modest. For example, Blendis et al 38 found that in
the cases in which at least 3 clinicians agreed on hepatomegaly,
concordant assessments of radiologic liver surface area were
found in only 48% of cases. Halpern et al 20 compared judgments
recorded in medical charts with a convenience sample of 214
scintigraphic images with 16 cm as the cut point. Accuracy was
66%, slightly higher than in the study by Blendis et al. 38 However,
when corrected for agreement expected according to chance
alone, the resulting κ statistic was only 32%. Naylor et al 35 used 15
cm as a cut point for scintigraphic hepatomegaly and, with 2
observers, found that the accuracy of clinical examination ranged
from 67% to 82%, depending on the observer and choice of clinical
threshold value for determining the presence of hepatomegaly.
Correcting for chance agreement, the κ statistics ranged from
28% to 55%. Overall, it appears that combinations of palpation
and percussion yield modest accuracy greater than expected by
chance alone in determining whether the liver is enlarged or not.
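
The gap between raw accuracy and κ in these studies follows from the definition of chance-corrected agreement, κ = (observed agreement - chance agreement) / (1 - chance agreement). The Python sketch below applies that definition to the figures reported for Halpern et al (66% accuracy, κ of 32%). The chance agreement it derives is implied by those two numbers and is not reported in the source; the function names are illustrative.

```python
def kappa(observed_agreement, chance_agreement):
    """Chance-corrected agreement (Cohen's kappa) from two proportions."""
    return (observed_agreement - chance_agreement) / (1 - chance_agreement)


def implied_chance_agreement(observed_agreement, kappa_value):
    """Solve the kappa definition for the chance agreement term."""
    return (observed_agreement - kappa_value) / (1 - kappa_value)


# Reported values for Halpern et al: accuracy 66%, kappa 32%.
chance = implied_chance_agreement(0.66, 0.32)
print(f"Implied chance agreement: {chance:.0%}")
print(f"Kappa recovered from it: {kappa(0.66, chance):.0%}")
```

Under these figures, about half of the observed agreement would be expected from chance alone, which explains why an accuracy of 66% translates into only modest agreement beyond chance.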

Castell et al 39 suggested measuring span by percussion alone. 
They examined 116 healthy subjects to establish a range of nor- 
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mal for percussive span in the MCL and midsternal line. 
Because the goal was to establish a clinical range of normal, 
there was no reason to validate the measurements against a ref¬ 
erence standard. Percussive span correlated positively with 
height and differed between men and women, as would be 
expected from autopsy studies (Table 22-1). Formulas to predict 
span were derived that incorporated height and weight. The 
MCL liver dullness for men (cm) = {[0.032 × weight (lb)] + 
[0.18 × height (in)]} − 7.86 and MCL liver dullness for women 
(cm) = {[0.027 × weight (lb)] + [0.22 × height (in)]} − 10.75. 39 
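
As a worked illustration, the two prediction formulas can be written as a short Python function. This is only a sketch: the function name, the `sex` labels, and the example weights and heights are hypothetical; the coefficients are the ones quoted above, with weight in pounds and height in inches.

```python
# Coefficients as quoted in the text: (weight term, height term, constant).
CASTELL_COEFFICIENTS = {
    "male": (0.032, 0.18, 7.86),
    "female": (0.027, 0.22, 10.75),
}


def predicted_mcl_dullness_cm(weight_lb, height_in, sex):
    """Predicted MCL liver dullness (cm) from weight (lb) and height (in)."""
    try:
        weight_coef, height_coef, constant = CASTELL_COEFFICIENTS[sex]
    except KeyError:
        raise ValueError("sex must be 'male' or 'female'") from None
    return weight_coef * weight_lb + height_coef * height_in - constant


if __name__ == "__main__":
    # Hypothetical subjects, for illustration only.
    print(round(predicted_mcl_dullness_cm(170, 70, "male"), 2))
    print(round(predicted_mcl_dullness_cm(130, 64, "female"), 2))
```

For a hypothetical 170-lb, 70-in man the predicted dullness is about 10.2 cm; for a hypothetical 130-lb, 64-in woman it is about 6.8 cm. These are predicted means, not limits of normal.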

The advantages of percussion alone are that observers may 
not agree on the presence of a palpable liver, and palpable livers 
will often be felt below the point at which the percussion note 
changes. The latter occurs because the thin lower liver edge may 
not cause dullness. Thus, you must rely on palpation in a vari¬ 
able proportion of subjects because not all livers are palpable, 
and these subjects will tend to have somewhat larger liver spans. 
However, clinical MCL span compared with technetium scinti¬ 
graphic span is less accurate when the lower border is nonpalpa- 
ble, 14,19 and errors are always greatest in the upper border that 
can only be approached by percussion. 14,19,35 It therefore seems 
counterintuitive to propose examining liver span by percussion 
alone. Also, the forcefulness of percussion greatly modifies the 
measured span. 19,39,40 Use of percussive span therefore demands 
that each observer double check his or her own range of normal 
against the established norms to ensure that strength of percus¬ 
sion is not a confounder. 

Another group used the percussive span technique 41 to 
examine 46 patients with liver disease. There was significant 
disagreement among the 6 examining clinicians, presumably 
because of strength and plane of percussion. Interobserver 
agreement on the appraisal of the organ as “small,” “normal,” 
or “enlarged” was excellent only for massively enlarged livers. 
If moderately enlarged organs (ultrasonographic volumes 
between 2000 and 2700 mL) are included, the probability of 
any 2 randomly chosen observers agreeing on the presence of 
hepatomegaly was between 40% and 75%. 

This limited performance is perhaps understandable 
because the concept of percussive span rests on the question¬ 
able assumption that it consistently underestimates liver 
span, allowing for reliable demarcation of abnormally sized 
livers. Nonetheless, Castell et al 39 are the only group to estab¬ 
lish a range of normal for clinical liver span that reflects the 
known variability of span with height, weight, and sex. 

Use of percussion alone to determine span, independent of 
whether or where the lower liver edge is felt, remains feasible. 
However, clinicians should standardize their percussion tech¬ 
nique and compare their typical findings in normal subjects 
with published normal ranges. Future research should evaluate 
the clinical use of the percussive method compared with meth¬ 
ods using percussion and palpation, with and without the 
“scratch test.” 

PHYSICAL FINDINGS IN CONTEXT 

In the foregoing studies, accuracy is generally defined 
against a single reference standard such as ultrasonography 
or scintigraphy. This procedure contrasts with studies such 
as the one by Rosenfield et al, 22 in which measured span and 
palpability were compared with evidence in the clinical 
record for any liver disease. The latter study has the advan¬ 
tage of capturing the fact that although all truly enlarged liv¬ 
ers are diseased, not all normal-sized livers are free of disease. 

A further problem with many studies is the extent of blind¬ 
ing. Some studies blind observers to all details of the patients’ 
medical history and other physical findings. Others ask 
observers to perform a structured medical history and physi¬ 
cal examination or set inclusion criteria (eg, jaundiced or 
alcoholic patients) that will affect clinicians’ judgments. The 
nature and extent of confounding from this variable are 
unknown, but it seems probable that the extent of interob¬ 
server agreement, and even the match between clinical judg¬ 
ments and reference standards, will be affected by the 
amount of information available to the examiner. 

Finally, few studies try to place liver findings in the overall 
context of clinical decision making. Sapira 1 observed that 
clinical liver span assessments need not match closely ultra¬ 
sonographic or scintigraphic measures because the “clinical 
worth” of a sign is its potential contribution to clinical deci¬ 
sion making. Of interest, Espinoza et al 15 used stepwise dis¬ 
criminant analysis to assess the ability of a variety of physical 
findings to distinguish among 50 consecutive alcoholic 
patients presenting variously with cirrhosis, noncirrhotic 
alcoholic liver disease, or no clinical/biochemical evidence of 
liver disease. Three variables—spider nevi, splenomegaly, 
and abdominal wall collateral veins—appeared useful; liver 
examination findings were not significant contributors to the 
differential diagnostic exercise. Similarly, Theodossi et al 42 
and Theodossi 43 examined the ability of a large array of 
symptoms and signs to differentiate between medical and 
surgical causes of jaundice. They found that descent of the 
liver edge greater than 2 cm below the costal margin was 
more common with surgical causes of jaundice (P < .01), but 
the independent contribution of this sign to the overall diag¬ 
nostic process was unclear. 

Both studies started with populations that had liver disease 
and determined whether physical diagnosis helped in catego¬ 
rizing the type of disease. Neither addresses whether the 
physical examination was helpful in deciding which patients 
had liver disease in the first place. Little is known about the 
real contribution of liver examination findings to the overall 
clinical diagnostic and management process. This topic 
should be a research priority. 

WHAT CAN YOU DO TO GET BETTER 
AT EXAMINING THE LIVER? 

No educational studies, to our knowledge, have tested meth¬ 
ods to improve your accuracy and precision in examining the 
liver, but a few suggestions can be hazarded. First, once you 
are comfortable examining the liver, pursue the various 
shortcuts recommended herein. Early on, however, it is use¬ 
ful to check the liver span by percussion, even in persons 
with a low probability of liver disease and a nonpalpable 
organ. This method can help you begin to understand what 
your own range of normal is likely to be. Second, check your 
reliability by reexamining stable patients and comparing 
your follow-up assessment with your first impressions. 
Third, both learners and experts should quantitatively and 
qualitatively benchmark their physical examinations of the 
liver against findings on nuclear examination or ultrasonog¬ 
raphy. Try to determine how you are doing in assessing verti¬ 
cal liver span or extent of descent of the edge below the costal 
margin or in “calling” the presence of hepatomegaly. Fourth, 
consider the potential errors in locating the MCL. If sequen¬ 
tial clinical span assessments are being made (eg, fulminant 
hepatic failure or treatment of hepatic metastases), it may 
help to record a reference plane such as 10 cm from the mid¬ 
line or where the lateral edge of the rectus abdominis crosses 
the costal margin. 44 

THE BOTTOM LINE 

Once historical data and other physical signs have been elic¬ 
ited, the additional value of a detailed physical examination 
of the liver remains uncertain. Moreover, just as diagnostic 
tests yield little at the extremes of prior probability so also 
would you expect less yield from liver examination in per¬ 
sons who are not suspected of having liver disease or who 
obviously have some hepatobiliary complaint. 

A selective approach to physical examination of the liver 
is therefore suggested. Palpate to locate the lower liver bor¬ 
der in the MCL in situations of low probability of liver dis¬ 
ease. If the liver is not palpable, one can defensibly forgo 
any further examination in patients without reasons to sus¬ 
pect liver disease. However, because palpation of the abdo¬ 
men is difficult in some subjects, light percussion remains 
an option to confirm lack of extension of the liver edge 
below the costal margin or guide further palpation. With a 
palpable lower edge, MCL span can be ascertained by light 
percussion of the upper border. A span of less than 13 cm 
reduces the probability of hepatomegaly. In persons with an 
impalpable liver and a high probability of liver disease, 
measuring span by percussion alone may also be worth¬ 
while; tables of norms have been published, although these 
apply to moderate or heavy percussion methods. Palpation 
specifically to assess the quality of the liver edge is recom¬ 
mended only if there are signs of liver disease, including 
unequivocal hepatomegaly. Auscultation over the liver has a 
limited role in examination. 
UPDATE: Hepatomegaly 



Prepared by David L. Simel, MD, MHS 
Reviewed by Marisa D'Silva, MD 


CLINICAL SCENARIO 


A 21-year-old college student with a flulike illness for 2 
days presents to the student health clinic. You suspect 
influenza and examine her oropharynx, neck, chest, and 
abdomen. When you feel her liver edge about 2 cm below 
the costal margin, you inquire about abdominal discom¬ 
fort, nausea, vomiting, and anorexia other than with the 
current illness. Her skin is not jaundiced. She has no his¬ 
tory of liver disease or illnesses associated with liver 
enlargement. You reexamine the abdomen to confirm the 
presence of the liver edge and additionally find no evi¬ 
dence for splenomegaly. 

UPDATED SUMMARY ON HEPATOMEGALY 

Original Review 

Naylor CD. The rational clinical examination: physical examination 
of the liver. JAMA. 1994;271(23):1859-1865. 

UPDATED LITERATURE SEARCH 

Our literature search used the parent search strategy for The 
Rational Clinical Examination series, combined with the sub¬ 
ject “hepatomegaly/di,” published in English from 1993 to 
2004. The results yielded 71 titles, for which we reviewed the 
titles and abstracts; 13 were selected for additional review. 
These articles were reviewed to identify articles that assessed 
the sensitivity and specificity of the medical history or physi¬ 
cal examination for hepatomegaly. Two articles were identi¬ 
fied for inclusion. 

NEW FINDINGS 

• Clinicians should stop assessing the liver span by “scratch¬ 
ing” the abdomen. 

Details of the Update 

A nonsystematic review was published at about the same time as 
the original Rational Clinical Examination article. 1 The conclusions 
of the 2 articles were similar: both observed that the liver edge 
is commonly palpable in healthy patients. 

The scratch test was suggested as a method for determining 
the distance below the right costal margin. A study with 
methodologic flaws, which should have enhanced the accu¬ 
racy of this method, found no correlation between the dis¬ 
tance of the edge below the costal margin and the total liver 
span identified by ultrasonography. 2 

A second study confirmed the relationship between the 
liver edge identified by percussion and liver span (confirmed 
by ultrasonography), but only in patients with cirrhosis. 3 

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

The original publication did not use meta-analytic techniques 
for assessing the pooled likelihood ratios (LRs) for palpating the 
liver edge. The data were reanalyzed, and they showed that the 
presence of a palpated liver edge is not as good as previously 
reported (LR, 2.0 vs previously reported 2.5) for identifying a 
patient with an increased liver span or volume (see Table 22-5). 

CHANGES IN THE REFERENCE STANDARD 

None. 


RESULTS OF LITERATURE REVIEW 

The scratch test for identifying the edge of the liver below the 
costal margin yields a result with no correlation to the actual 
liver span (r = 0.04). 

Percussion of the liver in cirrhotic patients agrees with the 
total liver span measured by ultrasonography (κ = 0.93). 


Table 22-5 Likelihood Ratio for a Palpable Liver Edge to Identify Hepatomegaly 

Finding (No. of Studies)     LR+ (95% CI)      LR- (95% CI) 
Palpable liver edge (3)      2.0 (1.5-2.8)     0.41 (0.3-0.55) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 
EVIDENCE FROM GUIDELINES 

There are no guidelines addressing an assessment for hepato¬ 
megaly in the general population. 


CLINICAL SCENARIO—RESOLUTION 


Your previous suspicion of liver disease is low, and identi¬ 
fying the liver edge in this young patient is likely a normal 
finding. You should consider screening for excessive alco¬ 
hol use because she could have a fatty liver unrelated to 
the current illness. Although you might consider infec¬ 
tious mononucleosis as the current underlying illness, 
hepatomegaly is not a common presentation (as opposed 
to splenomegaly). Additional testing for liver enlargement 
is not indicated unless there are other suggestions that she 
might have liver disease. 


HEPATOMEGALY—MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

The probability that the liver edge can be felt below the right 
costal margin is about 50%. However, this does not correlate 
with the liver span in normal patients. Thus, the prior prob¬ 
ability of hepatomegaly depends entirely on the possible 
underlying disease states. 

POPULATION FOR WHOM HEPATOMEGALY 
SHOULD BE CONSIDERED 

• Known or suspected liver disorders 

• Malignancy 

• Congestive heart failure 


DETECTING THE LIKELIHOOD OF HEPATOMEGALY 

Palpating a liver edge below the right costal margin corre¬ 
lates poorly with the actual liver span, although it does 
increase the likelihood that the patient will have an enlarged 
liver (positive LR, 2.0). Likewise, the failure to identify the 
liver edge does not rule out the presence of an increased liver 
span (negative LR, 0.41). The effect of this finding depends 
on the previous suspicion of liver disease. 
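
How much a palpable (or impalpable) edge shifts the probability of hepatomegaly can be sketched with the usual odds form of Bayes' theorem. The snippet below is an illustration only: the function name and the 30% pretest probability are hypothetical, whereas the LRs of 2.0 and 0.41 are the pooled values given above.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability and an LR into a posttest probability."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio cannot be negative")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


if __name__ == "__main__":
    pretest = 0.30  # hypothetical prior suspicion of hepatomegaly
    print(round(post_test_probability(pretest, 2.0), 2))   # edge palpable
    print(round(post_test_probability(pretest, 0.41), 2))  # edge not palpable
```

With a hypothetical pretest probability of 30%, a palpable edge raises the estimate to roughly 46% and an impalpable edge lowers it to roughly 15%. Neither result settles the question, which is consistent with the modest LRs.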

When there is a suspicion of liver disease, we recommend 
that clinicians forgo the “scratch” test and use percussion to 
estimate the liver span (>15 cm = enlargement). Liver ultra¬ 
sonography will be required to confirm the clinical findings. 

REFERENCE STANDARD TESTS 

Ultrasonography or scintigrams. 


REFERENCES FOR THE UPDATE 

1. Meidl EJ, Ende J. Evaluation of liver size by physical examination. J Gen 
Intern Med. 1993;8(11):635-637. 

2. Tucker WN, Saab S, Rickman LS, Mathews WC. The scratch test is unreliable 
for detecting the liver edge. J Clin Gastroenterol. 1997;25(2):410-414. a 

3. Zoli M, Magalotti D, Grimaldi M, Gueli C, Marchesini G, Pisi E. Physical 
examination of the liver: is it still worth it? Am J Gastroenterol. 1995;90(9): 
1428-1432. a 


a For the Evidence to Support the Update for this topic, 
see http://www.JAMAevidence.com. 









EVIDENCE TO SUPPORT THE UPDATE 


Hepatomegaly 



TITLE The Scratch Test Is Unreliable for Detecting the 
Liver Edge. 

AUTHORS Tucker WN, Saab S, Rickman LS, Mathews 
WC. 

CITATION J Clin Gastroenterol. 1997;25(2):410-414. 

QUESTION What is the interobserver variability of the 
scratch test for measuring the liver span below the right 
costal margin? 

DESIGN Multiple independent examinations. The ultra¬ 
sonography was performed before the physical examina¬ 
tion and thus was blinded to the scratch test. It is not clear 
whether the examiners knew the results of the ultrasonog¬ 
raphy. 

SETTING Patients were identified from a list of those 
undergoing abdominal ultrasonography. The examin¬ 
ers included attending physicians (2), a gastroenterolo¬ 
gist and an infectious disease specialist, gastroenterology 
fellows (2), chief medical residents (2), a medicine resi¬ 
dent, senior medical students (3), and a nurse practi¬ 
tioner. 

PATIENTS Inpatients (n = 22). Most patients had nor¬ 
mal body habitus, although 3 had ascites and 1 was 
greater than ideal body weight. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

A scratch test with the results recorded as the location of the 
liver edge in centimeters below the right costal margin. 


MAIN OUTCOME MEASURE 

Liver span measured by ultrasonography. 

MAIN RESULTS 

Of 22 patients, 18 (80%) had hepatomegaly (liver span > 15.5 
cm). There was no correlation between the ultrasonographically 
measured liver span and the span of the liver below the right 
costal margin by the scratch test (r = 0.05). The pairwise reli¬ 
ability coefficient ranged from -0.32 to 0.74, with a mean of 
0.26 (poor correlation). 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 4. 

STRENGTHS Many examiners with different training levels. 

LIMITATIONS There is no certainty that the examiners were 
blinded to the ultrasonographic results. However, if they 
knew the results of the ultrasonography, the bias would have 
been toward an improved correlation. There was a high prev¬ 
alence of patients with known liver disease, although this is 
the population for whom measuring the liver span would be 
most applicable. 

The biases in this study should have enhanced the correla¬ 
tion between scratch test determination of the liver edge and 
the actual liver edge by ultrasonography. Despite the signifi¬ 
cant limitations in study population, there was no correla¬ 
tion. From this study, we can conclude that physicians should 
stop scratching the abdomen to identify the liver edge for 
patients with suspected liver disease. 

Reviewed by David L. Simel, MD, MHS 
TITLE Physical Examination of the Liver: Is It Still 
Worth It? 

AUTHORS Zoli M, Magalotti D, Grimaldi M, Gueli C, 
Marchesini G, Pisi E. 

CITATION Am J Gastroenterol. 1995;90(9):1428-1432. 

QUESTION What is the correlation between palpating 
the lower liver edge and the actual size of the liver? 

DESIGN Independent, prospective, nonconsecutive 
evaluation of case patients and control patients. 

SETTING Gastroenterology clinic. Each examination was performed 
by 1 of 2 clinical hepatologists. 

PATIENTS Cases were patients with cirrhosis and con¬ 
trol patients were healthy. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Total liver span in the midclavicular line and the presence of 
a palpable liver below the right costal margin. The examiners 
used percussion to identify the liver edge. 

MAIN OUTCOME MEASURE 

Palpation was compared with ultrasonography results. The 
clinicians were not informed of whether the patient was a 
case or a control. The ultrasonographer did not have the 
physical examination results. 

MAIN RESULTS 

Forty-seven of 100 control patients had a liver edge palpated 
below the costal margin vs 78 of 100 patients with cirrhosis. 

When the liver edge was palpated below the right costal 
margin, the likelihood ratio (LR) that ultrasonography would 
identify the edge below the margin was 39 (95% confidence 
interval [CI], 4.6-373). When the edge was not palpated, the 
LR for ultrasonography identifying the edge below the margin 
was 0.28 (95% CI, 0.22-0.36). 

The investigators compared the measured distance from the 
costal margin to the liver edge vs the distance measured with 
ultrasonography. There was poor agreement for control subjects 
(κ = 0.13) and excellent agreement for cirrhotic patients 
(κ = 0.93). 

However, identifying the edge below the margin is not the 
same as identifying a large liver. The liver span was more 
than 15 cm in 18 control and 8 case patients. Among patients 
with cirrhosis, the clinical estimation correlated with the 
liver span (r = 0.59). There was no statistically significant 
correlation for control subjects (r = 0.20). The investigators 
also compared liver span by palpation to liver volume by 
ultrasonography; the correlation was good for case patients 
(r = 0.65 for cirrhosis patients) but not for healthy patients (r 
= 0.33). The same correlation results were found when liver 
span below the costal margin was compared with the liver 
volume. 

Data are presented that allow us to estimate the sensitivity 
of the pattern of the liver margin as an indicator for cirrhosis. 
The positive LR for a thickened liver margin for cirrhosis is 
2.4 (95% CI, 1.4-4.4). The LR for finding a sharp edge is 0.62 
(95% CI, 0.46-0.81). 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 4. 

STRENGTHS Independent comparison, although the exam¬ 
iners likely knew from clinical observations that some of the 
case patients had cirrhosis. The interobserver variability of 
the ultrasonographers was reported as 3 mm or less. 

LIMITATIONS Nonconsecutive patients in whom the exam¬ 
iners knew that half of the patients had cirrhosis and others 
did not. The data are not presented in a fashion that allows us 
to extract the sensitivity and specificity for percussion of the 
liver edge. 

A large number of healthy patients will have their liver 
edge palpated below the right costal margin, which indicates 
neither disease nor the presence of hepatic enlargement. 
These results support the suggestion that physicians should 
specifically assess the liver edge only when there is a suspi¬ 
cion of liver disease. 

The study design (high prevalence of cirrhosis) does not 
allow us to extrapolate the results for the pattern of the liver 
edge to a population of patients with a suspicion of other 
liver diseases (“thick” indicating cirrhosis and “sharp” indi¬ 
cating normal liver). 

Reviewed by David L. Simel, MD, MHS 
Does This Patient Have Hypertension? 

CLINICAL SCENARIO 

Is This Patient’s Blood Pressure Really Elevated? 

A 46-year-old man who has recently moved to your 
neighborhood presents with a painful ankle sprain. Before 
he leaves, you decide to check his blood pressure (BP) and 
obtain an initial reading of 164/102 mm Hg. He denies 
having high BP previously. 


WHY IS ACCURATE BP 
MEASUREMENT IMPORTANT? 


Elevated arterial BP, or hypertension, is important because it 
is common, it is clinically silent, it leads to cardiovascular 
disease (CVD), and it decreases life expectancy. Because sur¬ 
veys find that approximately 20% 1-3 of North American 
adults have an elevated BP (systolic BP [SBP] ≥ 140 mm Hg 
or diastolic BP [DBP] ≥ 90 mm Hg) or are taking antihypertensive 
medication, physicians are advised to check all 
patients periodically for BP elevation. 3-7 On the other hand, 
overestimation of BP can erroneously label people as hyper¬ 
tensive and potentially result in unnecessary dietary restric¬ 
tions, exposure to adverse effects from drug treatment, 
medication expense, and adverse socioeconomic effects. 8,9 
Fortunately, measuring BP is an easy and safe diagnostic pro¬ 
cedure that, when followed by appropriate antihypertensive 
drug treatment, can lead to reduced CVD and mortality. 10,11 

STANDARDS FOR MEASURING BP 

The gold standard for instantaneous BP measurement is the 
intra-arterial or direct BP (determined by a rigid-walled 
catheter). The standard for clinical practice is the so-called 
casual cuff or indirect BP. 

Guidelines for Diagnosing Hypertension 

Cardiovascular disease risk increases monotonically with BP, 
revealing no cut point below which risk is minimal and above 
which CVD will definitely occur. Terms used to indicate the 
degree of BP elevation now emphasize the importance of what 
was previously termed mild hypertension and the long-recog¬ 
nized greater predictive value of elevated SBP 12 (Table 23-1). Risk 
for future CVD is predicted by even a single careful BP reading. 13-15 
However, BP is rather variable and often decreases with observa¬ 
tion so that, in accord with statistical expectations, risk relates 
more closely to mean BP during several visits 13 (although brief, 
severe BP elevation can also be catastrophic, eg, with cocaine 
overdose). Therefore, we could define the “treatable BP level” as 
that mean clinical BP above which treatment has been shown in 
randomized controlled trials to do more good than harm. The 
largest of these trials used drug treatment vs placebo after find¬ 
ing a consistent or average entry BP from 2 to 3 visits of greater 
than or equal to 160 mm Hg SBP (tested only in the elderly) with 
or without DBP elevation, 16 or greater than or equal to 90 
mm Hg DBP (tested in the young and in the elderly). 11 In the 
future, individualized assessments of absolute risk incorporating 
other relevant information, such as age, sex, concomitant risk 
factors, and coexisting target organ damage, along with the 
patient’s tolerance for risk and history of drug adverse effects, 
may replace arbitrary cut points in determining when BP elevation 
becomes treatable. 7 At present, a diagnosis of hypertension 
reflects a consensus regarding the office BP level above which 
CVD risk worsens significantly, about 140/90 mm Hg. 

Table 23-1 Classification of Blood Pressure for Adults Aged 18 Years 
and Older a 

Category                    Systolic, mm Hg     Diastolic, mm Hg 
Normal                      <130                <85 
High normal                 130-139             85-89 
Hypertension b 
  Stage 1 (mild)            140-159             90-99 
  Stage 2 (moderate)        160-179             100-109 
  Stage 3 (severe)          180-209             110-119 
  Stage 4 (very severe)     ≥210                ≥120 

a Adapted from the fifth report of the Joint National Committee on Detection, 
Evaluation, and Treatment of High Blood Pressure. 3 

b Based on the average of 2 or more readings taken at each of 2 or more visits after 
an initial screening. 
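
The cut points in Table 23-1 can be expressed as a small lookup. This is a sketch under stated assumptions rather than a diagnostic tool: the function name is hypothetical, the top stage is treated as 210 mm Hg or higher systolic and 120 mm Hg or higher diastolic, and, when SBP and DBP fall into different rows, the higher category is assigned (an assumption not stated in the table itself).

```python
# (label, lowest SBP in category, lowest DBP in category), highest first.
# Assumption: stage 4 begins at 210 mm Hg systolic or 120 mm Hg diastolic.
BP_CATEGORIES = [
    ("Stage 4 (very severe)", 210, 120),
    ("Stage 3 (severe)", 180, 110),
    ("Stage 2 (moderate)", 160, 100),
    ("Stage 1 (mild)", 140, 90),
    ("High normal", 130, 85),
]


def classify_bp(sbp, dbp):
    """Return the Table 23-1 category for a mean SBP/DBP in mm Hg."""
    for label, sbp_floor, dbp_floor in BP_CATEGORIES:
        # Assumption: the higher of the two categories applies.
        if sbp >= sbp_floor or dbp >= dbp_floor:
            return label
    return "Normal"


if __name__ == "__main__":
    print(classify_bp(164, 102))  # initial reading from the clinical scenario
```

Applied to the single reading of 164/102 mm Hg in the clinical scenario, the lookup returns "Stage 2 (moderate)". The table's categories, however, are meant to be applied to the average of 2 or more readings at each of 2 or more visits, not to one screening value.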

A detailed conceptual analysis of hypertension is beyond 
the scope of this article but has been addressed thoughtfully 
by Jennings and Netsky. 17 

How to Measure Clinical BP 

Meticulous technique in indirect auscultatory BP measure¬ 
ment is mandatory for research, diagnosis, and optimal clini¬ 
cal care of hypertensive patients. Published procedural 
guidelines show general uniformity that, if followed, should 
improve the accuracy and reliability of BP measurement 
(Table 23-2; Figure 23-1). 3-5,18-20 BP is customarily measured 
after obtaining the medical history as part of the “vital sign” 
determination at the beginning of the physical examination. At 
each visit, 2 or more readings should be obtained and averaged 
from the same arm, with the patient supine or seated. As a 
practical approach to variability, taking additional readings 
until a stable level is reached has been suggested when the first 
2 differ by more than 5 mm Hg diastolic. 20 BP in both arms 
should be measured at the first visit, and the arm with the 
higher pressure should be used thereafter. 18 

Table 23-2 Techniques for Measuring Blood Pressure a 

The intent and purpose of the measurement should be explained to the subject in a reassuring manner and every effort made to put the subject at ease [including a 5-min rest before the first measurement]. 

The sequential steps for measuring BP in the upper extremity, as for routine screening and monitoring purposes, should include the following: 

1. Have paper and pen at hand for immediate recording of the pressure. 

2. Seat the subject in a quiet, calm environment [with feet flat on the floor, back against the chair] with his or her bared arm resting on a standard table or other support so the midpoint of the upper arm is at the level of the heart. 

3. Estimate by inspection or measure with a tape the circumference of the bare upper arm at the midpoint between the acromion and olecranon process and select an appropriately sized cuff. The bladder inside the cuff should encircle 80% of the arm in adults and 100% of the arm in children < 13 years. If in doubt, use a larger cuff. If the available cuff is too small, this should be noted. 

4. Palpate the brachial artery and place the cuff so that the midline of the bladder is over the arterial pulsation and then wrap and secure the cuff snugly around the subject's bare upper arm. Avoid rolling up the sleeve in such a manner that it forms a tight tourniquet around the upper arm. Loose application of the cuff results in overestimation of the pressure. The lower edge of the cuff should be 1 in [2 cm] above the antecubital fossa where the head of the stethoscope is to be placed. 

5. Place the manometer so the center of the mercury column or aneroid dial is at eye level [except for tilted-column floor models] and easily visible to the observer and the tubing from the cuff is unobstructed. 

6. Inflate the cuff rapidly to 70 mm Hg, and increase by 10 mm Hg increments while palpating the radial pulse. Note the level of pressure at which the pulse disappears and subsequently reappears during deflation. This procedure, the palpatory method, provides a necessary preliminary approximation of the SBP to ensure an adequate level of inflation when the actual, auscultatory measurement is made. The palpatory method is particularly useful to avoid underinflation of the cuff in patients with an auscultatory gap and overinflation in those with very low BP. 

7. Place the earpieces of the stethoscope into the ear canals, angled forward to fit snugly. Switch the stethoscope head to the low-frequency position (bell). The setting can be confirmed by listening as the stethoscope head [ie, the bell orifice] is tapped gently. 

8. Place the head of the stethoscope over the brachial artery pulsation, just above and medial to the antecubital fossa but below the edge of the cuff, and hold it firmly [but not too tightly 21 ] in place, making sure that the head makes contact with the skin around its entire circumference. Wedging the head of the stethoscope under the edge of the cuff may free up one hand but results in considerable extraneous noise [and is nearly impossible with the bell in any event]. 

9. Inflate the bladder rapidly and steadily to a pressure 20-30 mm Hg above the level previously determined by palpation and then partially unscrew [open] the valve and deflate the bladder at 2 mm [Hg]/s while listening for the appearance of the Korotkoff sounds. 

10. As the pressure in the bladder decreases, note the level of the pressure on the manometer at the first appearance of repetitive sounds [phase I] and at the muffling of these sounds [phase IV] and when they disappear [phase V]. During the period the Korotkoff sounds are audible, the rate of deflation should be no more than 2 mm per pulse beat, thereby compensating for both rapid and slow heart rates. 

11. After the last Korotkoff sound is heard, the cuff should be deflated slowly for at least another 10 mm Hg to ensure that no further sounds are audible and then rapidly and completely deflated, and the subject should be allowed to rest for at least 30 s. 

12. The systolic [phase I] and diastolic [phase V] pressures should be immediately recorded, rounded off [upward] to the nearest 2 mm Hg. In children, and when sounds are heard nearly to a level of 0 mm Hg, the phase IV pressure should also be recorded [eg, 108/64/56 mm Hg]. All values should be recorded together with the name of the subject, the date and time of the measurement, the arm on which the measurement was made, the subject’s position, and the cuff size [when a nonstandard size is used]. 

13. The measurement should be repeated after at least 30 s and the 2 readings averaged. In clinical situations, additional measurements can be made in the same or opposite arm, in the same or an alternative position. 

Abbreviations: BP, blood pressure; SBP, systolic blood pressure. 

a Reproduced with permission from Perloff et al. 18 Copyright © 1993 American Heart Association. Text in brackets added by the author. 

Figure 23-1 Clinical Measurement of Indirect Blood Pressure 

See Table 23-2 for appropriate cuff sizing. 
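
The per-visit rule described in the text (average 2 or more readings from the same arm, and take more readings when the first 2 differ by more than 5 mm Hg diastolic) can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than statements from the source: "stable" is taken to mean that the 2 most recent diastolic readings are within 5 mm Hg of each other, and all readings from the visit are averaged.

```python
from statistics import mean


def needs_additional_reading(readings, dbp_tolerance=5):
    """readings: list of (sbp, dbp) tuples, same arm, in the order taken.

    Assumption: the level is "stable" once the 2 most recent diastolic
    readings differ by no more than dbp_tolerance mm Hg.
    """
    if len(readings) < 2:
        return True  # at least 2 readings are needed at each visit
    dbp_previous, dbp_latest = readings[-2][1], readings[-1][1]
    return abs(dbp_latest - dbp_previous) > dbp_tolerance


def visit_average(readings):
    """Mean (sbp, dbp) for the visit. Assumption: all readings are averaged."""
    if len(readings) < 2:
        raise ValueError("at least 2 readings are required")
    return mean(r[0] for r in readings), mean(r[1] for r in readings)
```

For example, hypothetical readings of 164/102 and 158/94 mm Hg differ by 8 mm Hg diastolic, so `needs_additional_reading` returns `True`; adding a third reading of 156/92 mm Hg makes it return `False`.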

Careful technique guarantees maximum accuracy. We have 
compiled information from a number of sources regarding 
factors that increase, decrease, or have no effect on BP 21-74 
(Table 23-3). However, if all serious errors that can underestimate 
BP are avoided, finding the BP in any setting, position, 
or time to be within the normal range makes a more 
careful measurement at that visit unlikely to be high. Assuming 
BP is checked routinely in all adults, the efficient practitioner 
can reasonably reserve the “proper” method for the 
10% to 20% of patients who have known or newly detected 
elevated BP (as in our clinical scenario), cardiovascular target-organ 
damage, or other risk factors or who are receiving 
antihypertensive therapy. 

Variation in BP Measurement 

Sources of clinical variability include the patient, equipment, 
examiner, and procedure. For BP, a major proportion of random 
fluctuation over time arises from the examinee. Intra-arterial 
monitoring clearly reveals that SBP and DBP differ 
with every heartbeat and with the respiratory cycle. 45 Blood 
pressure also varies minute to minute, with a standard deviation 
of about 4 mm Hg systolic and 2 to 3 mm Hg diastolic, 59,75 
as well as from hour to hour 76,77; short-term variability in 
SBP is increased with impaired baroreflexes. 77,78 Day-to-day 
variation is even greater. With 2 or more cuff readings at each 
visit, the standard deviation between visits is approximately 5 
to 12 mm Hg systolic and 6 to 8 mm Hg diastolic. 13,59,60,79,80 


Table 23-3 Factors Affecting the Immediate Accuracy of Office Blood Pressure 

Factor                                      Magnitude,          Reference
                                            SBP/DBP, mm Hg

Increases Recorded BP
  Examinee
    Soft Korotkoff sounds                   DBP                 Assumed
    Missed auscultatory gap                 DBP (rare, huge)    22
    Pseudohypertension                      2-98/3-49           23-25
    “White coat” reaction
      To physician                          11-28/3-15          26-30
      To nonphysician                       1-12/2-7            27, 31, 32
    Paretic arm (caused by stroke)          2/5                 33
    Pain, anxiety                           May be large        22
    Acute smoking                           6/5                 34
    Acute caffeine                          11/5                35
    Acute ethanol ingestion                 8/8                 36
    Distended bladder                       15/10               37
    Talking, signing                        7/8                 38, 39
  Setting, equipment
    Environmental noise                     DBP                 Assumed
    Leaky bulb valve                        >2 DBP              40
    Blocked manometer vents                 2-10                41
    Cold hands or stethoscope               Not stated          22
  Examiner
    Expectation bias                        Probably <10        In theory
    Impaired hearing                        DBP                 22
  Examination
    Cuff too narrow                         -8 to +10/2-8       42-44
    Cuff not centered                       4/3                 45
    Cuff over clothing                      5-50                46
    Elbow too low                           6                   47
    Cuff too loose                          Not stated          48
    Too short rest period                   Varied estimates    22
    Back unsupported                        6-10                49, 50
    Arm unsupported                         1-7/5-11            51
    Too slow deflation                      -1 to +2/5-6        52, 53
    Too fast deflation                      DBP only            52, 53
    Parallax error                          2-4                 By author
    Using phase IV (adult)                  6 DBP               45
    Too rapid remeasure                     1/1                 52, 54
    Cold (vs warm) season                   6/3-10              55-57

Decreases BP
  Examinee
    Soft Korotkoff sounds                   SBP                 Assumed
    Recent meal                             -1 to 1/1-4         58
    Missed auscultatory gap                 10-50 SBP           45
    High stroke volume                      Phase V can = 0     45
    Habituation                             0-7/2-12            59-61
    Shock (additional pseudohypotension)    33 SBP              62
  Setting, equipment
    Noisy environs                          SBP                 Assumed
    Faulty aneroid device                   Can be >10          63
    Low mercury level                       Varies              22
    Leaky bulb                              >2 SBP              40
  Examiner
    Reading to next lowest 5-10 mm Hg,
      or expectation bias                   Probably <10        64
    Impaired hearing                        SBP only            22
  Examination
    Noisy environs                          SBP                 Assumed
    Left vs right arm                       1/1                 65
    Resting for too long (25 min)           10/0                66
    Elbow too high                          5/5                 47
    Too rapid deflation                     SBP only            40
    Excess bell pressure                    >9 DBP              21
    Parallax error (aneroid)                2-4                 By author

No Effect on BP
  Examinee
    Menstrual phase                                             67, 68
    Chronic caffeine ingestion                                  69
    Phenylephrine nasal spray                                   70
    Cuff self-inflation                                         71
  Examinee and examiner
    Discordance in sex or race                                  72, 73
  Examination
    Thin shirtsleeve under cuff                                 74
    Bell vs diaphragm                                           49
    Cuff inflation per se                                       29
    Hour of day (during work hours)                             54
    Room temperature                                            54

Abbreviations: BP, blood pressure; DBP, diastolic blood pressure; SBP, systolic blood 
pressure. 


This variability explains why 2 BP measurements in a patient 
often differ, but it also suggests that a repeated visit’s measurements 
could be as much as 15/12 mm Hg higher or lower 
than today’s result about 5% of the time. The greater magnitude 
of the between-visit vs within-visit variability is why 
more visits are recommended to achieve greater diagnostic 
precision rather than more replications at a visit. In reality, 
the return BP reading in our clinical scenario will likely be 
lower, possibly much lower, because of our patient’s present 
distress, unfamiliarity with the physician and the physician’s 
office procedures, and “regression to the mean” (discussed 
herein). 
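
The argument for more visits rather than more replications at one visit can be made concrete with a simple variance-components calculation. The sketch below is a simplification and rests on assumptions not stated in the text: between-visit and within-visit fluctuations are treated as independent, and the illustrative systolic standard deviations (8 mm Hg between visits, 4 mm Hg within a visit) are merely chosen to be consistent with the ranges quoted above. The function name `se_of_mean` is hypothetical.

```python
import math

def se_of_mean(n_visits, n_readings, sd_between, sd_within):
    """Standard error (mm Hg) of the mean BP under a two-level random model.

    Assumes independent between-visit and within-visit fluctuations:
    Var(mean) = sd_between**2 / n_visits
                + sd_within**2 / (n_visits * n_readings)
    """
    variance = (sd_between ** 2) / n_visits + (sd_within ** 2) / (n_visits * n_readings)
    return math.sqrt(variance)

# Illustrative systolic values (assumed): 8 mm Hg between visits, 4 mm Hg within a visit.
one_visit_four_readings = se_of_mean(1, 4, 8.0, 4.0)
four_visits_one_reading = se_of_mean(4, 1, 8.0, 4.0)
print(round(one_visit_four_readings, 1), round(four_visits_one_reading, 1))
```

Under these assumed values, the same 4 readings yield a standard error of about 8.2 mm Hg when taken at 1 visit but about 4.5 mm Hg when spread over 4 visits, because replication within a visit cannot average out the larger between-visit component.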

Arrhythmias, particularly atrial fibrillation, cause beat-to-beat 
cardiac output to vary substantially and increase interobserver 
variation in measured BP. 81 With atrial fibrillation, 
probably the best one can do is to deflate the cuff slowly 
while attempting to ascertain when most of the contractions 
are resulting in audible Korotkoff sounds (the approximate 
SBP) and when the sounds have all but ceased yet still occur 
infrequently (the approximate DBP), or one can average several 
readings. 18,20 Because Korotkoff sounds generated by 
occasional premature beats (and the subsequent beat) are 
unrepresentative of the day’s mean BP level, they should be 
ignored. 22 

Examiners can introduce random errors. Under ideal conditions, 
simultaneous BP readings by independent observers typically 
correlate above r = 0.95 with mean absolute differences of 
less than 2 mm Hg systolic and less than 1 mm Hg diastolic. 82 
However, even in research settings, careful BP readings obtained 
just a few minutes apart show distressingly high variation (eg, 
SD of 7 mm Hg systolic and 5 mm Hg diastolic 13,54,61). In routine 
medical practice, physicians and nurses often measure BP far 
less carefully: differences of 10/8 mm Hg are common. 83,84 White 
et al 85 performed intra-arterial BP recording in 48 hypertensive 
patients and found the humbling result that 2 auscultatory automatic 
monitors showed less overall discrepancy and fewer 
widely discrepant readings compared with those generated by 
experienced clinicians using the standard method. 

Environmental problems (eg, noise from construction 
work next door) or deficient equipment (eg, an inadequately 
damped, “bouncy” mercury column, remedied by tightening 
the knurled nut at the column’s top 22 ) may also be expected 
to decrease precision. 

Accuracy of BP Measurement 

Accuracy, or validity, refers to agreement with the truth and 
requires not only precision but also freedom from systematic 
error (ie, bias). In clinical BP measurement, we look through a 
series of dark glasses, further considered herein: (1) the indirect 
BP may not reflect the concurrent intra-arterial BP; (2) the cuff 
technique may be incorrectly performed; and (3) a perfectly executed 
indirect (or even direct) BP reading at a particular 
moment may not represent the patient’s average clinic BP or the 
average BP throughout the day’s activities, as addressed in the 
section on ambulatory BP monitoring. Finally, to interpret even 
a perfect BP reading requires consideration of the whole patient 
because factors other than BP strongly influence the risk for 
CVD events. 

Indirect BP vs Direct BP 

Indirect auscultatory BP correlates well with the simultaneous 
intra-arterial value (r = 0.94-0.98). 86 However, the 
Korotkoff phase I sounds do not appear until 4 to 15 mm Hg 
below the direct SBP, whereas, at phase V, the sounds disappear 
3 to 6 mm Hg above the true DBP in adults. 45,85,86 

If these technical differences applied equally to all patients, 
they would be merely academic; clinical importance arises 
when an individual patient possesses an unusual discrepancy. 
Such patients are often elderly (in which false elevation is 
termed “pseudohypertension” 23,24,87-89) or obese, 46 but otherwise 
unexplained extreme false elevations in cuff BP may also 
occur. 90 Pseudohypertension might seem at first glance to be a 
variant of normal BP. However, most patients actually have 
chronic hypertension, 91 on which is superimposed a further 
false BP elevation. Although it has been claimed that pseudohypertension 
can be suspected in an older person if “Osler’s 
sign” (while feeling the radial pulse, occlude the brachial artery 
by cuff inflation or by direct pressure using the other thumb; if 
the radial artery remains palpable as a firm “tube,” the sign is 
positive) is present, 24 the test’s usefulness remains debatable. 25,91,92 
For example, in 65 geriatric patients unanimously 
classified “Osler positive” or “Osler negative” by 3 observers, 6 
other physicians demonstrated moderate intraobserver consistency 
(κ = 0.49) and only modest interobserver agreement 
(κ = 0.37). 93 Retaining “Osler-equivocal” patients in the study 
would almost certainly have further reduced agreement. Confirming 
pseudohypertension requires an intra-arterial BP 
measurement; fortunately, the condition is uncommon, affecting 
less than 2% of an otherwise healthy elderly group. 25 
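
Because agreement on Osler's sign is reported above as a κ statistic, a short sketch of how Cohen's κ is computed for 2 observers may help in reading those values. The function name and the counts in the example are hypothetical; they are not data from the cited study. κ compares the observed proportion of agreement with the agreement expected by chance from each observer's overall rate of positive calls.

```python
def cohen_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for 2 observers rating the same patients positive or negative.

    both_pos: both observers call the sign positive
    a_only:   only observer A calls it positive
    b_only:   only observer B calls it positive
    both_neg: both observers call it negative
    """
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = (both_pos + a_only) / n
    b_pos = (both_pos + b_only) / n
    # Agreement expected by chance if the 2 observers rated independently.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)

# Hypothetical counts for 50 patients: 70% raw agreement, 50% expected by chance.
print(round(cohen_kappa(20, 5, 10, 15), 2))
```

In this made-up example the observers agree on 70% of patients, yet κ is only 0.4 because half of that agreement would be expected by chance alone.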

Technical Inaccuracies of Indirect BP 

Examiner biases include end-digit preference (ie, the tendency 
to overrecord certain numbers, particularly 0 and 5 64,75,94,95), 
recording lower values at critical diagnostic cut points 64 presumably 
to avoid institution of long-term drug treatment, and 
probably other analogous unconscious processes (eg, “observing” 
a hoped-for BP reduction consequent to instituting therapy). 
Physicians may also differ when labeling patients as 
hypertensive. A group of British general practitioners diagnosed 
hypertension after only 1 BP measurement in 58% of 
patients despite previously agreeing to use 3 readings as part of 
the group’s uniform diagnostic criteria. 96 Contrary to their local 
expert guidelines, about 37% of German out-of-hospital 97 physicians 
and British hospital clinicians 98 record phase IV (muffling) 
rather than the more accurate Korotkoff phase V. 
Perhaps the most common technical error is failure to use a 
sufficiently large cuff; indeed, in 1 survey, only 25% of primary 
care physicians even owned a large cuff. 63 

Interestingly, even when an automatic BP recorder is used, 
systematic differences between operators in the BP values 
obtained may remain, 99 suggesting differing examinee reactions 
to different examiners, as was seen in one careful study in 
children. 100 

Directional equipment errors can occur. Aneroid instruments 
often go out of adjustment, usually downward. 52 One survey 
found that 34% of practitioners used only aneroid units, of 
which 30% were off by 10 mm Hg or more. 63 A mercury unit 
can yield biased readings if the meniscus does not rest at 0 or if 
the mercury’s descent is impeded by clogged internal air vents. 41 
The stethoscope type seems relatively unimportant. 49 Although 
the recommended bell amplifies the Korotkoff sounds’ low frequencies 
in comparison with the diaphragm, 101 the risk of exerting 
excessive pressure and obtaining a falsely low DBP when 
using the bell 21 may outweigh the benefit of amplification, particularly 
in thin patients (try a small bell with a rubber rim). 

Examination errors are legion (Table 23-3); most overestimate 
the true BP. Confirming an apparent difference 
between arms is not simple because it requires taking the 
averages of several alternating measurements from both sides 
or simultaneous measurements by 2 observers who then 
switch sides and remeasure. 102 

Office BP vs Usual BP 

Shortly after patients enter the office, their SBP decreases by 
several millimeters of mercury, whereas DBP remains relatively 
constant. 53,59,60,66,100,103 BP remains fairly steady throughout 
the customary working daytime hours, 54 decreases in the 
evening (ie, at home), 104,105 and finally decreases another 10% 
to 20% during sleep. 106,107 In some patients, BP in a physician’s 
office is notably and consistently higher than daytime 
ambulatory BP. 

This phenomenon, termed office or white coat hypertension, 108 
can even occur during self-measurement of cuff BP 
in the presence of a physician. 26 Approximately 10% to 
40% of untreated and nominally borderline hypertensive 
patients show an appreciable white coat effect, 27,109 and 
many treated patients will also show differences of greater 
than 20/10 mm Hg. 109,110 The phenomenon may depend in 
part on patient factors such as sex, age, and BP level. 111 For 
example, one group of elderly patients showed an increase in 
BP of 17/7 mm Hg on entering the physician’s office; women 
showed a greater SBP increase than men. 28,112 Who wears the 
white coat seems to matter, because nurses (who, along with 
technicians, have generally performed the BP measurements 
used for entry to the large clinical trials) seem to evoke a 
smaller BP increase than physicians. 29,113 

THE ISSUE OF PREDICTION 

Blood Pressure Now vs Blood Pressure Later 

Systematic (and therefore at least partially predictable) 
changes in BP between visits occur for several reasons. As 
examinees (volunteers or patients) become more familiar 
with the examiner, environment, and procedure (including 
BP self-measurement 26), BP decreases by 0 to 7 mm Hg systolic 
and 2 to 12 mm Hg diastolic. 59-61 This habituation may 
be more marked in patients with anxiety trait. 114 An additional 
and probably more important factor, 115 regression to 
the mean, represents the tendency for any unusually high (or 
low) reading to fall closer to the population mean when 
repeated. These phenomena are distinguishable from a true 
“placebo” effect because they can occur in the absence of placebo 
treatment. 13,59,116 Some BP changes likely represent currently 
unappreciated systematic influences; for example, a 
systematic reduction in BP of about 6 mm Hg occurs during 
warm vs cold seasons. 55-57 
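
Regression to the mean is easy to demonstrate by simulation. The sketch below is hypothetical throughout: the population mean DBP (85 mm Hg), the spread of usual DBP between people (SD 8 mm Hg), the random visit-to-visit noise (SD 6 mm Hg), and the selection cutoff (95 mm Hg) are assumed values chosen only for illustration. No treatment, placebo, or habituation is modeled, so any fall at the second visit arises purely from selecting people on a noisy first reading.

```python
import random
from statistics import mean

def simulate_regression_to_mean(n=10_000, true_mean=85.0, sd_true=8.0,
                                sd_visit=6.0, cutoff=95.0, seed=1):
    """Simulate 2 visits per person and return the mean DBP at each visit
    for those whose FIRST reading was at or above the cutoff."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        usual = rng.gauss(true_mean, sd_true)      # person's long-term DBP
        visit1 = usual + rng.gauss(0.0, sd_visit)  # random visit-to-visit noise
        visit2 = usual + rng.gauss(0.0, sd_visit)
        if visit1 >= cutoff:
            first.append(visit1)
            second.append(visit2)
    return mean(first), mean(second)

m1, m2 = simulate_regression_to_mean()
print(f"visit 1: {m1:.1f} mm Hg, visit 2: {m2:.1f} mm Hg")
```

In the selected group the second-visit mean is lower than the first-visit mean yet remains above the population mean, which is the pattern described above.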

Major outcome trials of antihypertensive pharmacotherapy 
have used 2 to 3 BP readings taken at each of 2 or more 
visits not only to increase precision (by “averaging out” 
minute-to-minute and between-day random fluctuations) 
but also to partially control for regression to the mean and 
habituation. In practice, following the same multivisit protocol 
helps ensure that published trial results will be applicable 
to individual patients. A further refinement, used naturally 
by many experienced clinicians, is to conduct further follow-up 
visits when the BP is hovering near a diagnostic cut point. 
Patients whose true values are far from this threshold (above 
or below) logically need fewer visits for confident classification. 117 
In practice, the interval between visits should take 
into account both the BP level and the patient’s clinical status. 
The Joint National Committee 3 recommends remeasurement 
within 1 month for BP initially 160 to 179 mm Hg 
systolic or 100 to 109 mm Hg diastolic (ie, stage 2), within 2 
months for stage 1, within 1 week for stage 3, and immediately 
for stage 4. 

Relative Risk of Casual BP Elevation 
for Persistent Hypertension 

Given high random variation, how well does the finding of a 
single elevated BP predict later definite hypertension? Casual 
BP, particularly SBP at 1 visit, is predictive of later BP elevation 
in young men, 118 medical students, 119 adults, 13 and children. 120 
(Tracking correlations vary widely, eg, r = 0.2-0.7, depending 
on the population, technique, and follow-up interval.) In a 
large prospective study, 121 1 DBP reading of 90 mm Hg or 
higher predicted a later definite diagnosis of hypertension in 
69% of men and 49% of women; any BP elevation warrants 
careful follow-up. Looked at the other way, however, about 
one-third to one-half of subjects with initially elevated BP will 
ultimately prove not to have hypertension. In practice, regression 
to the mean guarantees that many individuals with initially 
elevated BP are actually normotensive. 75 For example, 
among subjects with 4 DBP measurements at 2 entry visits 
averaging between 95 and 104 mm Hg in a mild hypertension 
trial in Australia, 122 28% proved to have an average DBP of less 
than 90 mm Hg during the next 4 years while receiving placebo. 
In a careful screening program, similar DBP reductions 
were observed in the 105 to 114 mm Hg stratum from the first 
to second screen, and approximately 10% of subjects with 
DBP greater than or equal to 115 mm Hg were normotensive 
(< 90 mm Hg) at the next visit. 116 Therefore, using the mean of 
several visits’ BP readings improves the ability to predict not 
only future hypertension 120 but also CVD sequelae. 13 Because 
he may be normotensive, the patient in our case scenario 
should not be told that he is hypertensive at this initial visit, 4 
but he should be carefully followed up. 

Is a High BP Value Ever Normal? 

In normotensive subjects, aerobic exercise, which is generally 
accepted to be good for health, causes SBP to increase moderately, 
whereas DBP changes little. 123,124 Because increased BP 
forms part of the “fight or flight” response, pain (eg, a lacerated 
finger) and other stresses (eg, pulmonary edema) predictably 
increase BP, sometimes to extreme values. These 
reactive elevations of BP do not indicate the presence of 
“hypertension” if the BP returns to normal levels at rest. 

How Do I Improve My Technique? 

Checking one’s equipment periodically is mandatory to preserve 
accuracy. 19 Aneroid devices should be recalibrated 
according to the manufacturer’s recommendations. Although 
one can measure arm circumference in each patient to select 
an appropriately sized cuff, one can more efficiently mark the 
limit of arm circumference directly on each cuff by drawing a 
line in indelible ink at a distance from the free bladder end 
equal to twice the measured bladder width. 

Tape recordings can help standardize observers’ identification 
of Korotkoff sounds. 84,125 Alternatively, locate a 2-headed 
stethoscope (and a second set of ears attached to a willing 
expert brain) for hands-on training. Initial formal training 
in the technique of BP measurement is necessary, but in 
addition, periodic review of technique and retraining as 
needed are recommended. 4 Retraining can increase 
accuracy 83 but may be needed every 1 to 2 months for optimal 
effect, 126 a frequency probably practical only in research 
settings. Atrial fibrillation requires a modified technique 
(discussed earlier). When faced with soft Korotkoff sounds, 
have the subject elevate the arm and then open and close 
the fist several times; inflate the cuff, lower the arm (with 
further inflation as needed), and listen again. In this situation, 
as permitted by some guidelines, 4 more rapid deflation 
after determination of the SBP until the vicinity of the DBP 
will minimize attenuation of the Korotkoff sounds arising 
from venous congestion without altering the measured BP. 53 
Applying the cuff with its tubing emerging at the top 19 will 
eliminate extraneous noises generated if tubing contacts the 
stethoscope. 

For research purposes, random-zero sphygmomanometers 
will reduce but still not eliminate observer bias. Fully 
automatic devices, if otherwise technically accurate, should 
eliminate certain human foibles (eg, end-digit preference 94 
and selective recording of “desirable” readings). Statistical 
monitoring 54 to detect end-digit preference or excessive variability 
followed by mandatory retraining should be helpful. 
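
One simple form that statistical monitoring for end-digit preference could take is a chi-square goodness-of-fit test on the terminal digits of an observer's recorded values. The sketch below is an assumption about how such monitoring might be implemented, not the method of the cited reference. It presumes readings are recorded to the nearest 2 mm Hg, so only even terminal digits are expected and each should appear about equally often; the function name, the hypothetical data, and the 0.05 significance level are illustrative choices.

```python
from collections import Counter

CHI2_CRIT_DF4_P05 = 9.488  # chi-square critical value, 4 degrees of freedom, alpha = 0.05

def end_digit_chi2(readings, allowed_digits=(0, 2, 4, 6, 8)):
    """Chi-square statistic comparing terminal-digit counts with a uniform split.

    Readings are integers in mm Hg, assumed to be recorded to the nearest
    2 mm Hg so that only even terminal digits are expected.
    """
    if not readings:
        raise ValueError("no readings supplied")
    counts = Counter(r % 10 for r in readings)
    unexpected = [d for d in counts if d not in allowed_digits]
    if unexpected:
        raise ValueError(f"terminal digits outside the allowed set: {sorted(unexpected)}")
    expected = len(readings) / len(allowed_digits)
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in allowed_digits)

# Hypothetical observer who records a terminal 0 in half of 100 readings.
biased = [120] * 50 + [122] * 13 + [124] * 12 + [126] * 13 + [128] * 12
stat = end_digit_chi2(biased)
print(round(stat, 2), stat > CHI2_CRIT_DF4_P05)
```

A statistic above the critical value, as in this made-up series, would flag the observer for review; an odd terminal digit raises an error here because it would itself indicate a departure from the 2 mm Hg recording rule.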

In practice, the grossest error, not checking BP at all, remains 
a common failing even among cardiovascular subspecialists. 127 
Most measurement errors could be obviated if practitioners 
would only follow the published recommendations 19,128-132; alas, 
many do not. 63,96-98 

Other Ways to Measure BP 

If you cannot hear properly, both SBP and DBP can be determined 
by palpation to within about 10 mm Hg. 133 Palpated 
SBP is about 7 mm Hg lower than the auscultatory value. 134 

Potential Improvements in the Diagnosis of Hypertension 

Elevated BP during aerobic exercise testing in subjects normotensive 
at rest has some predictive value for subsequent 
definite hypertension (relative risk from 2.3 to approximately 
7). 123,124,135 Because BP is so variable during daily activity, 
ambulatory BP monitoring 136-138 ought to provide a better 
estimate of whole-day target organ exposure. Ambulatory BP 
monitoring correlates better with coexisting target organ 
damage, 138 and a retrospective follow-up study suggested 
improved prediction for subsequent CVD. 139 However, some 
patients cannot tolerate ambulatory BP monitoring and 
accurate measurements are not always possible (eg, with 
marked arrhythmia or obesity). 140 Although appropriate 
studies have begun, 141 no data yet exist to show that adding 
ambulatory BP monitoring results in clinical benefit, and 
issues such as cost-effectiveness remain. 136 

Self-measurement of BP 142 is also under active study. Concurrent 
accuracy, 82 the meaning of differences in measurements 
at home and at work, 143 and concerns about selective 
reporting remain. When patients bring in their own, usually 
lower, home BP readings, be certain to explain that only antihypertensive 
treatment of resting BP readings is of proven 
value, that daytime BP is routinely higher than evening BP, 
and that cardiac involvement may relate more closely to work 
time BP. 144 Although the appeal of self-monitoring includes 
potentially desirable psychological and compliance effects, 
any benefit remains questionable 142; a 1-year trial of home BP 
monitoring found no difference in treatment, attained BP, or 
risk factor reduction. 145 

THE BOTTOM LINE 

Hypertension remains one of the most prevalent and most 
important public health problems. Measurement of BP has 
won its place in the recommended periodic health examination 
because hypertension is common, clinically silent, dangerous, 
and treatable. Accurately measuring BP by the indirect 
method requires minimal equipment, combined with a willingness 
to make the effort; all health care practitioners should 
read and follow published guidelines. 18 Attention to proper 
technique plus an appreciation of the inherent variability of BP 
should yield an accurate diagnosis in most patients. Occasional 
patients with suspected pseudohypertension or white 
coat syndrome may benefit from ancillary technology such as 
echocardiography or ambulatory BP monitoring to optimize 
diagnostic decision making. Conversely, in the far more common, 
otherwise low-risk patient, yearly BP readings will suffice 
to rule out the presence of severe or longstanding untreated 
hypertension. The patient in our clinical scenario would be 
served well by a return visit in a few weeks for repeated BP 
measurement, 3 whereas immediately labeling him as hypertensive 
would be incorrect and, by causing him unnecessary concern, 
could be an immediate disservice. 

Following expert treatment guidelines constitutes the physician’s 
final responsibility, tying a proper diagnosis and 
proven therapy together to benefit the patient. 
CLINICAL SCENARIO 


A 47-year-old woman with a strong family history of heart 
disease reports that her blood pressure (BP) measured on 
commercial store devices is sometimes, but not always, as 
high as 140 to 145 mm Hg systolic. She feels well and is 
physically active. Your nurse obtained her BP of 147/82 
mm Hg with your office automated oscillometric device. 
Given the values she reports and your office measure, have 
you diagnosed her as having stage 1 hypertension? 

UPDATED SUMMARY ON BP MEASUREMENT TO 
DETECT HYPERTENSION IN ADULTS 

Original Review 

Reeves RA. Does this patient have hypertension? how to 
measure blood pressure. JAMA. 1995;273(15):1211-1218. 

UPDATED LITERATURE SEARCH 

Because we were aware of systematic reviews on BP diagnosis 
and management, our literature search focused on formal 
systematic reviews of adult hypertension published since 
2000. Initially, we crossed the search terms “blood pressure 
determination/methods” and “hypertension/diagnosis” filtered 
for “human,” English articles that arose from “consensus 
development conferences” or were “academic reviews.” 
The results yielded 36 titles but contained obvious omissions. 
We next used the SUMSearch strategy (http://sumsearch.uthscsa.edu/; 
accessed May 30, 2008) in PubMed for the 
search term blood pressure, limited to physical examination 
from 2000-2004; this yielded 111 articles and included the 
publications from well-known US, Canadian, European, and 
British consensus groups for the evaluation and management 
of hypertension. Each of the consensus groups used a formal 
systematic approach to evaluate the evidence for BP measurement 
techniques. The consensus groups all published 
updates during 2003-2004; therefore, we focused our review 
on these 4 groups and the references from those reports that 
specifically addressed BP measurement. The independent 
groups used high methodologic standards, and the recommendations 
are similar. Therefore, we present the summary 
data from these without individual reviews of each society’s 
recommendations. 


NEW FINDINGS 

• The classification of BP has now been changed to “normal,” 
“prehypertension” or “high normal,” stage 1 hypertension, 
and stage 2 hypertension (Tables 23-4 and 23-5). 

• Aneroid sphygmomanometers are accurate, but only when 
they are calibrated at least once yearly and the examiners 
use proper measurement techniques. Sphygmomanometers 
should be calibrated to their manufacturer’s specifications. 

• Self-measured BP can be used as part of the patient’s medical 
history, but the thresholds for diagnosing hypertension 
are lower (> 135 mm Hg systolic or > 85 mm Hg diastolic). 

Details of the Update 

The techniques for measuring BP are presented well in the 
original article and are unchanged. A systematic review of BP 
measurement was presented in a series of articles published 
in the British Medical Journal 1; one article highlighted the 
errors in measurement that were also reviewed in The Rational 
Clinical Examination article. 2 


IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

Self-measurement of BP had not been adequately studied 
when The Rational Clinical Examination article was published. 
There is now an established role for using properly 
obtained self-measured BP values. As such, the patient’s 
report can be considered part of the medical history when 
assessing for hypertension. 


Table 23-4 Classification of Blood Pressure (JNC-VII) 3 

Classification            SBP, mm Hg            DBP, mm Hg

Normal                    <120          And     <80
Prehypertension           120-139       Or      80-89
Stage 1 hypertension      140-159       Or      90-99
Stage 2 hypertension      ≥160          Or      ≥100

Abbreviations: DBP, diastolic blood pressure; JNC-VII, seventh report of the Joint 
National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood 
Pressure; SBP, systolic blood pressure. 


Table 23-5 Classification of Blood Pressure (Canadian, British, and 
European Societies) 

Classification            SBP, mm Hg            DBP, mm Hg

Optimal                   <120          And     <80
Normal                    <130          And     <85
High normal               130-139       Or      85-89
Grade 1 hypertension      140-159       Or      90-99
Grade 2 hypertension      160-179       Or      100-109
Grade 3 hypertension      ≥180          Or      ≥110

Abbreviations: DBP, diastolic blood pressure; SBP, systolic blood pressure. 

CHANGES IN THE REFERENCE STANDARD 

Indirect BP measurement through auscultation of Korotkoff 
sounds is the pragmatic reference standard for clinical care 
and research. BP measurement with oscillometric techniques 
is acceptable for following treatment in patients with established 
hypertension, but the initial diagnosis should be confirmed 
with auscultatory methods. This may be especially 
important in elderly patients or those with arrhythmias. The 
seventh report of the Joint National Committee on Prevention, 
Detection, Evaluation, and Treatment of High Blood 
Pressure (JNC-VII) 3 recommends the classification of BP for 
adults as shown in Table 23-4. 
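
The JNC-VII categories in Table 23-4 can be expressed as a small lookup, sketched below in Python. The function name is hypothetical, and the sketch assumes the usual reading of the table in which stage 2 begins at 160 mm Hg systolic or 100 mm Hg diastolic and the higher of the systolic and diastolic categories determines the classification. A category describes a BP level only; it is not a diagnosis, which depends on properly measured readings averaged over more than 1 visit.

```python
def classify_jnc7(sbp, dbp):
    """Return the JNC-VII category (Table 23-4) for an adult BP in mm Hg.

    The higher of the systolic and diastolic categories determines the result.
    """
    if sbp >= 160 or dbp >= 100:
        return "Stage 2 hypertension"
    if sbp >= 140 or dbp >= 90:
        return "Stage 1 hypertension"
    if sbp >= 120 or dbp >= 80:
        return "Prehypertension"
    return "Normal"

# The single office reading from the clinical scenario.
print(classify_jnc7(147, 82))
```

Applied to the scenario's single oscillometric reading of 147/82 mm Hg, the lookup returns the stage 1 range, which is why confirmation with repeated auscultatory measurements matters before any label is applied.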

The Canadian Hypertension Education Program, 4 British 
Hypertension Society, 5 and European Society of Hypertension 6 
avoid labeling patients as “prehypertensive.” Instead, they 
describe “high-normal” BP and recommend that this group 
receive more frequent monitoring, given their higher risk of 
developing hypertension (Table 23-5). 

RESULTS OF LITERATURE REVIEW 

The principles of BP measurement remain the same: 

1. Auscultatory methods with a properly calibrated device 
are the reference standard. 

2. Patients should be seated quietly for 5 minutes in a chair, 
with their feet on the floor and the arm supported at heart 
level. 

3. The cuff’s bladder should encircle at least 80% of the arm. 

4. The systolic pressure is the point at which the first of 2 or 
more consecutive heart sounds is heard (phase 1). The 
diastolic pressure is the point at which the sounds disappear 
(phase 5). If sounds are heard all the way to 0 mm Hg, 
the point of diastolic muffling (phase 4) is used as the diastolic 
pressure. 

Several points are worth observing from these recommendations. 
First, devices should be calibrated at least once a 
year, which is especially important because mercury sphygmomanometers 
are being replaced by aneroid or oscillometric 
devices. Aneroid devices use a spring mechanism that is 
subject to wear and can cause inaccurate readings. However, 
when properly maintained, aneroid sphygmomanometers 
are accurate and underestimate reference devices by a mean 
of only 0.5 mm Hg. 7 

Second, once the proper equipment is obtained (ie, a calibrated 
device and an appropriately sized cuff), the exact procedure 
must be followed. The 4 points listed above highlight 
important potential technical errors that are avoidable when 
the proper procedure is followed. The patient must be seated 
(rather than supine, which can increase the systolic pressure 
by 3 mm Hg). The arm must be supported by an armrest or 
by the examiner so that the patient does not exert 
effort in elevating the arm (a potential 2 mm Hg increase in 
pressure). Finally, the arm must be at the level of the heart 
rather than dangling (a potential 10 mm Hg increase in systolic 
pressure). 

Third, diagnosis when the BP approaches the threshold 
values should never be based on a single measure. At a single 
visit, at least 2 measures should be taken. Because of biological 
variability and white coat hypertension, patients should 
return to the clinic for additional measures when the diagnosis 
is not obvious. At the initial diagnosis, experts suggest 
measuring the BP in both arms. Patients with large discrepancies 
(eg, >20 mm Hg systolic or >10 mm Hg diastolic) 
need further assessment. 

Self-monitoring of BP can be used for both diagnosis and 
treatment monitoring. As such, the results should be consid¬ 
ered as part of the patient’s medical history. When used to 
diagnose hypertension, a mean self-recorded BP greater than 
135 mm Hg systolic or 85 mm Hg diastolic should be consid¬ 
ered hypertensive. 8 The patient should use a fully automated 
monitor with an appropriate-sized arm cuff. The physician 
should discard the first day of patient reading and then use 
all other data to calculate the mean BP. 9 Randomized trials of 
home monitoring for BP control used a frequency of twice- 
daily to twice-weekly recordings. 10 
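
The averaging rule described above (discard the first day, then take the mean of the remaining readings and compare it with 135/85 mm Hg) can be sketched in a few lines. This is a minimal illustration only; the function names and the list-of-days data layout are assumptions made for the example and do not come from the cited studies.

```python
from statistics import mean

# Thresholds quoted in the text for self-recorded BP (mm Hg).
HOME_SYSTOLIC_LIMIT = 135
HOME_DIASTOLIC_LIMIT = 85


def mean_home_bp(readings_by_day):
    """Return (mean systolic, mean diastolic) after discarding day 1.

    readings_by_day is a list of days; each day is a list of
    (systolic, diastolic) tuples. This layout is an assumption made
    for the example.
    """
    kept = [bp for day in readings_by_day[1:] for bp in day]
    if not kept:
        raise ValueError("need at least one reading after the first day")
    return mean(s for s, _ in kept), mean(d for _, d in kept)


def is_hypertensive_at_home(readings_by_day):
    """True when the mean exceeds 135 systolic or 85 diastolic."""
    systolic, diastolic = mean_home_bp(readings_by_day)
    return systolic > HOME_SYSTOLIC_LIMIT or diastolic > HOME_DIASTOLIC_LIMIT
```

A high first-day reading therefore does not influence the result: only the readings from day 2 onward enter the mean.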

EVIDENCE FROM GUIDELINES 

All guidelines emphasize the need for correctly measured BPs. 
The US Preventive Services Task Force recommends that a 
diagnosis be established only after 2 or more elevated mea¬ 
sures on at least 2 visits during at least 1 week. 11 
Special Populations 

Two populations of adults deserve special mention because 
BP measurement may be misleading. Elderly patients may 
have greater BP variability than younger patients, and they 
may develop decreased arterial compliance that creates the 
phenomenon of pseudohypertension. These patients may 
have an unidentifiable phase V. Because systolic hyperten¬ 
sion is so important and the prevalence of hypertension is 
so high in the elderly, additional testing with self-moni¬ 
tored BP or ambulatory BP measures may be needed. 

Patients with irregular arrhythmias can display large 
beat-to-beat BP lability. Automated devices may be particularly 
“confused” by the variability, so all patients with 
arrhythmia should have their BP confirmed with indirect 
auscultation. 


CLINICAL SCENARIO—RESOLUTION 


Using JNC-VII standards, it is highly likely that she at least 
has prehypertension and perhaps stage 1 hypertension. How¬ 
ever, the initial diagnosis should be established with more 
certainty. Your nurse reported a single value obtained with 
your office automated cuff; you should confirm that proper 
techniques were used and whether the measure was repeated. 
Your nurse may have repeated the BP and recorded only the 
lower of 2 values; that would underestimate the BP because 
the mean value should be used. At the initial diagnosis, indi¬ 
rect auscultation is necessary (usually with a calibrated aner¬ 
oid sphygmomanometer). If you are not sure whether your 
office cuffs have been calibrated, assigning one of your office 
staff responsibility for calibration at the manufacturer’s rec¬ 
ommended interval is an important quality measure. You 
should repeat the patient’s BP measurement yourself, making 
sure that you follow the appropriate principles of measure¬ 
ment (5-minute rest, correct cuff size, arm supported, arm at 
the level of the heart). The patient should return in 1 to 2 
weeks for a repeated measurement, or you might opt for self- 
measured home BPs to establish the diagnosis. 


ADULT HYPERTENSION— MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

Approximately 25% of all US adults have hypertension. 
The prevalence is much greater at older ages. The JNC-VII 
review observed that a normotensive 55-year-old individ¬ 
ual has a 90% lifetime risk of developing hypertension. 
More than half of patients aged 60 to 69 years have hyper¬ 
tension, with the estimate increasing to 75% by age 70 
years. 

POPULATION FOR WHOM HYPERTENSION 
SHOULD BE CONSIDERED 

All persons older than 18 years. 

DETECTING THE LIKELIHOOD OF HYPERTENSION 

Indirect auscultation is the reference standard for detecting 
hypertension. Because prehypertension and stage 1 hypertension 
produce no symptoms, there are no screening tests 
and there is a universal recommendation to evaluate all 
adults for high BP at least every 2 years. Patients with prehy¬ 
pertensive values should be monitored more frequently. 
Self-monitored BPs may be used for diagnosis when the 
patient uses appropriate measurement techniques. 


Despite proper measurement techniques, inaccurate results 
at a single visit may be attributed to biologic variability or white 
coat hypertension. Thus, patients should have values repeated 
at several visits to confirm stage 1 hypertension. Assessment of 
white coat hypertension may use self-monitored BP measure¬ 
ment. In addition, continuous ambulatory BP measurement 
may be obtained as an additional diagnostic test. 

REFERENCE STANDARD TESTS 

Indirect auscultation with a mercury or well-calibrated 
aneroid sphygmomanometer provides the pragmatic refer¬ 
ence standard. Oscillometric measures for diagnosis should 
be confirmed with indirect auscultation. Self-monitored 
BPs may be used, although the threshold should be less 
than 135 mm Hg systolic and less than 85 mm Hg diastolic. 
SUPPORT THE UPDATE: 


Hypertension 



TITLE The Seventh Report of the Joint National Com¬ 
mittee on Prevention, Detection, Evaluation, and Treat¬ 
ment of High Blood Pressure. 

AUTHORS Chobanian AV, Bakris GL, Black HR, et al. 

CITATION JAMA. 2003;289(19):2560-2572. 

(Note: A complete version of the report appears in Hypertension. 
2003;42(6):1206-1252.) 

QUESTION Is the case definition for blood pressure 
(BP) appropriate? 

DESIGN Formal, systematic review. 

DATA SOURCES MEDLINE. 

STUDY SELECTION AND ASSESSMENT Study 
population consisted of adults. Articles reviewed were pub¬ 
lished between January 1997 and April 2003. Quality sche¬ 
mata used in previous iterations of the Joint National 
Committee on Prevention, Detection, Evaluation and 
Treatment of High Blood Pressure were used. 

MAIN RESULTS 

Patients with BP of 130/80 mm Hg to 139/89 mm Hg are at 
twice the risk of developing hypertension compared with 
those with lower BP. Thus, the patients in this 130 to 139 mm 
Hg systolic range were reclassified to a category called prehy¬ 
pertension. 


Properly calibrated instruments must be used. The method 
can be simplified to a few key points: (1) patients should sit 
quietly for at least 5 minutes before the BP is measured, (2) the 
patients should be seated with their arm supported at heart 
level, (3) a cuff that encircles at least 80% of the arm should be 
used, and (4) 2 measures should be obtained. The systolic BP is 
the point at which the first of 2 or more Korotkoff sounds is 
heard. The diastolic BP is the point at which the Korotkoff 
sounds are last heard. 

The committee supported the use of patient self-measured 
BP, observing that a mean value that is higher than 135/85 
mm Hg should be accepted as defining hypertension. Simi¬ 
larly, patients with home measures consistently less than 130/ 
80 mm Hg and who lack target organ disease, despite 
increased office measures, do not meet criteria for hyperten¬ 
sion. 

CONCLUSIONS 

LEVEL OF EVIDENCE Systematic review. 

STRENGTHS A large number of participants in an ongoing 
review of the evidence base for the treatment of hypertension. 

There were no changes in the recommendations for BP 
technique performed in the examination room. The use of 
self-monitored BP as part of the patient’s medical history 
requires that the patient’s monitoring device be accurate and 
that the patient use proper technique. 

Reviewed by David L. Simel, MD, MHS 





CHAPTER 

Is This Adult Patient Hypovolemic? 

Steven McGee, MD 
William B. Abernethy III, MD 
David L. Simel, MD, MHS 

CLINICAL SCENARIOS 


In each of the following clinical scenarios, clinicians need 
to identify which physical signs reliably and accurately 
indicate volume depletion or dehydration. 

CASE 1 A 54-year-old man, taking ibuprofen for knee 
arthritis, presents with a 1-day history of melena. Physical 
examination reveals a pulse of 80/min and blood pressure 
(BP) of 140/82 mm Hg when supine and a pulse of 115/min and BP of 
132/86 mm Hg when standing. There is mild epigastric tenderness 
and a positive result on a guaiac test for occult blood in the 
stool. The hematocrit level is 39%. 

CASE 2 A 62-year-old woman had 6 months of epi¬ 
sodic vertigo and unilateral hearing loss, attributed to 
Meniere disease. She began hydrochlorothiazide, but 3 
weeks later her dizziness is slightly worse. Her heart 
rate is 80/min and BP is 160/84 mm Hg when supine and 
88/min and 134/72 mm Hg when standing. On standing, she 
experiences slight dizziness. 

CASE 3 An 82-year-old nursing home resident presents 
to the emergency department with a 1-day history of nau¬ 
sea and vomiting. Her underlying medical problems 
include dementia, coronary artery disease, atrial fibrilla¬ 
tion, emphysema, and hypertension. She has been treated 
with aspirin, isosorbide dinitrate, furosemide, β-agonist 
inhalers, and lisinopril. The clinician diagnoses viral gas¬ 
troenteritis or food poisoning because other members of 
the nursing home have an identical illness. On examina¬ 
tion, the patient is afebrile and alert and demonstrates 
normal speech and strength. Her mental status is no dif¬ 
ferent from her baseline. The pulse is 75/min and the BP is 
154/90 mm Hg when supine and 90/min and 130/76 mm Hg when 
upright. The tongue, mucous membranes, and axillae are 
moist. Results of an examination of the heart, lungs, 
abdomen, and an electrocardiogram are normal. 


WHY IS CLINICAL EXAMINATION IMPORTANT? 


The term volume depletion describes the loss of sodium from 
the extracellular space (intravascular and interstitial fluid) 
that occurs after gastrointestinal (GI) hemorrhage, vomiting, 
diarrhea, and diuresis. In contrast, the term dehydration 
refers to losses of intracellular water that ultimately cause cel¬ 
lular desiccation which elevates the plasma sodium concen¬ 
tration and osmolality. 1 This distinction is important to 
clinicians (patients with volume depletion exhibit prominent 
circulatory instability and should receive 0.9% saline rapidly, 
whereas those with pure dehydration may lack circulatory 
instability and should receive 5% dextrose, usually more 
slowly). Most patients presenting with dehydration, however, 
also have volume depletion. Moreover, in most clinical stud¬ 
ies of related physical findings, investigators lump the 2 dis¬ 
orders together by using as a combined criterion standard 
either the presence of an elevated serum urea nitrogen/creatinine 
ratio (a measure of volume depletion) or an elevated 
serum sodium level or osmolality (a measure of dehydra¬ 
tion). We will use the term hypovolemia to collectively refer 
to both conditions. 

GI tract hemorrhage, the prototype of volume depletion, is 
a common and important problem. Hospitalizations for 
upper GI tract hemorrhage occur in 150/100000 population 
per year 2 and are associated with a case fatality rate of 3% to 
10%. 2,3 Hypernatremia, the hallmark of dehydration, affects 
primarily elderly patients with infections and poor access to 
water, accounting for less than 1% of hospital admissions but 
associated with a mortality rate exceeding 40%. 4,5 Risk factors 
for hypovolemia in the elderly include female sex, age older 
than 85 years, having more than 4 chronic medical condi¬ 
tions, taking more than 4 medications, and being confined to 
bed. 6 

Clinical examination attempts to address (1) whether the 
patient’s symptoms are related to hypovolemia and (2) the 
degree of hypovolemia. In case 1, symptoms and laboratory 
data do not gauge the severity of the GI tract hemorrhage. 
For example, the presence of melena has been associated 
with both insignificant (as little as 100 mL of blood loss) 7 
and massive hemorrhage. 8 The admission hematocrit level 
also correlates poorly with the degree of blood loss and 
overall mortality, 310 especially in cases of persistent or 
recurrent bleeding, because a decrease in hematocrit is 
often delayed 24 to 72 hours after hemorrhage. 1113 In one 
large study, 3 however, postural vital signs were a significant 


Table 24-1 Phlebotomy Studies in Normovolemic Individuals a 

| Source, y | No. of Subjects | Amount of Blood Removed, mL |
|---|---|---|
| Moderate b | | |
| Knopp et al, 24 1980 | 44 | 500 |
| Baraff and Schriger, 25 1992 | 100 | 450 |
| Witting et al, 26 1994 | 292 | 450 |
| | 44 | 450 |
| Ralston et al, 27 1961 | 16 | 530-590 |
| Warren et al, 28 1945 | 8 | 4.1-8.5 per kg |
| Large b | | |
| Knopp et al, 24 1980 | 44 | 1000 |
| Shenkin et al, 29 1944 | 11 | 1029 ± 81 |
| Wallace and Sharpey-Schafer, 30 1941 | 25 | 9-16 per kg |
| Skillman et al, 31 1967 | 9 | 764 ± 93 |
| Bergenwald et al, 32 1977 | 16 | 900 |
| Ralston et al, 27 1961 | 3 | 920 |

a Mean age range of participants in these studies was 25 to 44 years. The exceptions are Warren et al, 28 who did not provide age information, and Witting et al, 26 who had 292 subjects who were younger than 65 years and 44 subjects who were older than 65 years. 
b Moderate was defined as 450 to 630 mL; large, 630 to 1150 mL. 


univariate predictor of mortality and complications. How 
accurate are postural vital signs and which component of 
the postural change, pulse or BP, provides more meaningful 
information? 

In case 2, the clinician recognizes that hydrochlorothiazide 
may benefit patients with Meniere disease 14 but also wonders 
if the diuretic caused volume depletion and aggravated her 
dizziness. How significant is the postural decrease in systolic 
BP of 26 mm Hg and the mild postural dizziness? 

Finally, case 3 differs from case 1 in that the fluid losses are 
not directly from the vascular space and that emesis typically 
has only one-third the sodium concentration of serum. How 
reliable are findings of postural vital signs, capillary refill, 
and moist axilla, tongue, and mucous membranes in this 
patient? 

METHODS 

Using the MEDLINE database for articles from January 1966 
to November 1997, an author (S.M.) used 3 search strategies, 
all limited to the English language and to humans 16 years or 
older, to retrieve all relevant publications on the bedside diag¬ 
nosis of hypovolemia. The first strategy used the search terms 
“dehydration/di” or “hypotension, orthostatic” or “tilt table 
test.” The second strategy used “exp dehydration” or “exp 
hypotension, orthostatic” or “exp heart rate” and “exp physical 
examination” or “exp medical history taking” or “exp professional 
competence” or “exp ‘sensitivity and specificity’” or 
“reproducibility of results” or “observer variation” or “diag¬ 
nostic tests, routine” or “exp decision support techniques” or 
“Bayes theorem.” Finally, textword searches were completed 
for “skin turgor” or “acute blood loss” or “orthostatic vital 
signs or (postural and pulse).” According to review of titles and 
abstracts, relevant publications were retrieved. To complete 
the search, this author reviewed the bibliographies of these 
articles and those of textbooks on physical diagnosis. Studies 
on the physical diagnosis of hypovolemia in infants and chil¬ 
dren were not included in this review. 15 ' 23 

Two types of studies are presented. The first group (Table 
24-1) investigated the postural vital signs and capillary refill 
time in healthy volunteers, some of whom underwent phle¬ 
botomy of up to 1150 mL of blood. Despite their limitations, 
these studies are included because they are the only studies 
that compare physical signs with objective measurements of 
blood loss. A second set (Table 24-2) included patients pre¬ 
senting to emergency departments with suspected hypovole¬ 
mia, usually caused by vomiting, diarrhea, or decreased oral 
intake. Two authors (S.M. and W.B.A.) independently graded 
these studies A, B, or C, according to the criteria that appear 
in the footnote of Table 24-2. There was complete agreement 
regarding classification. 

A random-effects model was used to generate summary mea¬ 
sures and confidence intervals (CIs). 37,38 The model was appro¬ 
priate because the studies of pulse and pressure change in 
normovolemic individuals were representative of all such inves¬ 
tigations and included a broad mix of relevant subjects. For 
studies of diagnostic accuracy, the random-effects summary 
Table 24-2 Clinical Studies of Hypovolemia a 

| Source, y | Grade of Study a | No. of Subjects | Age, Mean (Range), y | Patient Population | Physical Finding | Criterion Standard for Hypovolemia | Reason Study Not Grade A |
|---|---|---|---|---|---|---|---|
| Eaton et al, 33 1994 | A | 86 | 80 (70-98) | Patients older than 70 y admitted with acute medical conditions | Dry axilla | Serum urea nitrogen/creatinine ratio > 25 or plasma osmolarity > 295 mmol/kg H2O | |
| Gross et al, 34 1992 | B | 38 | 82 b (61-98) | Stable patients older than 60 y in the emergency department with suspected fluid and electrolyte problems | Dry mucous membranes, dry tongue, tongue furrows, confusion, weakness, nonfluent speech, sunken eyes | Elevated serum urea nitrogen/creatinine ratio, serum osmolality, or serum sodium | n < 50 |
| Johnson et al, 35 1995 | C | 23 | NA (18-31) | Pregnant women in the emergency department with hyperemesis gravidarum and normal serum electrolyte and creatinine levels | Postural vital signs | >5% Weight gain after rehydration | Convenience sample |
| Schriger and Baraff, 36 1991 | C | 32 | 44 (17-90) | Adults in the emergency department with decreased oral intake, vomiting, diarrhea, or blood loss, c and suspected hypovolemia | Capillary refill time | Hypotension or postural pulse increment > 20 beats/min or diastolic blood pressure decrement > 15 mm Hg | Criterion standard was postural vital signs or hypotension; question blinding from criterion standard |

Abbreviation: NA, not available. 

a Grading was determined by these traits: A, an independent, blind comparison of a defined physical sign with an acceptable criterion standard of hypovolemia in more than 50 consecutive patients suspected of having hypovolemia; B, same traits as A but there were fewer than 50 consecutive patients suspected of having volume depletion; C, all other studies, including those using a criterion standard of uncertain validity, a physical finding not clearly defined, a comparison that was not blinded, or a selection of patients dependent on either the physical finding or criterion standard. An acceptable criterion standard was a chemical measure (either elevated serum sodium, osmolality, or blood urea nitrogen ratio, or blood urea nitrogen/creatinine ratio) or percentage of weight gain after the patient had received parenteral fluids. (See Table 1-7 for a summary of Evidence Grades and Levels.) 

b Median age instead of mean age. 

c Total number of subjects with blood loss equaled 6. 


measures provided suitable benchmarks for clinicians’ use in 
actual practice and avoided errors when testing for homogeneity 
among a number of investigations. Calculations of sensitivity 
and specificity were derived from graphs or tabulated data that 
appeared in the original articles or were available from the 
authors of the studies. 26,34,35 Those phlebotomy studies that 
described their results only as mean and standard deviations of 
the postural change in heart rate and BP, before and after phle¬ 
botomy, were reviewed but excluded from the calculations of 
sensitivity and specificity. 39 ' 42 We used the method of Simel et al 43 
to calculate CIs for the likelihood ratios (LRs). 
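
As an illustration of how a CI for an LR can be obtained from a 2 × 2 table, the sketch below uses the log-transformation approach, in which the standard error of ln(LR) is computed from the cell counts. It is assumed here, not confirmed by the text, that this corresponds to the method cited above; the function name and the example counts are hypothetical.

```python
import math


def likelihood_ratio_positive(tp, fn, fp, tn, z=1.96):
    """Return (LR+, lower, upper) from a 2x2 table.

    tp, fn: patients with the condition, finding present / absent.
    fp, tn: patients without the condition, finding present / absent.
    The interval uses the log method, which needs tp > 0 and fp > 0.
    """
    if tp <= 0 or fp <= 0:
        raise ValueError("log method needs nonzero tp and fp")
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    lr = sensitivity / false_positive_rate
    # Standard error of ln(LR+) from the four cell counts.
    se_log = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    return lr, lr * math.exp(-z * se_log), lr * math.exp(z * se_log)


# Hypothetical counts: 30 of 50 affected and 10 of 100 unaffected
# patients have the finding.
lr, lower, upper = likelihood_ratio_positive(tp=30, fn=20, fp=10, tn=90)
```

Because the interval is built on the log scale, it is asymmetric around the point estimate, which suits a ratio that cannot be negative.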

RESULTS 

Postural Vital Signs 

When obtaining postural vital signs, clinicians should wait 2 
minutes before measuring the supine vital signs and 1 
minute after patients stand before measuring the upright 
vital signs, according to investigations of healthy individuals 
discussed below. Having the patient sitting instead of stand¬ 
ing markedly reduces the clinician’s ability to detect the pos¬ 
tural changes induced by blood loss. 24 Clinicians who count 
the pulse for 30 seconds and double the result are more accu¬ 
rate than those using only 15 seconds. 44 


Within 1 to 2 minutes after the patient stands up from the 
supine position, about 7 to 8 mL/kg of blood shifts to the lower 
body, causing the thoracic blood volume, stroke volume, and 
cardiac output to decrease and circulating norepinephrine 
levels and systemic vascular resistance to increase. 40,41,45 ' 48 
Table 24-3 presents data from 25 studies that investigated the 
postural vital signs of more than 3500 normovolemic individu¬ 
als during tilt tests (moving from supine to upright positions by 
active standing was used in 97% of patients, tilt table testing in 
3% of patients). The most prominent finding is an increment 
in heart rate of 11/min (95% CI, 8.9-13/min). This increase 
usually stabilizes after 45 to 60 seconds with the patient in the 
upright position. 24,45,52,54 The systolic BP decreases slightly by 3.5 
mm Hg (95% CI, -1.5 to -5.5 mm Hg), stabilizing 1 to 2 minutes 
after standing, 45,54 whereas the diastolic BP increases by 5.2 
mm Hg (95% CI, 2.8-7.6 mm Hg). 
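
For readers who want to see how a random-effects summary of study means is formed, the following sketch pools per-study mean changes with the DerSimonian-Laird estimator. The choice of that estimator is an assumption: the chapter states only that a random-effects model was used. The function name is hypothetical, and the three example rows are taken from Table 24-3 purely for illustration, so the result is not expected to match the published summary.

```python
import math


def random_effects_summary(means, sds, ns, z=1.96):
    """Pool study means with the DerSimonian-Laird estimator.

    Returns (summary, lower, upper). Each study's variance is taken
    as SD**2 / n, the squared standard error of its mean.
    """
    variances = [sd ** 2 / n for sd, n in zip(sds, ns)]
    w = [1 / v for v in variances]
    fixed = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    # Cochran's Q measures between-study scatter around the fixed mean.
    q = sum(wi * (m - fixed) ** 2 for wi, m in zip(w, means))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(means) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    summary = sum(wi * m for wi, m in zip(w_star, means)) / sum(w_star)
    half_width = z * math.sqrt(1 / sum(w_star))
    return summary, summary - half_width, summary + half_width


# Pulse increments (mean, SD, n) from three rows of Table 24-3.
pooled = random_effects_summary(
    means=[8.3, 12.6, 17.2], sds=[8.8, 11.7, 11.1], ns=[496, 50, 132]
)
```

When studies disagree by more than their sampling error predicts, the between-study variance grows and the weights become more equal, so a single large study dominates less than it would in a fixed-effect average.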

The variability of the postural pulse increment observed in 
these studies is in part attributable to the patients’ ages and perhaps 
to the physical examination method. In Table 24-3, the 
mean age from each study correlates inversely with the 
observed mean pulse increment (r, -0.50; P = .02). 
Other studies 47,56,57,61,62 also confirm that as patients age, 
the pulse increment becomes smaller, although no obvious cut 
point exists that allows the clinician to stratify patients. The 
duration of supine rest before the patient stands might also 
Table 24-3 Postural Change in Vital Signs of Normovolemic Adults a 

| Source, y | No. of Subjects | Age, Mean (Range), y | Pulse Change, beats/min (SD) a | Systolic Blood Pressure Change, mm Hg (SD) | Diastolic Blood Pressure Change, mm Hg (SD) |
|---|---|---|---|---|---|
| Tell et al, 49 1988 | 916 | ... b (14-16) | +14.0 (14.8) | -5.2 (8.6) | +10.2 (12.4) |
| Honda et al, 50 1977 | 496 | 16.5 (...) | +8.3 (8.8) | -9.5 (9.4) | +3.7 (11.9) |
| Horam and Roscelli, 51 1992 | 34 | ... (16-19) | +18.4 (8.4) | +0.8 (7.4) | +8.3 (6.7) |
| Borst et al, 52 1982 | 10 | 21 (...) | +15 (13) | +2 (5) | +19 (8) |
| Kaijser and Sachs, 53 1985 | 14 | ... (20-26) | +16 (8) | -1 (6) | +4 (5) |
| Moore and Newton, 54 1986 | 50 | ... (25-35) | +12.6 (11.7) | -12.1 (7.4) | -3.5 (6.1) |
| Baraff and Schriger, 25 1992 | 104 | 32 (...) | +2 (7) | -2 (6) | +4 (7) |
| Green and Metheny, 39 1947 | 25 | 32 (18-46) | +9 (7) | -5 (8.1) | +12 (9.3) |
| Currens, 55 1948 | 1000 | 33.2 (...) | +13.2 (...) | | |
| Koziol-McLain et al, 56 1991 | 132 | 34.1 (...) | +17.2 (11.1) | +2.8 (11.4) | +9.2 (7.8) |
| Knopp et al, 24 1980 | 79 | 36 (17-55) | +18.4 (...) | -2.8 (...) | +16.4 (...) |
| Tuckman and Shillingford, 48 1966 | 9 | 37 (...) | +13 (12) | +1 (8) | +7 (7) |
| Streeten et al, 46 1988 | 92 | ... (18-64) | +12.3 (4.8) | -6.5 (4.8) | +5.6 (3.8) |
| Wong et al, 40 1989 | 27 | 41.4 (...) | +14.6 (5.7) | | |
| Kaijser and Sachs, 53 1985 | 18 | 42.5 (38-47) | +13 (8) | -2 (8) | +6 (8) |
| Dambrink and Wieling, 57 1987 | 10 | ... (60-69) | +10 (9.5) | -2 (12.6) | +9 (9.5) |
| Kaijser and Sachs, 53 1985 | 15 | 67 (...) | +11 (8) | +9 (20) | +3 (13) |
| Dambrink and Wieling, 57 1987 | 10 | ... (70-79) | +11 (6.3) | -9 (12.6) | +2 (3.2) |
| Baraff and Schriger, 25 1992 | 96 | 76.5 (...) | +1 (7) | -5 (12) | -2 (7) |
| Green and Metheny, 39 1947 | 13 | [illegible] | +2 (5.5) | -9 (12) | -5 (7.7) |
| Dambrink and Wieling, 57 1987 | 10 | ... (80-89) | +8 (3.2) | -5 (9.5) | +4 (6.3) |
| Lipsitz et al, 58 1985 | 15 | 87 (...) | +12 (7.5) | -3 (16) | |
| Levitt et al, 59 1992 | 21 | | +6.8 (7.8) | -2.5 (8.0) | +5.3 (9.9) |
| Schneider and Truesdell, 60 1922 | 144 | | +13.8 (7.1) | | |
| Schneider and Truesdell, 60 1922 | 204 | | +12.5 (8.5) | +5.3 (...) | |
| Summary measure c | NA | NA | +11 (8.9-12.8) | -3.5 (-1.5, -5.5) | +5.2 (2.8-7.6) |

Abbreviations: CI, confidence interval; NA, not available. 
a Values are expressed as upright minus resting supine value. 
b Ellipses indicate data not available. 
c Expressed as random-effects summary measure (95% CI). 


affect the variability of the pulse change, according to the one 
outlier study in Table 24-3, 25 which demonstrated a mean pos¬ 
tural pulse increment of only 2/min and used the shortest time 
of supine rest before having the patient stand (only 1 minute; all 
other studies waited at least 2 minutes). Longer periods of 
supine rest before standing may produce a greater immediate 
pulse increment, perhaps by causing a greater transfer of blood 
to the legs and decrement in cardiac output. 48,63 Aside from the 
patient’s age and period of supine rest, however, no other trend 
was evident. There was no clear relationship between the pos¬ 
tural pulse increment and period of supine rest beyond 2 min¬ 
utes, resting supine pulse rate, time upright before vital signs 
measurement (all > 45 seconds), technique of pulse measure¬ 
ment (palpation vs automated), setting of the study (emergency 
department, prephlebotomy vs other), or method of assuming 
the upright posture (active stand vs tilt table). 

According to the studies in Table 24-3 that enrolled more 
than 25 individuals and presented tabulated data (n = 774), 
the specificity of a postural pulse increment of 30/min or 
more (ie, the most common threshold used in clinical studies) 
was 96% (95% CI, 92%-98%). 

Postural hypotension, defined as a decrement in systolic BP of 
more than 20 mm Hg after standing from the supine position, 
occurs in up to 10% of normovolemic individuals younger than 
65 years 26 and in 11% to 30% of those who are older than 65 
years. 64 ' 71 Postural hypotension is more likely if the patient has 
supine systolic hypertension 58,67,68,71 ' 73 but is not more likely, sur¬ 
prisingly, if the patient takes cardiovascular or psychotropic 
medications. 47,65,68,71,74 Finally, the symptom of mild or moderate 
postural dizziness is a poor predictor of postural hypotension in 
most studies. 56,67,70 
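
The two thresholds discussed in this section (a postural pulse increment of 30/min or more, and a systolic BP decrement of more than 20 mm Hg) can be applied to bedside readings as in the sketch below. The class and function names are hypothetical, chosen for the example; changes are expressed as upright minus supine, following the convention of Table 24-3. The example values are those given for cases 1 and 2.

```python
from dataclasses import dataclass


@dataclass
class VitalSigns:
    pulse: int      # beats/min
    systolic: int   # mm Hg
    diastolic: int  # mm Hg


def postural_change(supine, standing):
    """Return upright-minus-supine changes, as Table 24-3 reports them."""
    return {
        "pulse": standing.pulse - supine.pulse,
        "systolic": standing.systolic - supine.systolic,
        "diastolic": standing.diastolic - supine.diastolic,
    }


def postural_findings(supine, standing):
    """Flag the two thresholds discussed in the text."""
    change = postural_change(supine, standing)
    return {
        "pulse_increment_30_or_more": change["pulse"] >= 30,
        "systolic_decrement_over_20": change["systolic"] < -20,
    }


# Case 1: pulse 80 -> 115/min, BP 140/82 -> 132/86 mm Hg.
case1 = postural_findings(VitalSigns(80, 140, 82), VitalSigns(115, 132, 86))
# Case 2: pulse 80 -> 88/min, BP 160/84 -> 134/72 mm Hg.
case2 = postural_findings(VitalSigns(80, 160, 84), VitalSigns(88, 134, 72))
```

In case 1 the pulse rises by 35/min while the systolic BP falls by only 8 mm Hg; in case 2 the pulse rises by 8/min and the systolic BP falls by 26 mm Hg, so the two patients trip different flags.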

Pathogenesis and Definition of Other Physical Findings 

The capillary refill time is determined by compressing the distal 
phalanx of the patient’s middle finger, positioned level with the 
heart, for 5 seconds and then timing the return of normal color to 
the finger. With an ambient temperature of 21°C, the upper limits 
of normal for the refill time are 2 seconds for children and adult 
men, 3 seconds for adult women, and 4 seconds for the elderly. 75 

Poor skin turgor refers to the slow return of skin to its normal 
position after being pinched between the examiner’s thumb and 
forefinger. 76 The protein elastin, which is responsible for the recoil 
of skin, is markedly affected by moisture content. As little as 3.4% 
loss in wet weight may prolong the recoil time 40-fold. 76 Elastin 
deteriorates with age, suggesting that the recoil of skin normally 
decreases with age, although this has never been formally studied 
to the authors’ knowledge. No studies on the normal recoil time 
or precise definitions of technique could be found. 

Cellular dehydration, interstitial space dehydration, and poor 
perfusion are presumably responsible for many of the other clas¬ 
sic signs of hypovolemia, such as longitudinal tongue furrows, 
dry mucous membranes, dry axillae, and sunken eyes. No studies 
on the pathogenesis of these findings, however, could be found. 

Precision of Physical Signs 

Reproducible measurements of BP depend on many vari¬ 
ables, including the examiner’s technique, the patient exam¬ 
ined, and various observer biases and errors, all of which are 
thoroughly reviewed in another article. 77 

Outside of an extensive literature devoted to patients with 
syncope that uses different methods and end points than those 
discussed in this article, the few studies of tilt test reproducibility 
focus more on biological variation (ie, reproducibility of the test 
when repeated days later) than on immediate interobserver 
reproducibility. When postural vital signs of 911 elderly nursing 
home residents were measured 4 times daily, postural hypotension 
was present only 1 of the 4 times in 18% of the residents, 2 
or 3 times in 20%, and all 4 times in only 13%. 68 Postural 
hypotension is more reproducible in the morning than in the 
afternoon 68 ' 78 or if cardiovascular medications are withheld. Car¬ 
diovascular medications can unmask supine systolic hyperten¬ 
sion, a known risk factor for postural hypotension. 58 ' 73 

In acutely ill elderly patients, interobserver agreement for 
axillary sweating (dry vs moist) was moderate (κ, 0.50; 80% 
simple agreement). 33 In addition, the clinician’s assessment of 
axillary moisture correlated well with the weight gain of a 
piece of preweighed tissue paper applied to the patient’s axilla 
for 15 minutes. 33 With stopwatches, the measurements of cap¬ 
illary refill time by 2 observers were within 0 to 0.3 seconds of 
each other. 75 

Accuracy of Physical Signs for Acute Blood Loss 

Table 24-4 reveals that the 2 most valuable observations 
from the tilt test are either a postural pulse increment of 
30/min or more or the inability of the patient to stand for 


Table 24-4 Diagnostic Accuracy of Vital Signs for Acute Blood Loss 

| Finding | Source, y | Moderate Blood Loss, Sensitivity (95% CI), % a | Large Blood Loss, Sensitivity (95% CI), % b | Before Blood Loss, Specificity (95% CI), % |
|---|---|---|---|---|
| Postural pulse increment ≥ 30/min or severe postural dizziness c | Knopp et al, 24 1980 | 57 | 98 | 98 |
| | Shenkin et al, 29 1944 | ... d | 100 | |
| | Baraff and Schriger, 25 1992 | 8 | | 100 |
| | Witting et al, 26 1994 | 14 | | 99 |
| | Summary measure e | 22 (6-48) | 97 (91-100) | 98 (97-99) |
| Postural hypotension (>20 mm Hg decrease in SBP) c,f | Baraff and Schriger, 25 1992 | 7 | | 98 |
| | Witting et al, 26 1994 | 9 | | 90 |
| | Summary measure e | 9 (6-12) | | 94 (84-99) |
| Age > 65 y | Witting et al, 26 1994 | 27 (14-40) | | 86 (76-97) |
| Supine tachycardia (pulse > 100/min) | Ralston et al, 27 1961 | 0 | 0 | 100 |
| | Shenkin et al, 29 1944 | | 9 | 91 |
| | Wallace and Sharpey-Schafer, 30 1941 | | 16 | 100 |
| | Skillman et al, 31 1967 | | 0 | 100 |
| | Summary measure e | 0 (0-42) | 12 (5-24) | 96 (88-99) |
| Supine hypotension (SBP < 95 mm Hg) | Warren et al, 28 1945 | 13 | | 100 |
| | Shenkin et al, 29 1944 | | 36 | 100 |
| | Wallace and Sharpey-Schafer, 30 1941 | | 32 | 96 |
| | Skillman et al, 31 1967 | | 56 | 100 |
| | Bergenwald et al, 32 1977 | | 13 | |
| | Summary measure e | 13 (0-50) | 33 (21-47) | 97 (90-100) |

Abbreviations: CI, confidence interval; SBP, systolic blood pressure. 
a Moderate blood loss, 450 to 630 mL. 
b Large blood loss, 630 to 1150 mL. 
c “Postural” indicates change from supine to standing position. 
d Ellipses indicate data not available. 
e Summary measures calculated with random-effects model. 
f Excludes those patients unable to stand because of severe dizziness. 
vital signs because of severe dizziness. After blood loss of 
450 to 630 mL, only 1 in 5 patients demonstrates these findings. 
The sensitivity increases to 97% (95% CI, 91%-100%) 
after 630- to 1150-mL blood loss. The specificity is 98% 
(95% CI, 97%-99%), a value similar to that generated from 
the studies in Table 24-3. Either of these findings is durable 
after hemorrhage, lasting at least 12 to 72 hours if intrave¬ 
nous fluids are withheld. 11,30,39 If the patient sits instead of 
stands from the supine position, the sensitivity decreases, 
being 39% 24 and 78% 30 in 2 studies after 1000 mL of hemor¬ 
rhage. Because the studies on large blood loss (630-1150 
mL) enrolled younger healthy individuals, the sensitivity 
may also be lower in elderly patients or those taking medications 
such as β-blockers. A patient’s complaint of postural 
dizziness, not severe enough to prevent standing and 
accompanied by a pulse increment lower than 30/min, has 
little predictive value. 26,56 

After excluding those unable to stand to have vital signs 
measured, postural hypotension (a more than 20 mm Hg decrease in 
systolic BP) has little additional predictive value. Its sensitivity 
for 450 to 630 mL of blood loss is only 9% in those younger than 
65 years and 27% in those older than 65 years. These numbers 
are similar to the false-positive rates in some studies of the same 
age groups, 10% (<65 years) 26 and 28% (>65 years), 71 resulting 
in positive LRs (LR+) close to unity. There are insufficient data 
to address the value of isolated postural hypotension after 630 to 
1150 mL of blood loss. 
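
The remark that the LR+ is close to unity follows from the usual definition, LR+ = sensitivity / (1 - specificity), where 1 - specificity is the false-positive rate. The short Python sketch below repeats that arithmetic with the percentages quoted in the paragraph above; the function name is ours and is used only for illustration.

```python
def positive_lr(sensitivity, false_positive_rate):
    """Return LR+ = sensitivity / (1 - specificity).

    Both arguments are proportions between 0 and 1; the second one is
    the false-positive rate, that is, 1 - specificity.
    """
    if not 0 < false_positive_rate <= 1:
        raise ValueError("false-positive rate must be in (0, 1]")
    return sensitivity / false_positive_rate


# Postural hypotension after 450 to 630 mL of blood loss (values from the text)
lr_under_65 = positive_lr(0.09, 0.10)  # younger than 65 years
lr_over_65 = positive_lr(0.27, 0.28)   # older than 65 years

print(f"LR+ (<65 y): {lr_under_65:.2f}")
print(f"LR+ (>65 y): {lr_over_65:.2f}")
```

Both ratios fall just below 1, which is why the finding hardly changes the probability of blood loss in either age group.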

Supine tachycardia (pulse > 100/min) is a specific but 
insensitive indicator of blood loss (specificity, 96%). Thus, 
patients without supine tachycardia can still have significant 
blood loss. In contrast, bradycardia occurs frequently after 
significant blood loss, often immediately preceding the 
decrease in systemic vascular resistance and the fainting that 
may occur. 11,27-32,39,79-81 One study 80 showed a strong correlation 
between the decrease in heart rate after blood loss and the 
maximal decrease in BP (r = 0.79), and, in hypotensive 
patients receiving fluid resuscitation, the pulse may 
paradoxically increase initially. 81 

In patients with suspected blood loss, supine hypotension 
(systolic BP < 95 mm Hg) is a specific finding of hypovolemia 
(specificity, 97%), although it is insensitive to both moderate 
blood loss of 450 to 630 mL (sensitivity, 13%) and more 
significant loss of 630 to 1150 mL (sensitivity, 33%). 

Using the age- and sex-specific upper limits of normal for 
capillary refill time defined earlier, a prolonged refill time 
does not accurately predict 450 mL of blood loss (sensitivity, 
6%; specificity, 93%) and yields an LR+ of 1.0. 36 If the 
clinician instead uses an arbitrary upper limit of 2 seconds, 
diagnostic performance is no better (sensitivity, 11%; 
specificity, 89%; LR+, 1.0). 36 

Accuracy of Physical Signs for 
Other Causes of Hypovolemia 

Table 24-5 reviews the sensitivity and specificity of various 
physical signs for the diagnosis of hypovolemia derived from 
studies of individuals usually presenting to emergency 
departments with vomiting, decreased oral intake, or diarrhea. Except 
for 1 study, 35 which enrolled young women with hyperemesis 
gravidarum, these studies generally recruited older adults. 


Table 24-5 Diagnostic Accuracy of Physical Signs for Hypovolemia Not Due to Blood Loss 


Source, y                     Grade a  Definition of Abnormal Finding                  Sensitivity, %  Specificity, %  LR+ (95% CI)    LR- (95% CI)

Postural vital signs
Johnson et al, 35 1995        C        Pulse increment > 30 beats/min                  43              75              1.7 (0.7-4.0)   0.8 (0.5-1.3)
Johnson et al, 35 1995        C        Postural hypotension (SBP decline > 20 mm Hg)   29              81              1.5 (0.5-4.6)   0.9 (0.6-1.3)

Skin, eyes, and mucous membranes
Eaton et al, 33 1994          A        Dry axilla                                      50              82              2.8 (1.4-5.4)   0.6 (0.4-1.0)
Gross et al, 34 1992          B        Mucous membranes of mouth and nose dry          85              58              2.0 (1.0-4.0)   0.3 (0.1-0.6)
Gross et al, 34 1992          B        Tongue dry                                      59              73              2.1 (0.8-5.8)   0.6 (0.3-1.0)
Gross et al, 34 1992          B        Longitudinal furrows on tongue                  85              58              2.0 (1.0-4.0)   0.3 (0.1-0.6)
Gross et al, 34 1992          B        Sunken eyes                                     62              82              3.4 (1.0-12)    0.5 (0.3-0.7)

Neurologic findings
Gross et al, 34 1992          B        Confusion present                               57              73              2.1 (0.8-5.7)   0.6 (0.4-1.0)
Gross et al, 34 1992          B        Upper or lower extremity weakness present       43              82              2.3 (0.6-8.6)   0.7 (0.5-1.0)
Gross et al, 34 1992          B        Speech not clear or expressive                  56              82              3.1 (0.9-11)    0.5 (0.4-0.8)

Capillary refill time
Schriger and Baraff, 36 1991  C        Capillary refill time greater than age- and     34              95              6.9 (3.2-15)    0.7 (0.5-0.9)
                                       sex-specific upper normal limit (see “Results”)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; SBP, systolic blood pressure. 

a See Table 24-2 footnotes for grading determinations. See also Table 1-7 for a summary of Evidence Grades and their relationship to Evidence Levels. 
A dry axilla increases the probability of hypovolemia (LR+, 
2.8; 95% CI, 1.4-5.4), although it is an insensitive physical 
sign (sensitivity, 50%). 33 A moist axilla decreases the 
probability of volume depletion only slightly (negative LR 
[LR-], 0.6; 95% CI, 0.4-1.0). 
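
To see what these LRs mean at the bedside, a likelihood ratio can be applied to a pretest probability by converting the probability to odds, multiplying by the LR, and converting back. The Python sketch below does this for the dry axilla. The 30% pretest probability is an assumption chosen only for illustration (the source gives no prior probability), and the function name is ours.

```python
def post_test_probability(pretest, lr):
    """Apply a likelihood ratio to a pretest probability via odds."""
    if not 0 < pretest < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)


PRETEST = 0.30  # assumed for illustration; not taken from the source

dry_axilla = post_test_probability(PRETEST, 2.8)    # LR+ for a dry axilla
moist_axilla = post_test_probability(PRETEST, 0.6)  # LR- for a moist axilla

print(f"Dry axilla:   {dry_axilla:.1%}")
print(f"Moist axilla: {moist_axilla:.1%}")
```

Under that assumed starting point, a dry axilla raises the probability to a little over one half, whereas a moist axilla lowers it only to about one fifth, which matches the text's description of a modest positive finding and a weak negative one.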

In the study by Johnson et al 35 of 23 women with hyperemesis 
gravidarum, neither postural hypotension nor a postural pulse 
increment of more than 30/min was particularly helpful 
(Table 24-5). In this study, however, the specificity of a 
pulse increment of more than 30/min was unusually low (75%). 
One reason for this could be the authors’ definition of 
dehydration (>5% weight gain after 12 hours of rehydration), 
which led them to classify as nondiseased the dehydrated 
women with less than 5% weight gain, thus devaluing the 
specificity calculation. Alternatively, the postural pulse 
increment may be less specific because of pregnancy. 

In another study of 202 individuals with acute illnesses, 
investigators used multiple analysis of variance to identify 
which clinical findings best explained the variation in total 
body water deficit, as calculated from the patient’s serum 
osmolality. 59 The finding of a dry axilla was significantly 
associated with the severity of dehydration (P = .03). The 
postural pulse increment was also significantly associated but 
only weakly so (r = 0.22; P = .02). 59 The mean water deficit in 
this study was only 3.9%, correlating with a 140-mL deficit 
from the vascular space (or about 250 mL of blood), a level 
below that in the phlebotomy studies discussed earlier. This 
study found no association between dehydration and postural 
changes of systolic BP. 

In Table 24-5, the capillary refill time seems to perform 
impressively, especially when the capillary refill time is 
prolonged (LR+, 6.9). 36 However, the criterion standard in this 
study was the supine and postural vital signs, raising the 
question whether capillary refill time has any incremental 
diagnostic value. Another study found no correlation 
between capillary refill time, tested over the patella, and 
objective measures of hypovolemia. 34 

In a study of 55 elderly patients presenting with suspected 
hypovolemia, the 7 physical signs of confusion, extremity 
weakness, nonfluent speech, dry mucous membranes, dry tongue, 
furrowed tongue, and sunken eyes correlated best with 
measurement of the serum sodium and serum urea 
nitrogen/creatinine ratio (Table 24-5). 34 According to the CIs 
of the LRs, however, none of these findings is particularly 
helpful when present in isolation. Combinations of findings 
may be more helpful (on average, patients with severe and 
moderate hypovolemia had 5.7 and 3.9, respectively, of these 
7 signs, whereas those without dehydration had only 1.3), but 
this requires validation. 34 The most helpful negative findings, 
arguing against hypovolemia, are moist mucous membranes, 
absence of sunken eyes, and absence of furrows on the 
tongue. 

Another study found no correlation between degree of 
hypovolemia and dryness of mucous membranes. 59 In adults, 
2 studies have found poor skin turgor to have no diagnostic 
value. 34,59 


THE BOTTOM LINE 

When obtaining postural vital signs, clinicians should wait at 
least 2 minutes before measuring the supine vital signs and 1 
minute after the patient stands before measuring the upright 
vital signs. Counting the pulse for 30 seconds and doubling 
the result is more accurate than 15 seconds of observation. 44 
In normovolemic individuals, a postural pulse increment of 
more than 30/min is uncommon, affecting only about 2% to 
4% of individuals. 

When patients with suspected blood loss are evaluated, the 
most helpful physical findings are severe postural dizziness 
(preventing measurement of upright vital signs) or a postural 
pulse increment of 30/min or more. Having the patient sit 
instead of stand reduces the sensitivity of the tilt test. After 
excluding those unable to stand, postural hypotension has no 
incremental diagnostic value. 

Supine hypotension and tachycardia are frequently absent, 
even with more than 1000 mL of blood loss, and the symptom 
of mild postural dizziness has no proven diagnostic 
value. Bradycardia is common after significant blood loss. 

Rigorous conclusions about the role of physical examination 
for assessing the volume and hydration status of patients 
with vomiting, diarrhea, or decreased oral intake are difficult 
to make because there are few relevant studies. Severe 
postural dizziness or a postural pulse increment of 30/min or 
more should be just as accurate as after blood loss, although 
one study 35 of the pulse increment in patients with 
hyperemesis gravidarum failed to confirm this. A dry axilla 
supports the diagnosis of hypovolemia in the elderly, and moist 
mucous membranes and a tongue without furrows argue 
against it. However, clinicians should recall that the criterion 
standard of hypovolemia in these studies (simple serum 
chemistry measurements) is easily accessible to clinicians. 

Case 1 demonstrates a postural pulse increment of more than 
30/min, suggesting significant blood loss. The clinician should 
start fluid resuscitation. In case 2, postural hypotension and 
mild postural dizziness lack the specificity necessary to 
condemn the diuretic at this time. The clinician could continue the 
diuretic treatment if the physician believes the patient’s dizziness 
comes from inner-ear vertigo. Finally, despite the negative 
physical examination findings in case 3, this patient has many risk 
factors for significant hypovolemia, and the clinician should 
measure the serum blood urea nitrogen, creatinine, and 
electrolyte levels before making the decision to discharge the patient. 
CLINICAL SCENARIO 


A 75-year-old man fell at home and was on the floor for 8 
hours, unable to ambulate. When his family checked on 
him, they brought him immediately to the emergency 
department. You suspect a hip fracture, but you are also 
concerned about intravascular volume depletion and 
rhabdomyolysis. Although he cannot stand up, he can 
change position from lying down to about a 45-degree 
angle before the hip begins to hurt. His pulse increases 
22/min when he sits at the angle. 

UPDATED SUMMARY ON HYPOVOLEMIA 

Original Review 

McGee S, Abernethy WB 3rd, Simel DL. Is this patient 
hypovolemic? JAMA. 1999;281(11):1022-1029. 

UPDATED LITERATURE SEARCH 

Our literature search replicated that done in the original 
publication. We used the parent search strategy for The 
Rational Clinical Examination series and combined it with 
“dehydration/di,” “exp hypotension,” “tilt-table test.mp,” 
and “exp hypovolemia.” We also searched on the text words 
“orthostatic vital,” “orthostatic pulse,” “postural pulse,” and 
“postural vital.” This strategy yielded 258 English-language 
articles published between 1998 and September 2004. We 
excluded case reports and then reviewed the titles to identify 
potentially eligible articles. The focus was on adults with 
acute hypovolemia, rather than chronic orthostatic 
hypotension, using the clinical evaluation or commonly available 
bedside tests. We identified 23 articles for review, but only 
3 contained prospectively collected data applicable to the 
clinical scenario of acute volume depletion in adults. The 
reference list for each article was reviewed but yielded no 
additional studies. To validate the literature search, we also 
used the SUMSearch strategy (http://sumsearch.uthscsa.edu; 
accessed May 31, 2008) in PubMed for the same search, 
limited to physical examination since 1997; we found no 
additional articles for review. 


NEW FINDINGS 

• A pulse change of 30/min on going from supine to standing 
remains the most helpful physical finding. A change of 
only 20/min should be used for the change from sitting to 
standing. 

• Individual clinical findings are not useful in the intensive 
care unit (ICU), but combinations of findings may be 
helpful. 

• In healthy young patients, one study suggests that the 
bedside urine specific gravity cutoff of 1.020 identifies 
patients at higher and lower risk of dehydration. 

Details of the Update 

Although there have been many studies on orthostatic 
hypotension (especially in the elderly), the focus of this 
review was orthostatic hypotension secondary to 
hypovolemia. Thus, the results apply only to patients for whom 
there is a suspicion of intravascular volume depletion. 
Examples of clinical conditions would be acute blood loss, 
gastrointestinal illness with fluid loss, decreased oral 
intake, or “unmeasured” losses as might occur with 
heat-induced illness. 

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

No additional data were found that modify the original 
results. 

CHANGES IN THE REFERENCE STANDARD 

There have been no changes in the reference standard. One 
new study used radiolabeling to measure circulating blood 
volume. A second study from a metabolic laboratory used 
radiolabeling to quantify changes in total body water and 
extracellular water. Although hypovolemia from blood loss 
can be established clinically and with laboratory tests, a 
pragmatic clinical reference standard continues to be a problem 
for both clinical work and research studies of other types of 
hypovolemia. Most clinicians would accept a combination of 
laboratory findings and the response to rehydration as the 
reference standard in typical clinical settings. 
RESULTS OF LITERATURE REVIEW 

One study assessed a variety of clinical variables for detecting 
hypovolemia in the ICU patient. 1 The variables are all readily 
obtainable. However, these findings taken individually were 
essentially useless, as exhibited by likelihood ratio confidence 
intervals that included 1. When used in combination, the 
most important variables were the assessments of third 
spacing (ie, ascites or pleural effusion), a clinical diagnosis of 
heart failure, and pulmonary edema (Tables 24-6 and 24-7). 

For patients who are not so acutely ill that they must be 
supine, physicians (or nurses) might often obtain sitting and 
standing vital signs rather than supine and standing. A study 
of sitting-to-standing orthostatic changes was done on 
patients in the emergency department who did not have an 
acute illness that would have affected orthostatic vital signs. 2 
This study found that a change in pulse of greater than or 
equal to 20/min should be the cut point for sitting-to-standing 
changes. This recommendation has face validity but 
should preferably be validated in patients with a suspicion 
for hypovolemia. 
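
The two pulse cut points can be kept straight with a small lookup keyed by the starting posture. The Python sketch below is only an illustration of the thresholds named in this chapter (30/min or more from supine to standing, 20/min or more from sitting to standing); the names are ours, and, as noted above, the sitting-to-standing cut point has not been validated in patients with suspected hypovolemia.

```python
# Pulse-increment cut points (beats/min) by starting posture, as named in the text.
CUT_POINTS = {"supine": 30, "sitting": 20}


def pulse_increment_abnormal(start_posture, baseline_pulse, standing_pulse):
    """Return True if the rise in pulse on standing meets the cut point."""
    if start_posture not in CUT_POINTS:
        raise ValueError("start_posture must be 'supine' or 'sitting'")
    increment = standing_pulse - baseline_pulse
    return increment >= CUT_POINTS[start_posture]


# The same 26/min rise is read differently depending on the starting posture.
print(pulse_increment_abnormal("supine", 72, 98))   # False: 26 < 30
print(pulse_increment_abnormal("sitting", 72, 98))  # True: 26 >= 20
```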

A study of controlled dehydration in collegiate wrestlers 
assessed the role of the urine specific gravity level determined 
with a urine dipstick. 3 The measurement of specific gravity in 
the correct setting on the appropriate patient may have 
merit. In young, healthy subjects for whom there is a suspicion 
of hypovolemia not caused by blood loss, a specific 
gravity threshold of 1.020 might be useful for both ruling in 
and ruling out intravascular volume depletion. 


Table 24-6 Increases the Likelihood of Hypovolemia in Intensive Care 
Unit Patients 

1. Presence of obvious fluid losses as occurs through drainage tubes 

2. Fluid balance from input and output sheets 


Table 24-7 Decreases the Likelihood of Hypovolemia in Intensive Care 
Unit Patients 

1. Peripheral edema 

2. Pulmonary edema 

3. Third spacing 

4. Skin mottling 

5. Clinically evident heart failure 

Multivariate Findings for Hypovolemia 

Although a quantitative predictive model has been developed 
that uses clinical features, it was developed and validated 
only for ICU patients for whom the diagnosis of hypovolemia 
was uncertain. Because of that, we cannot assess the 
generalizability of these features, especially because most of the 
features apply only to patients who have been in the ICU for 
several days. Until the results are confirmed, clinicians might 
want to collect these variables and assess their importance 
more qualitatively. 

EVIDENCE FROM GUIDELINES 

No guidelines apply to the assessment of intravascular 
volume depletion in adults. 


CLINICAL SCENARIO—RESOLUTION 


From the clinical history, the likelihood of intravascular 
volume depletion seems high. The patient has had no 
oral intake for 8 hours. In addition, he may have 
hemorrhage from his hip fracture and resulting intravascular 
blood loss. The increase in pulse of more than 20/min 
from lying down to sitting supports the diagnosis, but it 
could also be from pain on movement of the hip. 
Although a change in postural tachycardia would be 
helpful to assess intravascular volume depletion, it is not 
necessary to measure this because you have enough 
evidence to obtain laboratory tests for assessing the effect 
of intravascular volume depletion. Furthermore, this 
patient could be considered a “trauma” patient and the 
presence of tachycardia in blood loss is not universal. A 
urinalysis would likely be obtained (for assessing 
rhabdomyolysis), but the urine specific gravity in this older 
patient may be a marker of his renal function rather than 
his intravascular volume. 
ADULT HYPOVOLEMIA—MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

Hypovolemia occurs for a variety of causes. There are no 
reasonable estimates for the prior probability that would be 
uniformly helpful. Clinicians should use their best judgment 
[...] according to the patient’s medical history and findings 
that suggest the possibility of fluid losses. 

POPULATION FOR WHOM HYPOVOLEMIA 
DISEASE SHOULD BE CONSIDERED 

• Acute blood loss 

• Illness with fluid loss 

• Decreased oral intake 

• “Unmeasured” losses as might occur with heat-induced 
illness 

See Tables 24-8 and 24-9 for the likelihood of hypovolemia 
caused by blood loss. 

Table 24-8 Detecting the Likelihood of Hypovolemia Not Caused by 
Blood Loss 

Finding, Patient Population                            LR+ (95% CI)     LR- (95% CI) 

Urine specific gravity > 1.020                         11 (3-43)        0.09 (0.03-0.36) 
  • Young, healthy college wrestlers 
  • Dehydration secondary to sweating 

Dry axilla                                             2.8 (1.4-5.4)    0.6 (0.4-1.0) 
  Patients > 70 y with acute illness 

Pulse increment of > 30/min (supine to standing) a     1.7 (0.7-4.0)    0.8 (0.5-1.3) 
  • Pregnant women in emergency department (1 study) 
  • Normal electrolyte and creatinine levels 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 

a In other populations, data suggest lowering the threshold to > 20/min when the 
patient moves from sitting to standing. 

Table 24-9 Detecting the Likelihood of Hypovolemia Caused 
by Blood Loss 

Pulse Increment ≥ 30/min           Sensitivity, %    Specificity, % 
or Postural Dizziness a            (95% CI)          (95% CI) 

Moderate blood loss (450-630 mL)   22 (6-48)         98 (97-99) 
Larger blood loss (630-1150 mL)    97 (91-100) 

Abbreviation: CI, confidence interval. 

a Based on phlebotomy studies in normovolemic individuals. Specificity is based on 
results for these normovolemic adults before phlebotomy. 

REFERENCE STANDARD TESTS 

Intravascular volume depletion typically relies on a clinical 
diagnosis, with appropriate laboratory measures that correct 
with rehydration. In controlled settings, blood volume and 
total body water can be measured indirectly with radiolabeled 
agents. 


REFERENCES FOR THE UPDATE 

1. Stephan F, Flahault A, Dieudonne N, Hollande J, Paillard F, Bonnet F. 
Clinical evaluation of circulating blood volume in critically ill 
patients—contribution of a clinical scoring system. Br J Anaesth. 
2001;86(6):754-762. a 

2. Witting MD, Gallagher K. Unique cutpoints for sitting-to-standing 
orthostatic vital signs. Am J Emerg Med. 2003;21(1):45-47. a 

3. Bartok C, Schoeller DA, Sullivan JC, Clark RR, Landry GL. Hydration 
testing in collegiate wrestlers undergoing hypertonic dehydration. Med 
Sci Sports Exerc. 2004;36(3):510-517. a 


a For the Evidence to Support the Update for this topic, 
see http://www.JAMAevidence.com. 
Hypovolemia, Adult 



TITLE Hydration Testing in Collegiate Wrestlers Undergoing 
Hypertonic Dehydration. 

AUTHORS Bartok C, Schoeller DA, Sullivan JC, Clark 
RR, Landry GL. 

CITATION Med Sci Sports Exerc. 2004;36(3):510-517. 

QUESTION In a controlled situation of iatrogenically 
induced dehydration, what are the thresholds for commonly 
measured laboratory tests? 

DESIGN Prospective. 

SETTING Metabolic laboratory. 

PATIENTS Twenty-five healthy collegiate wrestlers. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The students were evaluated in a euvolemic state on day 1. 
On day 2, they were randomized to dehydration levels of 2%, 
3%, 4%, or 5%. Careful measurements of diet, fluid intake, 
weight, and both total body and extracellular water 
(radiodilution techniques) confirmed that they reached the 
prespecified level of dehydration. The urine specific gravity and 
protein levels were determined by 2 independent observers. The 
data were compared with laboratory measures. 

MAIN OUTCOME MEASURES 

The screening test was urine specific gravity and urine 
protein levels measured by a bedside test (Multistix; Miles 
Diagnostics, Elkhart, Indiana). 

MAIN RESULTS 

Only 1 subject had dipstick proteinuria when euvolemic; all 
had proteinuria during dehydration. 

A receiver operating characteristic curve selected a specific 
gravity of 1.020 as the appropriate cut point for the dipstick 
(Table 24-10). 


Table 24-10 Likelihood Ratio of Urine Specific Gravity for Dehydration 

Test                                      Cut Point   LR+ (95% CI)   LR- (95% CI) 

Urine specific gravity, dipstick (g/mL)   >1.020      11 (3-43)      0.09 (0.03-0.36) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 
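
The summary reports only likelihood ratios for the dipstick. Because LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity, the pair can be inverted algebraically to recover the implied sensitivity and specificity. The Python sketch below does that inversion; the function name is ours, and the output is only an approximation because the published LRs are rounded. These recovered values are not figures reported in the summary.

```python
def sens_spec_from_lrs(lr_pos, lr_neg):
    """Invert LR+ and LR- to the sensitivity and specificity they imply.

    Solving LR+ = sens / (1 - spec) and LR- = (1 - sens) / spec gives
    spec = (LR+ - 1) / (LR+ - LR-) and sens = LR+ * (1 - spec).
    """
    if lr_pos <= lr_neg:
        raise ValueError("LR+ must exceed LR-")
    specificity = (lr_pos - 1) / (lr_pos - lr_neg)
    sensitivity = lr_pos * (1 - specificity)
    return sensitivity, specificity


# Point estimates from Table 24-10 (urine specific gravity > 1.020)
sens, spec = sens_spec_from_lrs(11, 0.09)
print(f"Implied sensitivity: {sens:.0%}")
print(f"Implied specificity: {spec:.0%}")
```

Both values come out at roughly 92%, which is at least consistent with a study of 25 subjects in which each subject contributed one euvolemic and one dehydrated measurement.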

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Carefully controlled study. 

LIMITATIONS Small sample size and unique population of 
patients limit generalizability. No physical examination 
findings were included. 

We include this study in our review despite the small sample 
size and lack of clinical examination findings because the 
authors evaluated a bedside paraclinical test (urine dipstick) in a 
highly controlled situation. Although the authors observed a 
lack of correlation between the absolute specific gravity level 
measured in the laboratory and the percentage of dehydration, 
that finding belied the excellent discriminative properties 
of the specific gravity. The National Collegiate Athletic 
Association does use a specific gravity of 1.020 as the threshold 
for further testing to make sure that collegiate wrestlers 
have not dehydrated themselves to gain eligibility in a lower 
weight class. 1 

The question for clinicians is whether these data apply to 
patients treated in an uncontrolled situation in an emergency 
department or outpatient clinic. The subjects in this study on 
day 1 were used for determining the specificity. On day 2, 
they underwent controlled dehydration and were used for 
establishing the sensitivity. Thus, there were 2 populations of 
patients: one in which hypovolemia was expected and one in 
which it was not. This sort of enrollment is different from 
what would happen in clinical practice, in which all the 
patients are enrolled because of a suspicion of hypovolemia. 
On the other hand, a prospective study with an enriched 
population of patients most likely to have hypovolemia 
would almost certainly yield results with some verification 
bias (underestimated specificity). Thus, the specificity found 
in this study is plausible and should be validated. 
To apply these results, at the very least the patients must be 
young and healthy, without chronic illness, and there must 
be a reasonable basis (ie, acute illness) for suspecting 
intravascular volume depletion not caused by blood loss. As with 
collegiate wrestlers, additional laboratory evaluation makes 
sense when the patient’s specific gravity exceeds 1.020 and 
the subject meets these criteria. Larger studies in a clinical 
population would be necessary to determine whether lower 
specific gravity values really do rule out dehydration. 
Whatever the case, the clinical evaluation must first identify the 
patients for whom the measure would apply. 

REFERENCE FOR THE EVIDENCE 

1. Bubb RG. 2004 Wrestling Rules and Interpretation: Appendix H. 
Indianapolis, IN: National Collegiate Athletic Association; 2003:WA-27. 
http://www.ncaa.org/library/rules/2004/2004_wrestling_rules.pdf. 
Accessed May 31, 2008. 

Reviewed by David L. Simel, MD, MHS 


TITLE Clinical Evaluation of Circulating Blood Volume 
in Critically Ill Patients—Contribution of a Clinical 
Scoring System. 

AUTHORS Stephan F, Flahault A, Dieudonne N, Hollande J, 
Paillard F, Bonnet F. 

CITATION Br J Anaesth. 2001;86(6):754-762. 

QUESTION Can a variety of clinical findings predict 
hypovolemia? 

DESIGN Prospective, independent convenience sample. 

SETTING Intensive care unit. 

PATIENTS Sixty-eight prospectively enrolled patients 
during a 2-year period, for whom physicians were uncertain 
about the presence of hypovolemia. A predictive model 
was created for these patients, and then another 30 patients 
were prospectively enrolled. Of these, 39 (57%) were 
postoperative patients, 45 had sepsis, 6 had gastrointestinal 
hemorrhage, and 8 had a variety of conditions. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Clinical assessment was done by 2 attending clinicians, 
independently of each other. Disagreements were resolved by a 
third clinician. The clinical findings were all readily available 
at the bedside and included (1) fluid losses (defined by body 
drainage tubes or aspiration of gastric contents), (2) fluid 
balance (from the intake and output records), (3) skin 
mottling, (4) pulmonary congestion (defined by the presence of 
either rales or crackles on physical examination or from a 
chest radiograph that showed alveolar edema and vascular 
redistribution), (5) clinically diagnosed congestive heart 
failure, (6) peripheral edema, and (7) evidence of third-spacing of 
fluid (defined by ascites or pleural effusion). Central venous 
pressure was measured with a pressure transducer, zeroed to 
the midchest level. The reference standard for volume lost 
was assessment of circulating blood volume using 
radiolabeled albumin. Hypovolemia was defined as a circulating 
blood volume at least 10% lower than the predicted mean for 
healthy subjects of the same sex, height, weight, and age. The 
authors reported that their circulating blood volume 
measurement was precise to ±5%. 
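
The reference-standard definition amounts to a single comparison between the measured and predicted blood volumes. A minimal Python sketch of that rule follows; the function name and the example volumes are hypothetical, and how the predicted mean is derived from sex, height, weight, and age is not described here, so it is simply passed in as a number.

```python
def is_hypovolemic(measured_ml, predicted_ml, threshold=0.10):
    """Apply the study definition: measured circulating blood volume
    at least 10% below the predicted mean for a matched healthy subject."""
    if predicted_ml <= 0:
        raise ValueError("predicted volume must be positive")
    deficit_fraction = (predicted_ml - measured_ml) / predicted_ml
    return deficit_fraction >= threshold


# Hypothetical patient with a predicted blood volume of 5000 mL
print(is_hypovolemic(4400, 5000))  # True: 12% below predicted
print(is_hypovolemic(4700, 5000))  # False: 6% below predicted
```

Given the stated measurement precision of ±5%, results close to the 10% boundary deserve cautious interpretation.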

In addition to the clinical findings, vital signs (blood 
pressure and pulse) and laboratory measures were obtained. 

MAIN OUTCOME MEASURES 

Interobserver variability for clinical findings, and the 
sensitivity, specificity, and likelihood ratios (LRs) (compared with 
circulating blood volume). A clinical prediction model was 
developed from the LRs and tested prospectively. 

MAIN RESULTS 

Thirty-six (53%) of the prospectively enrolled patients were 
hypovolemic. For patients who were hypovolemic, the mean 
blood volume deficit was 514 mL (SD = 194). 

The clinical examination components that the clinicians 
elicited showed excellent observer agreement. The heart rate, 
systolic and diastolic pressures, and urinary sodium levels 
did not differ significantly between the hypovolemic 
and nonhypovolemic groups. 

The clinical findings with the highest diagnostic odds 
ratios (Table 24-11) were also the findings that carried the 
most weight in a predictive score when the variables are 
considered together. 


Table 24-11 Observer Variability (κ), Likelihood Ratios, and Diagnostic 
Odds Ratios for Clinical Findings of Hypovolemia 

Test                                    κ      LR+ (95% CI)      LR- (95% CI)        DOR 

Peripheral edema a                      0.82   1.5 (0.94-2.4)    0.64 (0.38-1.1)     2.3 
Fluid balance                                  1.5 (0.76-2.9)    0.79 (0.54-1.1)     1.9 
Pulmonary congestion a                  0.78   1.3 (1.0-1.7)     0.27 (0.08-0.90)    5.0 
Skin mottling                           1.0    1.3 (0.56-3.0)    0.92 (0.70-1.2)     1.4 
Clinical diagnosis of heart failure a   0.84   1.1 (0.93-1.3)    0.36 (0.07-1.7)     3.1 
Third spacing                           0.86   1.1 (0.91-1.3)    0.44 (0.12-1.6)     2.5 


Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive 
likelihood ratio; LR-, negative likelihood ratio. 

a For these items, the absence of the finding would be considered a “positive” result 
for hypovolemia. As an example, the lack of a clinical diagnosis of heart failure 
confers an LR+ for hypovolemia of 1.1. The presence of heart failure makes 
hypovolemia less likely and therefore has an LR- of 0.36. 
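
The diagnostic odds ratio in Table 24-11 is the ratio of the two likelihood ratios, DOR = LR+ / LR-. The Python sketch below recomputes it from the tabulated point estimates as a consistency check; the dictionary and function names are ours. Five of the six rows reproduce the reported DOR after rounding. Pulmonary congestion comes out near 4.8 rather than 5.0, presumably because the published LRs are themselves rounded (that explanation is our assumption).

```python
# (LR+, LR-, reported DOR) point estimates copied from Table 24-11
TABLE_24_11 = {
    "Peripheral edema": (1.5, 0.64, 2.3),
    "Fluid balance": (1.5, 0.79, 1.9),
    "Pulmonary congestion": (1.3, 0.27, 5.0),
    "Skin mottling": (1.3, 0.92, 1.4),
    "Clinical diagnosis of heart failure": (1.1, 0.36, 3.1),
    "Third spacing": (1.1, 0.44, 2.5),
}


def diagnostic_odds_ratio(lr_pos, lr_neg):
    """DOR = LR+ / LR-."""
    return lr_pos / lr_neg


computed = {
    name: diagnostic_odds_ratio(lr_pos, lr_neg)
    for name, (lr_pos, lr_neg, _) in TABLE_24_11.items()
}

for name, (_, _, reported) in TABLE_24_11.items():
    print(f"{name:38s} computed {computed[name]:.1f}   reported {reported}")
```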
The hypovolemia score = 

-5 (from the pretest probability) + 

Fluid loss (14 if present, -4 if absent) + 

Fluid balance (41 if there are more fluids “out” than “in,” 
-24 if balance is equal or positive) + 

Skin mottling (29 if present, -10 if absent) + 

Pulmonary congestion (20 if absent, 
-90 if congestion is present) + 

Heart failure (11 if no heart failure, 
-105 if heart failure is present) + 

Peripheral edema (25 if no edema, -90 if edema is present) + 

Third spacing (27 if no ascites or pleural effusion, 
-184 if ascites or pleural effusion is present) + 

Central venous pressure (117 if < 2 mm Hg, 
-42 if > 2 mm Hg) 

The results of the score are placed in the equation below to 
estimate the probability of hypovolemia. Note that if the 
central venous pressure is not measured, a value of 0 is assigned 
for the component. The probability can be calculated directly 
from the equation: 

Probability (%) = {1 / [exp(-score/100) + 1]} × 100 
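
As a worked illustration, the score and its conversion to a probability can be written in a few lines of Python. This is a sketch under stated assumptions, not the authors' implementation: it reads the probability equation as the logistic function 100 / [exp(-score/100) + 1], the dictionary keys and function names are ours, and a central venous pressure of exactly 2 mm Hg (which the summary does not address) is treated as "not low."

```python
import math

# (points if the finding is present, points if absent), from the summary above
WEIGHTS = {
    "fluid_loss": (14, -4),
    "negative_fluid_balance": (41, -24),  # more fluids "out" than "in"
    "skin_mottling": (29, -10),
    "pulmonary_congestion": (-90, 20),
    "heart_failure": (-105, 11),
    "peripheral_edema": (-90, 25),
    "third_spacing": (-184, 27),
}
BASELINE = -5                    # "from the pretest probability"
CVP_LOW, CVP_NOT_LOW = 117, -42  # < 2 mm Hg versus > 2 mm Hg


def hypovolemia_score(findings, cvp_mm_hg=None):
    """Sum the component points; findings maps each key in WEIGHTS to a bool."""
    score = BASELINE
    for name, (if_present, if_absent) in WEIGHTS.items():
        score += if_present if findings[name] else if_absent
    if cvp_mm_hg is not None:
        # Assumption: exactly 2 mm Hg is scored as "not low".
        score += CVP_LOW if cvp_mm_hg < 2 else CVP_NOT_LOW
    # An unmeasured CVP contributes 0, as the text specifies.
    return score


def probability_percent(score):
    """Probability (%) = 100 / [exp(-score/100) + 1]."""
    return 100.0 / (math.exp(-score / 100.0) + 1.0)


# Hypothetical patient with none of the 7 findings and no CVP measurement
no_findings = {name: False for name in WEIGHTS}
score = hypovolemia_score(no_findings)
print(score, round(probability_percent(score), 1))
```

For this hypothetical patient the score is 40 and the estimated probability is about 60%. A score of 0 corresponds to 50%, so each component shifts the estimate on a log-odds scale rather than by a fixed number of percentage points.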

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Objective reference standard for circulating 
blood volume in a population of patients for whom the 
clinicians were uncertain about hypovolemia. The clinical 
findings were assessed independently, and the observer 
agreement was determined. Definitions for the clinical 
findings are provided. 

LIMITATIONS The authors observe that the reference 
standard might slightly overestimate circulating blood volume. 
The comparison of circulating blood volume results with a 
normal population may not be correct for intensive care unit 
(ICU) patients (although it does seem reliable). The overall 
clinical assessment was used to identify patients eligible for 
this study, resulting in a high prevalence of hypovolemia 
compared with that in all other ICUs. The results should not 
be generalized to settings other than the one in which the 
model was studied. The figures presented in the article do suggest a good 
correlation at prevalence values of increased circulating 
blood volume exceeding 50%. The details for data reduction 
to create a parsimonious model are not given, so it is difficult 
to know whether all the variables in the model are necessary. 

This is a clever study and the investigators use a criterion 
standard that was applied close to the time of the clinical 
assessment, independent of the clinical findings, and that was 
reproducible. They observe, correctly, that the clinical 
findings all assess extravascular volume excess, which allows 
clinicians to make inferences about intravascular volume. 

The issue of verification bias is difficult to sort out. The 
patients were selected specifically because the clinicians 
could not “rule in” or “rule out” hypovolemia, a common 
problem in ICUs. A bias toward consistently better 
specificity does not seem to exist. Furthermore, given the number 
of findings assessed, it seems unlikely that the presence or 
absence of any one finding consistently identified patients 
for the study (ie, selection bias). If that is the case, then the 
findings should have been distributed randomly among 
those with and without hypovolemia. Given the poor 
performance characteristics of the individual findings, it seems 
unlikely that verification bias had a major effect on the final 
results. 

Although the individual findings function poorly, the combination
of findings may work well in this setting and with
patients for whom the presence of hypovolemia is uncertain.
The results suggest the potential importance of evaluating
combinations of findings even when the individual clinical
examination results lack discriminating power. The model
needs validation in the emergency department, but the components
suggest it would be less useful in patients who are not
acutely ill. First, these patients had been ICU patients for at
least a day so that the variables could be assessed. Second,
some of the variables would not apply to the acute emergency
department or clinic patient (eg, fluid loss through drainage
tubes). Third, the model needs assessment at different prevalences
of hypovolemia because the starting score of -5 and the
scores for the component measures could change as the prior
probability deviates from 50%.

Reviewed by David L. Simel, MD, MHS 
TITLE Unique Cut Points for Sitting-to-Standing Orthostatic
Vital Signs.

AUTHORS Witting MD, Gallagher K. 

CITATION Am J Emerg Med. 2003;21(1):45-47.

QUESTION Are the thresholds for detecting abnormal 
vital sign changes when going from supine to standing the 
same as when going from sitting to standing? 

DESIGN Prospective convenience sample compared to 
prospectively collected data on blood donors. 

SETTING Emergency department. 

PATIENTS A total of 176 patients in the emergency
department, with no cardiovascular symptoms, hypertension,
anemia, diabetes, substance abuse, orthostatic hypotension
history, or cancer. All patients were presumed to be
euvolemic, and none had chest discomfort, dyspnea, palpitations,
lightheadedness, vomiting, diarrhea, decreased appetite,
abdominal pain, pharyngitis, melena, laceration, or
diffuse trauma. The data for supine to standing were from
292 healthy blood donors, obtained before they donated
blood.


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Blood pressure (BP) and pulse were measured after the subject
had been sitting 5 minutes. The patient then stood up
and, after 1 minute, the vital signs were retaken. An automated
BP device was used, which prevented observer variability.

MAIN OUTCOME MEASURES 

The specificity of vital sign changes for sitting to standing 
compared with the vital sign changes from supine to standing. 


MAIN RESULTS 

The mean change in pulse from sitting to standing was 5.3/
min (95% confidence interval [CI], 4.3-6.3/min), whereas the
mean change in systolic pressure from sitting to standing was
-1.2 mm Hg (95% CI, -0.3 to 2.6 mm Hg). The specificity
for both findings was high (Table 24-12).


Table 24-12 Specificity of Findings for Change in Pulse Rate or
Pressure as a Function of Posture Change

Test | Cut Point | Specificity (95% CI)
Pulse change | Sitting to standing +20/min | 0.98 (0.94-0.99)
Pulse change | Supine to standing +30 | 0.98 (0.96-0.99)
Systolic blood pressure, mm Hg | Sitting to standing -20 | 0.97 (0.92-0.99)
Systolic blood pressure, mm Hg | Supine to standing -25 | 0.98 (0.95-0.99)

Abbreviation: CI, confidence interval.
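As a minimal sketch, the cut points in Table 24-12 can be applied to a single pair of measurements. The function name `exceeds_cut_point`, the dictionary layout, and the choice to treat a change exactly equal to the cut point as abnormal are illustrative assumptions and are not taken from the study; only the threshold values come from the table.

```python
# Cut points from Table 24-12, keyed by posture change.
# Pulse change is a rise in beats/min; systolic change is a fall in mm Hg.
CUT_POINTS = {
    "sitting_to_standing": {"pulse": 20, "systolic": -20},
    "supine_to_standing": {"pulse": 30, "systolic": -25},
}


def exceeds_cut_point(posture, pulse_change, systolic_change):
    """Report which vital sign changes reach the cut point for this posture change.

    Assumption of this sketch: a change exactly at the cut point counts
    as reaching it.
    """
    cut = CUT_POINTS[posture]
    return {
        "pulse": pulse_change >= cut["pulse"],
        "systolic": systolic_change <= cut["systolic"],
    }


# Hypothetical patient: pulse rises 22/min and systolic BP falls 8 mm Hg on standing from sitting.
print(exceeds_cut_point("sitting_to_standing", pulse_change=22, systolic_change=-8))
```

The same measurements would not reach the pulse cut point under the supine-to-standing thresholds, which is the practical point of using posture-specific values.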


CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Simple study that used an automated device 
to prevent observer variability and bias. 

LIMITATIONS Historical control. This study evaluates the
magnitude of differences in pulse and systolic BP when the clinician
chooses to have the patient go from sitting to standing
rather than supine to standing. In normovolemic subjects, a
pulse change of 30/min occurs in only 2% to 4% (ie, a specificity
of 96%-98%). Patients who go from sitting to standing will not
have as great a pulse change. These data show that a threshold of
20/min should be used for these patients. The CI around the
beats per minute is slightly narrower than the CI for systolic BP.
Thus, the change in pulse should be preferred as the screening
test. The authors also evaluated a combination of the 2 findings,
pulse/systolic BP (called the shock index); the specificity for
changes at adjusted cut points had a wider CI than the pulse.

Reviewed by David L. Simel, MD, MHS 
CHAPTER

Is This Child Dehydrated?

Michael J. Steiner, MD
Darren A. DeWalt, MD, MPH
Julie S. Byerley, MD, MPH

CLINICAL SCENARIOS


CASE 1 A 20-month-old girl is brought to the emergency
department (ED) after 2 days of vomiting and diarrhea. Her
father reports that she has not eaten normally since the illness
began and now will not drink. She has had 8 stools so far
today, but he does not think there were any diapers with
urine in them. The child appears mildly ill but does make
tears while crying. Her respiratory rate and quality are normal,
along with her other vital signs. Her mouth is somewhat
dry, capillary refill time is 1.5 seconds, and skin turgor is normal.
Her serum (blood) urea nitrogen concentration (BUN)
is 12 mg/dL, and bicarbonate concentration is 19 mEq/L.

CASE 2 A 5-month-old boy presents to a health care clinic
in a developing country. The child lives in a rural area, and
there is no running water in the family home. The child
began having nonbloody, profuse, watery stools approximately
7 days ago. The family has World Health Organization
(WHO) oral rehydration packets at home that the
child has eagerly consumed. He seemed less interested in
drinking this morning so his parents began the trip to the
clinic. The child is now quiet and hyperpneic. He has
sunken eyes and a dry mouth. His capillary refill time is 3
seconds, and his skin turgor is prolonged.


WHY IS THE CLINICAL 
EXAMINATION IMPORTANT? 


Dehydration is one of the leading causes of morbidity and
mortality in children throughout the world. 1,2 Diarrheal disease
and dehydration account for as much as 30% of worldwide
deaths among infants and toddlers; 8000 children
younger than 5 years die each day because of gastroenteritis
and dehydration. 2-4 In the United States, children younger than
5 years have an average of 2 episodes of gastroenteritis per year,
leading to 2 to 3 million office visits and 10% of all pediatric
hospital admissions. 1,5,6 The direct costs of outpatient and hospital
visits are more than $2 billion per year, not including
indirect costs to families and society. 4 Despite aggressive medical
care, as many as 300 US children still die each year as a
result of gastroenteritis and associated dehydration. 1,6

Many other childhood illnesses in addition to gastroenteritis
are associated with dehydration. Gingivostomatitis, bronchiolitis,
pyloric stenosis, and focal bacterial infections such as pneumonia,
meningitis, and urinary tract infections can all lead to
dehydration. For this reason, the morbidity and mortality
related to dehydration are actually much higher than that associated
solely with gastroenteritis. Dehydration is such a common
concern in pediatrics that clinicians in primary care offices, EDs,
and hospital settings all assess volume status as part of their evaluation.
This assessment helps guide decisions about therapy and
patient disposition.


Copyright © 2009 by the American Medical Association.
The American Academy of Pediatrics (AAP), Centers for
Disease Control and Prevention (CDC), and WHO have all
developed treatment guidelines for gastroenteritis according to
the clinical assessment of dehydration. The AAP guideline
states that “the treatment of a child with diarrhea is directed
primarily by the degree of dehydration present.” 4 They recommend
clinically deciding whether a patient is mildly (3%-5%),
moderately (6%-9%), or severely (>10%) dehydrated and then
treating according to that classification. The CDC uses a similar
assessment and scale in its recommendations on the initial
management of diarrhea. 1,3 WHO has also incorporated signs
of dehydration into the Integrated Management of Childhood
Illness Scale, which assists practitioners in developing countries
in making treatment and referral decisions. 7
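The three guideline categories can be written as a simple lookup. This is only a sketch: the function name is hypothetical, and the stated ranges (3%-5%, 6%-9%, >10%) leave gaps at values such as 5.5% and exactly 10%, so the boundary handling below is an assumption made for illustration rather than part of any guideline.

```python
def classify_dehydration(percent_deficit):
    """Map an estimated fluid deficit (% of body weight) to a category.

    Stated ranges: mild 3%-5%, moderate 6%-9%, severe >10%.
    Assumptions of this sketch: each category extends up to the start of
    the next one, 10% counts as severe, and deficits below 3% are labeled
    "none or minimal".
    """
    if percent_deficit < 0:
        raise ValueError("percent_deficit must be non-negative")
    if percent_deficit < 3:
        return "none or minimal"
    if percent_deficit < 6:
        return "mild"
    if percent_deficit < 10:
        return "moderate"
    return "severe"


for deficit in (2, 4, 7, 12):
    print(deficit, classify_dehydration(deficit))
```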

Inaccurate assessment of dehydration can have important
consequences. Unrecognized and untreated fluid deficits can
create electrolyte disturbances, acidosis, and end-organ damage,
including cardiovascular instability, renal insufficiency,
and lethargy. These complications can produce devastating
results, including permanent injury or death. Conversely,
unnecessary interventions can occur after erroneous assessment
that a child has moderate or severe dehydration when
he or she is actually euvolemic or only mildly dehydrated. 5
Despite recommendations for oral rehydration in mild or
moderate dehydration, this therapy is used in less than 30%
of the cases of diarrhea in the United States for which it is
indicated. 8 Clinicians may rely on the more invasive intravenous
rehydration in part because they overestimate the
degree of dehydration. Both overestimating and underestimating
the degree of dehydration can increase health care
costs and cause unnecessary morbidity.

Pediatricians generally use the terms dehydration, volume
depletion, and hypovolemia interchangeably to represent fluid
loss in outpatient settings. Literature that focuses on physiologic
changes caused by different types of fluid loss differentiates
among these terms. 9 Because this discrimination can have
unclear clinical implications and to simplify discussion, much
of the clinical literature combines terminology. 10 Herein, we
follow this convention and use the term dehydration to represent
all fluid deficits except in circumstances such as whole
blood loss or significant sodium alteration, in which important
clinical implications are evident.

The quantification of dehydration is an important and
commonly used skill for assessment of pediatric patients.
Despite this importance, the utility of the clinical history,
physical examination, and laboratory tests to assess dehydration
in children has not been systematically reviewed. Most
teaching regarding the assessment of dehydration is based on
clinical experience and medical tradition. We conducted a
systematic review of the literature on the precision and accuracy
of medical history, physical examination, and laboratory
tests in identifying dehydration in children between 1 month
and 5 years old.

Anatomic/Physiologic Origins of Dehydration Signs 

Many signs in pediatric assessment are attributed to the fluid
and electrolyte shifts caused by dehydration. Early work to
understand dehydration in children focused on intracellular
and extracellular physiologic changes associated with fluid
loss. Researchers have fastidiously documented fluid and
electrolyte losses in dehydration and have even performed
biopsies of the muscle of children with severe diarrhea to
understand intracellular fluid and electrolyte shifts. 11 Particularly
instructive experiments used radiolabeled albumin to
demonstrate that the percentage of body weight lost was
directly proportional to the percentage of plasma volume
lost. 12 For example, children who had lost 5% of their body
weight lost approximately 5% of their plasma volume.
Because plasma volume is only a small percentage of total
body water, this experiment indirectly demonstrated that the
majority of fluid lost in childhood dehydration actually
comes from either interstitial or intracellular sources.
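The inference in this paragraph can be checked with rough arithmetic. The sketch below assumes a plasma volume of about 50 mL per kg of body weight and that 1 kg of acute weight loss corresponds to about 1 L of fluid. Both figures are illustrative assumptions, not values from the cited experiment, and the function name is hypothetical.

```python
def plasma_share_of_fluid_loss(weight_kg, percent_weight_lost, plasma_ml_per_kg=50.0):
    """Estimate how much of an acute fluid loss comes from plasma.

    Assumptions (illustrative only): plasma volume is about
    plasma_ml_per_kg mL per kg of body weight, and 1 g of acute weight
    loss equals about 1 mL of fluid. The percentage of plasma volume lost
    is set equal to the percentage of body weight lost, as the text describes.
    """
    total_loss_ml = weight_kg * 1000.0 * percent_weight_lost / 100.0
    plasma_volume_ml = weight_kg * plasma_ml_per_kg
    plasma_loss_ml = plasma_volume_ml * percent_weight_lost / 100.0
    return total_loss_ml, plasma_loss_ml, plasma_loss_ml / total_loss_ml


total, plasma, share = plasma_share_of_fluid_loss(weight_kg=10, percent_weight_lost=5)
print(f"total loss {total:.0f} mL, plasma loss {plasma:.0f} mL, plasma share {share:.0%}")
```

Under these assumptions a 10-kg child with a 5% weight loss has lost roughly 500 mL of fluid, of which only about 25 mL is plasma, so the remainder must come from the interstitial and intracellular compartments.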

The correlation of losses from specific fluid compartments
to corresponding physical signs has not been clearly documented.
The signs of dehydration appear to represent an
actual desiccation of tissue (eg, dry mucous membranes), a
compensatory reaction of the body to maintain vital perfusion
(eg, tachycardia), or some combination of both (eg, capillary
refill time). Although some authors offer more specific
explanations of theoretic fluid compartments and their
examination correlates, these 3 principles should be sufficient
for clinical assessment of patients.

How to Elicit Symptoms and Signs 

Pediatric practitioners often elicit historical points from adult
caregivers instead of directly from the patient. When assessing
volume status in infants, physicians may ask about number of
wet diapers (surrogate for urine output), presence or absence
of vomiting and diarrhea, and amount and type of oral intake.
Caregivers also frequently report their interpretation of examination
signs by clarifying whether the child is active, whether
the eyes appear sunken, and whether the child drinks vigorously.
Clinicians should ask parents whether they have given a
successful trial of clear fluids at home, whether the child has
been treated by another medical practitioner during the illness,
and the date and value of the child’s most recent weight
measurement. 1,13

The ability to elicit some examination signs is impaired when
pediatric patients are crying and uncooperative. Therefore,
assessment of hydration status should progress from the least to
the most invasive maneuvers. The examination should begin
with the child across the room in a position of comfort (eg, in
the parent’s lap). Overall appearance, activity, and response of
the child to stimulation should be observed. Evaluating the respiratory
pattern is important for assessment of dehydration and
all other acute illnesses. Respiratory rate should be measured for
60 seconds by observing chest wall movements. The precise
measurement requires a quiet and comfortable child. The rate
should be compared with age-based norms. 14 In a potentially
dehydrated child, the examiner should specifically look for
hyperpnea (deep, rapid breathing without other signs of respiratory
distress), suggestive of an acidosis. 1 Other vital signs,
including temperature, pulse, and blood pressure, should also
be evaluated while the child is comfortable. 1
Next, the clinician should assess skin turgor and capillary
refill time. Skin turgor has been used to diagnose dehydration
for more than 50 years and, when abnormal, is also
called “tenting” or “inelastic skin.” 15,16 To elicit the sign, the
examiner should use the thumb and index finger to pinch a
small skin fold on the lateral abdominal wall at the level of
the umbilicus. 15 The fold should be promptly released, and
then the time is measured for the skin fold’s return to normal
form. 15 Clear norms for this time have not been published,
and most clinicians simply qualify skin turgor as immediate,
slightly delayed, or prolonged.

Excess subcutaneous fat and hypernatremia may falsely
normalize the turgor in dehydrated children, whereas malnutrition
may falsely prolong the recoil time. 15,17-21 Primary skin
disorders complicate the interpretation of skin turgor. 19

To assess capillary refill time, the examiner compresses a
superficial capillary bed and estimates the time it takes for
normal color to return after the pressure is released. Capillary
refill time varies as a function of ambient temperature,
site of application, lighting, medications, and primary (eg,
reflex sympathetic dystrophy) or secondary (eg, cardiogenic
shock) autonomic changes. 16,18,22-24 Extremes in patient temperature
may also affect the capillary refill time; for example,
capillary refill times are markedly prolonged after cold
immersion. 25 However, Gorelick et al 22 found that fever did
not affect the test characteristics in children with vomiting,
diarrhea, or poor oral intake. According to the available studies,
and to standardize examination techniques, we recommend
assessing capillary refill time on a finger with the arm
at the level of the heart in a warm ambient temperature. Pressure
should be gradually increased on the palmar surface of
the distal fingertip and then released immediately after the
capillary bed blanches. The time elapsed until restoration of
normal color should be estimated. Although many practitioners
use other sites to measure capillary refill time, most
studies of this sign use the palmar surface of the distal fingertip.
22 ' 26 Using this approach, values for nondehydrated children
are less than 1.5 to 2 seconds. 25

METHODS 

Search Strategy and Quality Review 

We identified articles by direct searches of the MEDLINE
database via the PubMed search engine. The first and broadest
search strategy used “dehydration” and “diagnosis,”
“hypovolemia” and “diagnosis,” or “intravascular volume
depletion” and “diagnosis.” All searches were limited by age
(all children: 0-18 years) and publication date (January 1966
to April 2003). These searches produced 1537 articles. We
supplemented this preliminary search with the standardized
search technique used in The Rational Clinical Examination
series (available from the authors). This second search produced
24 additional articles.

Each of the authors reviewed the titles and available
abstracts from the 1561 articles, selecting for further review
those that appeared to address the evaluation of dehydration
in children aged 1 month to 5 years. We did not exclude articles
if the study enrolled some children outside that age
range. Through consensus, we identified 68 articles as potential
sources of primary data or reviews with potential background
information and thorough reference lists.

To ensure a comprehensive literature review, we used additional
techniques to identify articles (Figure 25-1). One
author (M.J.S.) searched for individual symptoms and signs
associated with the diagnosis of dehydration in children.
These terms included “capillary refill,” “skin turgor,” “dry
cry,” “tears,” “mucous membrane,” “sunken eyes,” “fontanelle”
and “dehydration,” “urine specific gravity,” “urine” and
“dehydration,” “hemoconcentration,” “BUN,” “urine,” “blood
pressure,” “bioimpedance,” “orthostasis,” “respiration,” “parent”
and “dehydration,” “pulse,” and “heart rate” (all limit:
aged 0-18 years, human, NOT “dehydration” and “diagnosis”).
The Cochrane Library, reference lists of pediatric and
physical examination textbooks, 27-32 reference lists of all
included articles, and articles from the collections of experts
in the field were reviewed. Forty-two potential articles were
identified from the supplemental searches.

Figure 25-1 Selection Process for Studies Included in Review

1561 Articles identified in initial MEDLINE searches
    1493 Excluded (no original data on dehydration signs in children)
68 Articles for further review
    42 Articles identified in alternative search strategies
        3 Textbook references
        7 Files of experts
        18 Search on specific dehydration symptoms and signs
        0 Cochrane library
        14 Reference lists of included articles
110 Full-text articles reviewed
26 Met initial inclusion criteria
    13 Studies excluded
        1 Retrospective chart review with disease-specific laboratory tests
        1 Patients part of another included study
        1 Method of dehydration examination not described
        10 Level 5 evidence quality
13 Studies included
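The counts in the search narrative and in Figure 25-1 can be cross-checked with simple arithmetic. The sketch below only re-adds the numbers reported in the text; the variable names are illustrative.

```python
# Counts reported in the search narrative and in Figure 25-1.
initial_medline = 1537
series_search = 24
identified = initial_medline + series_search

excluded_on_title_or_abstract = 1493
for_further_review = identified - excluded_on_title_or_abstract

supplemental = {
    "textbook references": 3,
    "files of experts": 7,
    "searches on specific symptoms and signs": 18,
    "Cochrane Library": 0,
    "reference lists of included articles": 14,
}
full_text_reviewed = for_further_review + sum(supplemental.values())

met_inclusion = 26
excluded_after_quality_review = {
    "retrospective chart review": 1,
    "patients part of another included study": 1,
    "examination method not described": 1,
    "level 5 evidence": 10,
}
included = met_inclusion - sum(excluded_after_quality_review.values())

# prints: 1561 68 110 13
print(identified, for_further_review, full_text_reviewed, included)
```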

We performed a full review of the 110 retained articles to
identify those with primary data comparing dehydration
with a symptom, sign, or laboratory value in pediatric
patients. Twenty-six articles met these criteria and underwent
a full quality assessment with an established methodologic
filter that has been consistently used and described in
The Rational Clinical Examination series (see Table 1-7). 33 A
second author then checked the initial quality review. The
group always arrived at a consensus on the final evidence
quality level assigned.

Nine of the 110 articles that underwent a full-text review
were written in languages other than English. Medical school
faculty, residents, or students at our institution who were primary
speakers of the written language read each of these articles.
Six of these 9 articles did not meet inclusion criteria and
were excluded, whereas 3 were assigned an evidence quality
level according to a translation of the article.

No studies on physical examination signs, symptoms, or
laboratory results in childhood dehydration demonstrated
evidence quality criteria for level 1 or 2. Four studies were
assigned to level 3, but one of these was eventually excluded
because the study population overlapped with that in
another included study. 22 Twelve studies were initially
assigned to level 4, although one was excluded because of
methodologic flaws 12 and another was excluded because of its
retrospective design and restriction to children with pyloric
stenosis. 34

We chose the difference between the rehydration weight and
the acute weight divided by the rehydration weight as the best
available gold standard of percentage of volume lost. 35 Ten articles
used gold standards based solely on examination signs or a
general dehydration assessment. These were assigned an evidence
quality level of 5 and were subsequently excluded. Figure
25-1 shows a schematic representation of the methods, and
Table 25-1 summarizes the 13 included studies.
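The gold standard described above is a simple ratio. A minimal sketch follows; the function name and the example weights are hypothetical.

```python
def percent_dehydration(rehydration_weight_kg, acute_weight_kg):
    """Percentage of volume lost, using the rehydration weight as the baseline.

    Computed as (rehydration weight - acute weight) / rehydration weight x 100.
    """
    if rehydration_weight_kg <= 0:
        raise ValueError("rehydration weight must be positive")
    return (rehydration_weight_kg - acute_weight_kg) / rehydration_weight_kg * 100.0


# Hypothetical child: 9.4 kg at presentation, 10.0 kg after rehydration.
print(round(percent_dehydration(10.0, 9.4), 1))
```

Note that the denominator is the rehydration weight, not the weight at presentation, so the result is expressed relative to the child's estimated well weight.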

Statistical Analyses 

We report precision data as a range of κ values obtained
directly from the published results. Two-by-two tables were
created from the published information regarding accuracy
and were used to calculate point estimates and 95% confidence
intervals (CIs) for the sensitivity, specificity, and likelihood
ratios (LRs) for each test. 44 One author provided
original data to calculate these values because they were not
calculable from the original publication. 18 We created these
2 × 2 tables for detecting both 5% and 10% dehydration
when data were available. A range of values was provided
when only 2 studies evaluated an individual diagnostic test. If
more than 2 studies evaluated a test, then we combined the
results with a random-effects model. Data for meta-analysis
were not weighted according to the quality of included studies.
Statistical tests were performed with STATA software,
version 7.0 (StataCorp, College Station, Texas).

Table 25-1 Summary of Included Studies

Source, y | Evidence Quality Level | Country | Setting | No. of Participants | Age Range | Inclusion Criteria
Porter et al, 13 2003 | 3 | United States | Emergency department | 71 | 1 mo-5 y | Chief complaint of vomiting, diarrhea, or poor oral intake
Laron, 15 1957 | 4 | United States | Hospital | 21 | 1 mo-3.5 y | Admitted with diarrhea
Saavedra et al, 16 1991 | 4 | United States | Hospital | 32 | 2-24 mo | Admitted with diarrhea
Duggan et al, 18 1996 | 4 | Egypt | Gastroenteritis clinic | 135 | 3-18 mo | Acute diarrhea and dehydrated
Gorelick et al, 35 1997 | 3 | United States | Emergency department | 225 | 1 mo-5 y | Chief complaint of vomiting, diarrhea, or poor oral intake
Duggan et al, 36 1997 (precision only) | 3 | Egypt | Gastroenteritis clinic | 100 | 2 mo-4 y | >5 Stools in last 24 h
MacKenzie et al, 37 1989 | 4 | Australia | Hospital | 102 | <4 y | Admitted with gastroenteritis and dehydration
English et al, 38 1997 | 3 | Kenya | Hospital | 119 | >1 mo | Admitted with malaria and coma, respiratory distress, or prostration
Plata Rueda and Diaz Cruz, 39 1974 | 4 | Colombia | Hospital | 100 | <73 mo | Admitted with diarrhea and dehydration
Vega and Avner, 40 1997 | 4 | United States | Emergency department | 97 | 2 wk-15 y | Dehydrated and needed intravenous fluids
Amin et al, 41 1980 | 4 | Indonesia | Hospital | 36 | <24 mo | Admitted with diarrhea and dehydration
Teach et al, 42 1997 | 4 | United States | Emergency department | 40 | 2 wk-12 y | Dehydrated and needed intravenous fluids
Yilmaz et al, 43 2002 | 4 | Turkey | Emergency department | 168 | 1-21 mo | Received intravenous fluids and hospitalized for gastroenteritis and dehydration

We performed tests of heterogeneity for data used in all
meta-analyses and found significant heterogeneity for most
signs. Analysis of data with a random-effects model is complicated
by the presence of heterogeneity. However, combining
data in this manner allows clinicians to make general
summary “best estimates” of utility according to all of the
included studies. Furthermore, the degree of uncertainty
between LRs of summary estimates was more obvious with
the broad range of 95% CIs as opposed to the narrower range
for the individual point estimates. Thus, the summary LRs
lower the risk of clinicians being overly confident about the
utility of clinical findings.
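The two-by-two calculations described in this section can be sketched as follows. The chapter does not state which interval formulas were used, so the choices here are assumptions: a normal-approximation (Wald) interval for sensitivity and specificity, and the usual log-scale interval for likelihood ratios. The function name and the example counts are hypothetical, and the sketch requires every cell of the table to be nonzero.

```python
import math


def diagnostic_accuracy(tp, fp, fn, tn, z=1.96):
    """Point estimates and approximate 95% CIs from a 2 x 2 table.

    tp/fn: dehydrated children with/without the finding.
    fp/tn: nondehydrated children with/without the finding.
    Assumes all four cells are nonzero.
    """
    diseased, healthy = tp + fn, fp + tn
    sens = tp / diseased
    spec = tn / healthy

    def wald(p, n):
        # Normal-approximation interval, truncated to [0, 1].
        half = z * math.sqrt(p * (1 - p) / n)
        return max(0.0, p - half), min(1.0, p + half)

    def log_interval(lr, se):
        # Interval computed on the log scale, then back-transformed.
        return lr * math.exp(-z * se), lr * math.exp(z * se)

    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    se_pos = math.sqrt(1 / tp - 1 / diseased + 1 / fp - 1 / healthy)
    se_neg = math.sqrt(1 / fn - 1 / diseased + 1 / tn - 1 / healthy)

    return {
        "sensitivity": (sens, wald(sens, diseased)),
        "specificity": (spec, wald(spec, healthy)),
        "lr_positive": (lr_pos, log_interval(lr_pos, se_pos)),
        "lr_negative": (lr_neg, log_interval(lr_neg, se_neg)),
    }


# Hypothetical counts, not taken from any included study.
for name, (estimate, (low, high)) in diagnostic_accuracy(30, 15, 20, 85).items():
    print(f"{name}: {estimate:.2f} ({low:.2f}-{high:.2f})")
```

Pooling such estimates across studies with a random-effects model is a separate step that this sketch does not attempt.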

RESULTS 

Precision of Symptoms and Signs 

Porter et al 13 evaluated the agreement between parental observation
of examination signs and the signs elicited by trained
ED nurses. The κ value demonstrated substantial agreement
beyond chance when assessing for a sunken anterior fontanelle
(κ = 0.73) and presence of cool extremities (κ = 0.70). There
was moderate agreement on general appearance (κ = 0.46),
presence of sunken eyes (κ = 0.49), absence of tears (κ = 0.57),
and presence of dry mouth (κ = 0.52).

Three included studies reported interrater agreement
among clinicians, ranging from chance to good agreement
(Table 25-2). 16,35,36 Agreement on respiratory rate and pattern
may be no better than that which occurs by chance. The
other signs had higher levels of agreement, although the
range of κ values for these findings was broad.
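For readers unfamiliar with κ, the statistic compares observed agreement with the agreement expected by chance. The sketch below computes Cohen's κ for two examiners rating one sign as present or absent; the function name and the counts are hypothetical and are not drawn from the included studies.

```python
def cohen_kappa(both_present, only_a, only_b, both_absent):
    """Cohen's kappa for two raters judging a sign as present or absent.

    both_present / both_absent: the raters agree.
    only_a / only_b: only rater A (or only rater B) records the sign.
    Undefined when chance agreement equals 1 (both raters always give
    the same single rating).
    """
    n = both_present + only_a + only_b + both_absent
    observed = (both_present + both_absent) / n
    a_present = (both_present + only_a) / n
    b_present = (both_present + only_b) / n
    expected = a_present * b_present + (1 - a_present) * (1 - b_present)
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of 100 children by two examiners.
print(round(cohen_kappa(both_present=20, only_a=5, only_b=10, both_absent=65), 3))
```

In this hypothetical example the examiners agree on 85% of children, but 60% agreement would be expected by chance alone, which is why κ is well below the raw agreement.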

Accuracy of Symptoms, Signs, and Laboratory Studies 

Symptoms 

Three studies evaluated the accuracy of history-taking in assessing
dehydration. 13,35,37 All 3 of these studies evaluated history of
low urine output as a test for dehydration. In the pooled analysis,
low urine output did not increase the likelihood of 5% dehydration
(LR, 1.3; 95% CI, 0.9-1.9). Porter et al 13 showed that a
history of vomiting, diarrhea, decreased oral intake, reported
low urine output, a previous trial of clear liquids, and having
seen another clinician during the illness before presenting to the
ED yielded LRs that lacked utility in the assessment of dehydration.
However, their data did suggest that children who had not
been previously evaluated by a physician during the illness
might be less likely to be dehydrated on presentation (LR, 0.09;
95% CI, 0.01-1.4). Similarly, parental report of a normal urine
output decreases the likelihood of dehydration (Gorelick et al 35
reported an LR of 0.27 [95% CI, 0.14-0.51] and Porter et al 13
reported an LR of 0.16 [95% CI, 0.01-2.5]).


Table 25-2 Precision of Examination Signs for Dehydration

Finding | Reference | Total No. of Participants | Range of κ Values
Prolonged capillary refill | 16,35,36 | 216 | 0.01 to 0.65
Abnormal skin turgor | 35,36 | 184 | 0.36 to 0.55
Abnormal respiratory pattern | 35,36 | 184 | -0.04 to 0.40
Extremity perfusion | 35 | 100 | 0.23 to 0.66
Absent tears | 35,36 | 184 | 0.12 to 0.75
Sunken fontanelle | 36 | 100 | 0.10 to 0.27
Sunken eyes | 35,36 | 184 | 0.06 to 0.59
Dry mucous membranes | 35,36 | 184 | 0.28 to 0.59
Weak pulse | 35,36 | 184 | 0.15 to 0.50
Poor overall appearance | 35,36 | 184 | 0.18 to 0.61


Examination Signs 

Table 25-3 is a comprehensive list of individual physical 
examination signs and their test characteristics in evaluating 
children for 5% dehydration. Signs were included when they 
were evaluated in 2 or more studies, and calculations based 
on pooled results were performed when evaluated in 3 or 
more studies. 

Three signs were evaluated in multiple studies, had a clinically
helpful pooled LR in detecting 5% dehydration, and had
95% CIs wholly above 1.0. Capillary refill time was evaluated
in 4 studies, and the pooled sensitivity of prolonged capillary
refill time was 0.60 (95% CI, 0.29-0.91), with a specificity of
0.85 (95% CI, 0.72-0.98), for detecting 5% dehydration. 16,35,37,38
The LR for abnormal capillary refill time was 4.1 (95% CI, 1.7-9.8).
This was the highest value among examination signs with
pooled results. Abnormal skin turgor had a pooled LR of 2.5
(95% CI, 1.5-4.2) 15,18,35,37,38 and abnormal respiratory pattern
had a pooled LR of 2.0 (95% CI, 1.5-2.7). 18,35,37,38
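To see what an LR of this size means for an individual child, the LR can be combined with a pretest probability through odds. The sketch below uses the pooled LR of 4.1 for prolonged capillary refill; the 20% pretest probability and the function name are hypothetical, chosen only for illustration.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability with an LR.

    Steps: probability -> odds, multiply by the LR, odds -> probability.
    """
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Hypothetical pretest probability of 20% with the pooled LR of 4.1.
print(round(posttest_probability(0.20, 4.1), 2))
```

The same function applies to LRs below 1: with the same assumed pretest probability, an LR of 0.5 lowers the probability only modestly, which illustrates why findings with LRs near 0.5 do not rule out dehydration.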

Presence of cool extremities or a weak pulse or absence of
tears also may be helpful tests for dehydration. Absence of
tears had a pooled LR of 2.3 (95% CI, 0.9-5.8), but the
potential utility is limited by a wide 95% CI that crosses
1.0. 13,35,37 Two studies examined a weak pulse quality as a test
for dehydration. One study found a reasonably precise LR for
weak pulse of 3.1 (95% CI, 1.8-5.4), 35 but in the other study,
the 95% CI was too wide to make a reasonable estimate (LR,
7.2; 95% CI, 0.4-150). 18 The 2 studies that evaluated cool
extremities as a test of dehydration found imprecise point
estimates for the positive likelihood ratio (LR+) in detecting
5% dehydration: one study 18 reported an LR of 19 (95% CI,
1.1-330) and the other 13 an LR of 1.5 (95% CI, 0.2-12).

Sunken eyes and dry mucous membranes offer little help 
clinically; both had narrow 95% CIs but pooled LRs of 1.7. 
An increased heart rate, a sunken fontanelle in young infants, 
and an overall poor appearance are frequently taught as good 
tests for dehydration. However, the objective evidence reveals 
that all have summary LRs of less than 2.0 and 95% CIs that 
cross 1.0. 

Some tests may be clinically useful in decreasing the likeli¬ 
hood of dehydration. Absence of dry mucous membranes 
(LR, 0.41; 95% Cl, 0.21-0.79), a normal overall appearance 
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Table 25-3 Summary Test Characteristics for Clinical Findings to Detect 5% Dehydration 


Finding 

Reference 

Total No. of 
Participants 

LR Summary, Value (95% Cl) or Range 

Present Absent 

Sensitivity (95% Cl) Specificity (95% Cl) 

Prolonged capillary refill 

16, 35, 37, 38 

478 

4.1 (1.7-9.8) 

0.57 (0.39-0.82) 

0.60 (0.29-0.91) 

0.85 (0.72-0.98) 

Abnormal skin turgor 

15,18, 35, 37, 38 

602 

2.5 (1.5-4.2) 

0.66 (0.57-0.75) 

0.58 (0.40-0.75) 

0.76 (0.59-0.93) 

Abnormal respiratory pattern 

18, 35, 37, 38 

581 

2.0 (1.5-2.7) 

0.76 (0.62-0.88) 

0.43(0.31-0.55) 

0.79 (0.72-0.86) 

Sunken eyes 

13,18,35,37 

533 

1.7 (1.1-2.5) 

0.49 (0.38-0.63) 

0.75 (0.62-0.88) 

0.52 (0.22-0.81) 

Dry mucous membranes 

13,18, 35, 37 

533 

1.7 (1.1-2.6) 

0.41 (0.21-0.79) 

0.86 (0.80-0.92) 

0.44(0.13-0.74) 

Cool extremity 

13,18 

206 

1.5-19 

0.89-0.97 

0.10-0.11 

0.93-1.0 

Weak pulse 

18,35 

360 

3.1-7.2 

0.66-0.96 

0.04-0.25 

0.86-1.0 

Absent tears 

13,35,37 

398 

2.3 (0.9-5.8) 

0.54(0.26-1.1) 

0.63 (0.42-0.84) 

0.68 (0.43-0.94) 

Increased heart rate 

18,35,37 

462 

1.3 (0.8-2.0) 

0.82 (0.64-1.0) 

0.52 (0.44-0.60) 

0.58 (0.33-0.82) 

Sunken fontanelle 

13,18,37 

308 

0.9 (0.6-1.3) 

1.12(0.82-1.5) 

0.49 (0.37-0.60) 

0.54 (0.22-0.87) 

Poor overall appearance 

13, 35, 37 

398 

1.9 (0.97-3.8) 

0.46 (0.34-0.61) 

0.80 (0.57-1.0) 

0.45 (-0.1 to 1.0) 


Abbreviations: Cl, confidence interval; LR, likelihood ratio. 


(LR, 0.46; 95% CI, 0.34-0.61), and absence of sunken eyes
(LR, 0.49; 95% CI, 0.38-0.63) had pooled LRs of less than
0.5. Most clinical scenarios require lower LRs than these to
rule out dehydration effectively.
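
As a rough illustration of the reasoning above, the following Python
sketch takes the pooled positive LRs transcribed from Table 25-3 and
separates the findings whose 95% CI excludes 1.0 from those whose CI
crosses it. The dictionary layout and the function name are
illustrative choices rather than part of the original review, and the
findings reported only as ranges are left out.

```python
# Pooled positive LRs with 95% CIs, transcribed from Table 25-3.
# Findings reported only as ranges (cool extremity, weak pulse) are omitted.
# Tuple layout (an illustrative choice): (pooled LR+, CI lower, CI upper).
POOLED_LR_PRESENT = {
    "Prolonged capillary refill": (4.1, 1.7, 9.8),
    "Abnormal skin turgor": (2.5, 1.5, 4.2),
    "Abnormal respiratory pattern": (2.0, 1.5, 2.7),
    "Sunken eyes": (1.7, 1.1, 2.5),
    "Dry mucous membranes": (1.7, 1.1, 2.6),
    "Absent tears": (2.3, 0.9, 5.8),
    "Increased heart rate": (1.3, 0.8, 2.0),
    "Sunken fontanelle": (0.9, 0.6, 1.3),
    "Poor overall appearance": (1.9, 0.97, 3.8),
}


def ci_excludes_one(lower, upper):
    """Return True when the interval lies entirely above or below 1.0."""
    return lower > 1.0 or upper < 1.0


ci_excludes = [
    name
    for name, (_, lower, upper) in POOLED_LR_PRESENT.items()
    if ci_excludes_one(lower, upper)
]
ci_crosses = [name for name in POOLED_LR_PRESENT if name not in ci_excludes]

print("CI excludes 1.0:", ci_excludes)
print("CI crosses 1.0:", ci_crosses)
```

A CI that excludes 1.0 shows only that the finding shifts the
probability of dehydration in a consistent direction. Whether the shift
is large enough to matter clinically still depends on the size of the
LR.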

Four studies evaluated clinical prediction models or groups
of signs. 18,35,39,40 Vega and Avner 40 evaluated a table similar to
that used in many pediatric textbooks and also commonly
taught to medical students as the best evaluation tool for
dehydration. 30 This scale, displayed in Table 25-4, is similar to the
one used by the AAP and CDC in their recommendations for
the management of acute gastroenteritis. 1,3,4 The tool uses the
assessment of 9 physical examination findings to classify
children as mildly (4%-5%), moderately (6%-9%), or severely
(>10%) dehydrated. In 97 children presenting to the ED with
dehydration requiring intravenous fluids, a classification of
severe on the scale had an LR of 3.4 (95% CI, 1.5-7.7) for the
presence of at least 5% dehydration. Classification of severe
dehydration also yielded an LR of 4.3 (95% CI, 2.4-7.8) for at
least 10% dehydration. A moderate classification by
examination was less useful to diagnose 5% dehydration (LR, 2.1; 95%
CI, 0.9-4.8). 40


Table 25-4 Example of a Commonly Taught Dehydration Assessment Scale(a)

                         Dehydration
Variable/Sign            Mild (4%-5%)                Moderate (6%-9%)                       Severe (>10%)
General appearance       Thirsty, restless, alert    Thirsty, drowsy, postural hypotension  Drowsy, limp, cold, sweaty, cyanotic extremities
Radial pulse             Normal rate and strength    Rapid and weak                         Rapid, thready, sometimes impalpable
Respirations             Normal                      Deep, may be rapid                     Deep and rapid
Anterior fontanelle      Normal                      Sunken                                 Very sunken
Systolic blood pressure  Normal                      Normal or low                          Low
Skin elasticity          Pinch retracts immediately  Pinch retracts slowly                  Pinch retracts very slowly
Eyes                     Normal                      Sunken                                 Grossly sunken
Tears                    Present                     Absent                                 Absent
Mucous membranes         Moist                       Dry                                    Very dry

(a) Adapted from Vega and Avner, 40 with permission.

Duggan et al 18 evaluated 2 dehydration assessment scales that
classified children as mild, moderate, or severe according to the
number of dehydration examination signs present. The authors
reported the final mean percentage of dehydration within each
group, and these averages increased significantly as the severity
assessment increased, 18 which suggests that as more signs of
dehydration appear, children tend to be more dehydrated. Plata
Rueda and Diaz Cruz 39 also presented groupings of signs and
symptoms that attempted to stratify children into different
degrees of dehydration. Minor physical examination changes
did not significantly change the likelihood of dehydration;
however, the presence of abnormal skin turgor on the abdomen,
thorax, extremities, and face, combined with sunken eyes, dry
mucous membranes, and a sunken fontanelle, did increase the
likelihood of 10% dehydration (LR, 3.7; 95% CI, 1.6-8.1). 39

Gorelick et al 35 created a scale giving equal weight to 10
commonly elicited signs: decreased skin elasticity, capillary refill
time greater than 2 seconds, general appearance, absence of
tears, abnormal respirations, dry mucous membranes, sunken
eyes, abnormal radial pulse, tachycardia (heart rate > 150/min),
and decreased urine output. The presence of at least 3 of the 10
signs had a sensitivity of 0.87 and a specificity of 0.82 in
detecting 5% dehydration (LR+, 4.9; 95% CI, 3.3-7.2, and negative
LR, 0.15; 95% CI, 0.08-0.30). Similarly, 7 of 10 signs had an
LR+ of 8.4 (95% CI, 5.0-14) in diagnosing 10% dehydration. A
logistic regression analysis performed by Gorelick et al 35
showed that capillary refill time, dry mucous membranes,
absence of tears, and abnormal overall appearance contained
most of the predictive power. A simplified assessment tool
Table 25-5 Summary Test Characteristics for Laboratory Tests Assessing Dehydration

                                                       Total No. of  Sensitivity, Value  Specificity, Value  LR Summary, Value (95% CI) or Range
Laboratory Value                            Reference  Participants  (95% CI) or Range   (95% CI) or Range   Present         Absent
Blood urea nitrogen, mg/dL
  >8                                        37, 38                   0.38-0.71           0.71-0.82           2.1-2.4         0.41-0.76
  >18                                       41, 43                   0.63-0.90           0.55-0.57           1.4-2.1         0.17-0.68
  >27                                       41         36            0.44 (0.19-0.68)    0.85 (0.69-1.0)     2.9 (0.9-9.5)   0.66 (0.41-1.1)
  >45                                       43         168           0.43 (0.34-0.52)    0.99 (0.96-1.0)     46.1 (2.9-733)  0.58 (0.49-0.68)
Blood urea nitrogen/creatinine ratio > 40   42         40            0.23 (0.01-0.46)    0.89 (0.77-1.0)     2.1 (0.5-8.9)   0.87 (0.62-1.2)
Bicarbonate, mEq/L
  <17                                       40         97            0.83 (0.72-0.94)    0.76 (0.64-0.88)    3.5 (2.1-5.8)   0.22 (0.12-0.43)
  <15                                       43         168           0.93 (0.88-0.98)    0.40 (0.26-0.53)    1.5 (1.2-1.9)   0.18 (0.08-0.37)
Base deficit > 7 mEq/L                      37, 38                   0.67-0.75           0.52-0.59           1.4-1.8         0.42-0.68
pH <7.35                                    37         102           0.43 (0.28-0.58)    0.80 (0.70-0.91)    2.2 (1.2-4.1)   0.71 (0.53-0.95)
Anion gap > 20 mEq/L                        42         40            0.46 (0.19-0.73)    0.74 (0.58-0.91)    1.8 (0.8-4.2)   0.73 (0.42-1.3)
Uric acid > 10 mg/dL                        42         40            0.23 (0.01-0.46)    0.78 (0.62-0.93)    1.0 (0.3-3.5)   0.99 (0.69-1.4)

Abbreviations: CI, confidence interval; LR, likelihood ratio.


using the presence of 2 of these 4 signs yielded an LR+ of 6.1
(95% CI, 3.8-9.8) for diagnosing 5% dehydration. 35
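
The relationship between the sensitivity and specificity quoted above
and the corresponding likelihood ratios can be sketched in a few lines
of Python. The function name is illustrative. The inputs are the
rounded values reported for the "at least 3 of 10 signs" threshold, so
the results should land close to, but not necessarily exactly on, the
published LRs, which were presumably computed from unrounded data.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not (0.0 < specificity < 1.0) or not (0.0 <= sensitivity <= 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded values reported for >= 3 of 10 signs (5% dehydration).
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.87, specificity=0.82)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```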

Laboratory Tests 

Six studies evaluated the utility of laboratory tests in
assessing dehydration (Table 25-5). 37,38,40-43 Five studies
evaluated BUN concentration or BUN/serum creatinine
ratio as a test for dehydration. 37,38,41-43 BUN cutoffs of 8, 18,
and 27 mg/dL produced LRs ranging from 1.4 to 2.9.
Yilmaz et al 43 found that in a group of hospitalized children
with gastroenteritis, BUN greater than 45 mg/dL was
specific for at least 5% dehydration (specificity of 1.0).
However, this was a small study and the estimated 95% CI for
an LR+ was 3 to 730.

Four studies evaluated acidosis as a test for
dehydration. 37,38,40,43 Most patients enrolled in these studies had acute
diarrhea, a potential cause of acidosis. Mackenzie et al 37 and
English et al 38 used a base deficit of greater than 7 as the
measure of acidosis. (Base deficit estimates the severity of metabolic
acidosis by comparing the patient’s bicarbonate concentration
to historical norms for a given pH and Pco2.) In both studies,
the LR+ was less than 2.0. Although Yilmaz et al 43 found that
an absolute serum bicarbonate concentration of less than
15 mEq/L was not helpful (LR for low serum bicarbonate, 1.5;
95% CI, 1.2-1.9), Vega and Avner 40 found that an absolute
bicarbonate concentration of less than 17 mEq/L offered some
help in diagnosing children with 5% dehydration (LR, 3.5;
95% CI, 2.1-5.8). Teach et al 42 evaluated serum uric acid and
an increased anion gap as tests for dehydration but found that
abnormal results were not helpful. Urine specific gravity was
evaluated by English et al 38 but was not found to be
significantly correlated with dehydration. The only laboratory
measurement that appears to be valuable in decreasing the
likelihood of 5% dehydration is serum bicarbonate. A serum
bicarbonate concentration of more than 15 or 17 mEq/L has
an LR range of 0.18 to 0.22, reducing the likelihood of
dehydration if the child has gastroenteritis. 40,43

Limitations 

The published literature on assessment of dehydration has
significant limitations affecting both internal and external
validity. As discussed in the “Methods” section, none of the
identified studies met the criteria for high-quality (level 1 or
level 2) evidence according to the established methodologic
filter. The best available studies had modest sample sizes,
used nonconsecutive patients, and did not compare the
included children with those excluded from the study
populations. The most common bias in level 4 evidence studies
was that they enrolled children already thought to be
dehydrated and to need intravenous fluids or who were admitted
to the hospital. The diagnostic tests may perform better in
children who are thought to be dehydrated compared with
children solely at risk of dehydration. Thus, there may be
limitations to the generalizability of these results when
applied to an unselected group of children simply at risk of
dehydration.

The results of the study by Gorelick et al 35 differed from
those of the other included studies. Gorelick et al 35 evaluated
the interrater reliability for 10 physical examination signs. The
κ values ranged from 0.40 to 0.75, which were clearly better
than those found in the other studies on precision by Saavedra
et al 16 and Duggan et al. 36 The accuracy of signs was also
generally better in the study by Gorelick et al 35 than in other
included studies. The LRs of positive tests were all statistically
significant and ranged from 1.8 to 12. All 10 of the signs
evaluated by Gorelick et al 35 were assessed in other studies. For 9 of
the 10 signs, the results by Gorelick et al 35 produced the highest
LRs of any included study, which is difficult to explain. The
study by Gorelick et al 35 is of high methodologic quality in
comparison with the other included studies. It achieved
evidence quality level 3 because its nonconsecutive patient
selection did not introduce a clear systematic bias. The
investigators enrolled a relatively large group of patients and
followed them meticulously. The sensitivity values of the tests were
generally similar to those found in other studies, but the
specificity was often much higher. The high percentage of
true-negative test results may have been affected by a patient
population with a relatively low incidence of disease in comparison
with patients enrolled in the other studies. 35

Ten of the 26 articles that met initial inclusion criteria were
later found to have a methodologic flaw with the diagnostic
standard and were excluded from the final analysis. These
studies used a gold standard for dehydration according to
examination signs or clinical assessment, which represents a circular flaw
in assessing the utility of the history taking or examination in
establishing dehydration. Conversely, the difference between an
ill weight and a rehydrated weight (after illness) appears to be
the best pragmatic diagnostic standard for dehydration that has
been validated in the literature. 35 However, problems can be
introduced by the timing of the rehydration weight. For
example, if it is obtained too early, children may still be dehydrated or
may actually be overhydrated because of aggressive intravenous
fluid administration. The timing of the rehydration weight
varied among the included studies, and most studies used
additional assessments to validate their perception of a true
rehydration weight. For example, Teach et al 42 used the weight
when the physical examination findings had normalized and the
urine specific gravity was low. Incorporating other
assessments that were not based on weight into the gold standard
could bias the results. Some studies avoided this problem by
documenting the rehydration weight when measured weight
remained unchanged over time. 35 Another criticism of a
weight-based gold standard is that infants may “gain” a significant
percentage of their body weight if they have a full bladder and
colon, which they may then “lose” when they void. 20 In studies
of large sample size, the weight contribution of a full bladder
would be unlikely to have a major effect on the LRs for clinical
findings. Additionally, the number of children with weight
“gained” or “lost” because of impending or recent voids should
balance.

Pediatricians are taught that hypernatremia may alter the
test characteristics of signs in dehydration. 30 For example,
prolonged skin turgor is less sensitive in detecting significant
dehydration in children with diabetes insipidus and pure
water loss than in children with diarrhea. 15 Because of this
clinical experience, some studies excluded children with
significant hypernatremia. 35,38 Other studies used subgroup
analysis to demonstrate that assessment had not been
affected by hypernatremia. 37,43 Because tests of dehydration
are usually applied without any knowledge of the serum
sodium level in the patient, it seems appropriate to structure
studies without excluding hypernatremic children.

THE BOTTOM LINE 

Dehydration is an important cause of morbidity and
mortality as a complication of pediatric illness. However, the
literature evaluating the symptoms, signs, and laboratory
values for assessing dehydration is limited. We found few
high-quality studies with accurate gold standards and
minimal systematic bias.

The evidence shows that tests of dehydration are imprecise,
generally showing only fair to moderate agreement among
examiners. Historical points have moderate sensitivity as a
screening test for dehydration. However, parental reports of
dehydration symptoms are so nonspecific that they may not be
clinically useful. The best 3 individual examination signs for
assessing dehydration are prolonged capillary refill time,
abnormal skin turgor, and abnormal respiratory pattern. Groups of
signs or use of clinical scales improves diagnostic characteristics.
Commonly obtained laboratory tests such as BUN and
bicarbonate concentrations generally are only helpful when results are
markedly abnormal. A normal bicarbonate concentration helps
somewhat to reduce the likelihood of dehydration. These
laboratory tests should not be considered definitive for dehydration.

The literature reports more than 30 potential tests for
detecting dehydration. This large number should not distract
clinicians from focusing on signs and symptoms with proven
diagnostic utility. Unfortunately, the data also suggest that
signs of dehydration can be imprecise and inaccurate, making
clinicians unable to predict the exact degree of dehydration.
For this reason, we agree with the WHO and other groups that
recommend using the physical examination to classify
dehydration as “none,” “some,” or “severe.” 1,45 This general
assessment can then be used to guide clinical management.


CLINICAL SCENARIOS—RESOLUTIONS 


CASE 1 The historical clues provided by the father are
minimally helpful in assessing the child’s dehydration. There are
no signs present that increase the likelihood of dehydration.
The negative LRs associated with the absence of multiple
examination signs and the serum bicarbonate concentration
of 19 mEq/L make significant dehydration much less likely.
This child probably has “no” dehydration instead of “some”
or “severe” dehydration.

CASE 2 The hyperpnea, prolonged capillary refill time, and
delayed skin turgor all increase the likelihood of dehydration.
Because there are multiple signs of dehydration, the
possibility of severe dehydration should be considered and treated
appropriately.
UPDATE:


CLINICAL SCENARIO 


Worried parents bring a 3-year-old boy who is refusing to
eat or drink to your office. His illness started 4 days ago,
with temperatures as high as 39°C, increased sleepiness,
and decreased oral intake. On examination, his
temperature is 38.7°C and he is alert but mildly tachycardic and
tachypneic. He has normal skin turgor, although his
mucous membranes are dry and his capillary refill is 3
seconds. Also observed on examination are small vesicular
and ulcerated lesions on the posterior pharynx and red
macules on the hands and feet.

UPDATED SUMMARY ON DEHYDRATION IN CHILDREN 

Original Review 

Steiner MJ, DeWalt DA, Byerley JS. Is this child dehydrated? 
JAMA. 2004;291(22):2746-2754. 

UPDATED LITERATURE SEARCH 

We repeated the literature search from April 2004 to March 2006
and found 258 new abstracts, but there were no additional
studies of the diagnostic accuracy (both sensitivity and specificity) of
the physical examination components or an explicit grouping of
findings for predicting the presence of dehydration in young
children. One potential article that assessed the validity and
reliability of dehydration assessment in children with diabetic
ketoacidosis was excluded because it enrolled only 5 children
who met the age range criteria of our original search (1 month
to 5 years). 1 We identified one article on the precision of
individual findings and a second article that contained information on
the accuracy and precision of the findings included in a new
childhood dehydration scale. 2,3

NEW FINDINGS 

Details of the update 

Friedman et al 3 evaluated the measurement properties of 12
findings for dehydration, each measured on a 3-point ordinal
scale. Nine items occurred frequently enough to merit closer
evaluation of differing combinations. The patients were aged
1 to 36 months, with gastroenteritis and clinically diagnosed
dehydration. A 4-item scale (Table 25-6) had the best
measurement characteristics, as assessed by correlation with
change in weight, interobserver variability, discrimination
between levels of dehydration, and change after treatment.


Table 25-6 Rating Scale Based on Severity of 4 Clinical Signs 3

Characteristic (Intraclass                        Score
Correlation Coefficient)                          0       1                                                           2
General appearance (0.55)                         Normal  Thirsty, restless, or lethargic but irritable when touched  Drowsy, limp, cold, sweaty, or comatose
Eyes (0.61)                                       Normal  Slightly sunken                                             Very sunken
Mucous membranes, moistness on the tongue (0.71)  Moist   “Sticky”                                                    Dry
Tears (0.66)                                      Tears   Decreased tears                                             Absent tears

The intraclass correlation coefficients (a measure of
interobserver variability for items on an ordinal scale) were
comparable to the range of κ values reported in the original
study. Results of the 4 findings are summed, and if one or
more are abnormal, the authors report a sensitivity of 0.85
(95% confidence interval [CI], 0.73-0.97) and a specificity
of 0.32 (95% CI, 0.20-0.44) for dehydration at a cut point of
greater than or equal to 3% (according to data from the
original research; written communication, 3 Patricia Parkin,
MD, University of Toronto, Canada, April 2006). The
sensitivity of this model was similar to that reported by Gorelick
et al 4 for detecting 5% dehydration (sensitivity, 0.79;
specificity, 0.87), with much lower specificity, but it is difficult to
compare the scales directly because of differing dehydration
cutoff levels. The model presented by Gorelick et al 4 was
considered positive for 5% dehydration if any 2 of the
following were present: capillary refill greater than 2 seconds,
dry mucous membranes, absent tears, or change in general
appearance.
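
The two combination rules described above can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not
code from either study: the key names, function names, and dictionary
inputs are hypothetical, while the thresholds (any 2 of 4 findings for
the Gorelick model; a summed score of 1 or more for the 4-item scale of
Friedman et al) are taken from the text.

```python
# Hypothetical key names for the 4 findings in the Gorelick model.
GORELICK_FINDINGS = (
    "capillary_refill_over_2s",
    "dry_mucous_membranes",
    "absent_tears",
    "abnormal_general_appearance",
)

# Hypothetical key names for the 4 items in the Friedman scale (Table 25-6).
FRIEDMAN_ITEMS = ("general_appearance", "eyes", "mucous_membranes", "tears")


def gorelick_positive(findings):
    """Positive for 5% dehydration when any 2 of the 4 findings are present.

    `findings` maps a finding name to True/False; missing keys count as absent.
    """
    present = sum(bool(findings.get(name, False)) for name in GORELICK_FINDINGS)
    return present >= 2


def friedman_score(grades):
    """Sum the 4 items, each graded 0, 1, or 2, giving a total of 0 to 8."""
    for item in FRIEDMAN_ITEMS:
        if grades[item] not in (0, 1, 2):
            raise ValueError(f"{item} must be graded 0, 1, or 2")
    return sum(grades[item] for item in FRIEDMAN_ITEMS)


def friedman_positive(grades):
    """Positive when one or more items are abnormal (summed score >= 1)."""
    return friedman_score(grades) >= 1


example_findings = {"capillary_refill_over_2s": True, "dry_mucous_membranes": True}
example_grades = {"general_appearance": 1, "eyes": 0, "mucous_membranes": 2, "tears": 0}
print(gorelick_positive(example_findings), friedman_score(example_grades))
```

Treating a missing finding as absent is a design choice made here for
brevity. In practice an unrecorded sign is not the same as a normal one,
and a clinical implementation would need to handle that distinction
explicitly.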

The interobserver variability for signs of shock was
evaluated in Kenyan children admitted to a pediatric ward. 2 The
diagnoses were unknown to the 4 independent examining
clinicians. Capillary refill time, dry mucous membranes,
decreased skin turgor, and sunken eyes each had κ values well
within the ranges reported in the original Rational Clinical
Examination article.


IMPROVEMENTS IN THE DATA PRESENTED IN THE 
ORIGINAL PUBLICATION 

We rereviewed all of the original studies to establish the
pretest probability of dehydration in children. Only 2 studies
assessed the prevalence of dehydration (at least 5%) among children
presenting for emergency care with diarrhea, vomiting, or
poor oral intake. There was heterogeneity in the prevalence
(11/71 vs 63/186), 4,5 but the random-effects summary
prevalence provides a useful anchor for infants and children
presenting with these symptoms (summary prevalence, 25%;
95% CI, 14%-39%).
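
For readers unfamiliar with how a random-effects summary prevalence is
formed, the following Python sketch pools the 2 study proportions with
the DerSimonian-Laird estimator on the logit scale. This method is an
assumption made for illustration: the review does not state which
transformation or estimator was used, so the output should be read as
an approximation that need not reproduce the published estimate or
interval exactly.

```python
import math


def pooled_prevalence_random_effects(studies, z=1.96):
    """Pool proportions with DerSimonian-Laird random effects on the logit scale.

    `studies` is a list of (events, total) pairs. Returns
    (summary prevalence, lower CI bound, upper CI bound).
    """
    logits, variances = [], []
    for events, total in studies:
        non_events = total - events
        logits.append(math.log(events / non_events))
        variances.append(1.0 / events + 1.0 / non_events)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    weights = [1.0 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, logits)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, logits))

    # Between-study variance (tau^2), truncated at zero.
    df = len(studies) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    mean = sum(w * y for w, y in zip(re_weights, logits)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return expit(mean), expit(mean - z * se), expit(mean + z * se)


summary, lower, upper = pooled_prevalence_random_effects([(11, 71), (63, 186)])
print(f"summary prevalence = {summary:.0%} (95% CI, {lower:.0%}-{upper:.0%})")
```

With only 2 heterogeneous studies the between-study variance is
estimated very imprecisely, which is one reason different pooling
methods can give noticeably different intervals around a similar point
estimate.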

We rereviewed the data from Gorelick et al 4 on the
performance of a combination of findings for dehydration. This
study had the largest number of children of any high-quality
study in our original review. We reported a likelihood ratio
(LR) of 6.1 (95% CI, 3.8-9.7) to predict at least 5%
dehydration when 2 of 4 signs of dehydration were present. However,
we did not provide the LR associated with fewer findings.
The presence of 0 to 1 finding has an LR of 0.24 (95% CI,
0.14-0.39), making dehydration less likely. By changing the
threshold to greater than or equal to 3 findings, the model
predicts more severe dehydration (>10%), with an LR of 4.7
(95% CI, 3.1-7.3).

DIFFERENCES IN THE REFERENCE STANDARD 

There have been no changes in the reference standards for 
dehydration. 


RESULTS OF LITERATURE REVIEW 

Univariate Findings 

There were no new data on the accuracy of individual
symptoms and signs of dehydration at a threshold of 5%. When
measured on an ordinal scale, the intraclass correlations as
measures of reliability are good for general appearance
(0.55), presence of dry mucous membranes (0.71), sunken
eyes (0.61), tears (0.66), and capillary refill time (0.65). 6

EVIDENCE FROM GUIDELINES 

There have been no updates to the 2003 guideline published 
by the Centers for Disease Control and Prevention (CDC). 6 
Since the publication of the original article, the American 
Academy of Pediatrics has retired their clinical practice 
parameters for the management of acute gastroenteritis 7 and 
endorsed the CDC guideline. 


CLINICAL SCENARIO—RESOLUTION 


The clinical history of this 3-year-old boy puts him at risk
for dehydration. It is difficult to establish a pretest
probability of dehydration for this child presenting to a
clinician’s office. However, according to published studies
from emergency departments where children were enrolled
solely because of potentially dehydrating symptoms, his
pretest probability of dehydration can be estimated at
25%. 4,5 According to our reviews, his prolonged capillary
refill and tachypnea independently make dehydration
more likely. Although some other clinical signs are
normal, his dry mucous membranes, with a prolonged
capillary refill time, give him a positive result on the Gorelick
clinical scale (LR 6.1; 95% CI, 3.8-9.7). According to these
values, the posttest probability of dehydration is 70%, so
appropriate treatment should be initiated.
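
The step from pretest probability to posttest probability goes through
odds: convert the probability to odds, multiply by the LR, and convert
back. The Python sketch below shows that arithmetic with the values
from this scenario. The function name is illustrative. With a pretest
probability of 25% and an LR of 6.1 the calculation gives roughly two
thirds, in line with the rounded figure of about 70% quoted above.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability via odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Positive Gorelick scale (>= 2 of 4 findings): LR 6.1.
positive_result = posttest_probability(0.25, 6.1)
# For comparison, 0 to 1 finding: LR 0.24.
negative_result = posttest_probability(0.25, 0.24)
print(f"positive scale: {positive_result:.0%}, negative scale: {negative_result:.0%}")
```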
CHILDHOOD DEHYDRATION— MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

Dehydration develops progressively, depending on the
underlying condition, and therefore, a consistent prior
probability of dehydration cannot be established for most
general conditions. For infants and children whose parents
bring them for emergency care for diarrhea, vomiting, or
poor oral intake, the prevalence of at least 5% dehydration is
approximately 25% (95% CI, 14%-39%).

POPULATION FOR WHOM CHILDHOOD 
DEHYDRATION SHOULD BE CONSIDERED 

In our initial article, we were unable to identify published 
parental historical elements that made dehydration more 
likely. However, vomiting, diarrhea, change in oral intake, 
decreased urine output, fever, change in mental status, or the 
presence of potentially dehydrating underlying conditions (eg, 
diabetes insipidus) prompts an evaluation for dehydration. 6 

DETECTING THE LIKELIHOOD OF 
CHILDHOOD DEHYDRATION 

Accurately identifying the presence of dehydration requires 
the use of combinations of signs. Combinations of findings 
can include results being either present or absent or graded 
on an ordinal scale (eg, 0, 1, 2) and then summed across 
findings. Each scale must be assessed in comparison with the 
reference standard. See Table 25-7. 


Table 25-7 Likelihood Ratio of Combinations of Findings for Greater Than or Equal to 5% Dehydration

                                                                   ≥5% Dehydration
Findings(a)                                        Positive        LR+ (95% CI)   LR- (95% CI)
Capillary refill time >2 s, dry mucous membranes,  ≥2 Findings(a)  6.1 (3.8-9.7)  0.24 (0.14-0.39)
absent tears, altered general appearance

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

(a) When 3 to 4 findings are present, the LR for severe dehydration (>10%) is 4.7
(95% CI, 3.1-7.3).

REFERENCE STANDARD TESTS 

The difference between the “well” weight and the acute weight
divided by the well weight represents the standard for the
percentage of volume lost because of dehydration.
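
The reference standard above amounts to a one-line calculation,
sketched here in Python. The function name and the example weights are
hypothetical and serve only to illustrate the formula.

```python
def percent_dehydration(well_weight_kg, acute_weight_kg):
    """Percentage of body weight lost: (well - acute) / well x 100."""
    if well_weight_kg <= 0:
        raise ValueError("well weight must be positive")
    return (well_weight_kg - acute_weight_kg) / well_weight_kg * 100.0


# Hypothetical child: 10.0 kg when well, 9.4 kg at presentation.
loss = percent_dehydration(well_weight_kg=10.0, acute_weight_kg=9.4)
print(f"estimated dehydration: {loss:.1f}%")
```

Because a true well weight is often unavailable, studies commonly
substitute a rehydration weight in the same formula, and the result is
then sensitive to when that weight is measured.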
EVIDENCE TO SUPPORT THE UPDATE


Hypovolemia, Child 



TITLE Development of a Clinical Dehydration Scale for 
Use in Children Aged Between 1 and 36 Months. 

AUTHORS Friedman JN, Goldman RD, Srivastava R, 
Parkin PC. 

CITATION J Pediatr. 2004;145(2):201-207.

QUESTION Can a clinical dehydration scale accurately 
and reliably distinguish between degrees of dehydration 
and help to assess the response to therapy? 

DESIGN A prospective study enrolled a convenience 
sample of children and assessed dehydration signs before 
and after rehydration. 

SETTING Participants were enrolled through the
emergency department of a tertiary-care pediatric hospital.

PATIENTS Children aged 1 to 36 months and presenting
to the hospital for treatment of presumed viral
gastroenteritis were enrolled. All participants were judged by the
attending physician to be dehydrated and to need rehydration
therapy (either oral or intravenous). Exclusion criteria were
another cause of dehydration, the presence of any chronic
disease, recent intravenous fluid therapy, or important
serum sodium alterations (<130 or >150 mmol/L).


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Twelve clinical tests for dehydration were initially identified
by a review of the published literature and survey of experts
in the field. Reported urine output, general appearance,
capillary refill, skin turgor, sunkenness of eyes, mucous
membranes, tears, respiratory rate, and heart rate were endorsed
frequently enough to be included for further analysis of test
characteristics. Signs with the strongest measurement
properties on univariate analyses were then combined to form the
clinical scale.

The diagnostic standard for initial percentage of
dehydration was calculated with the following equation: (rehydration
weight - dehydrated weight) × 100/rehydration weight.


MAIN OUTCOME MEASURES 

The rating scale possessed the strongest measurement
properties (Table 25-8 and Table 25-9).

In addition, the reliability of this scale was assessed 
between examiners. The intraclass correlation coefficient was 
0.77, demonstrating a high level of agreement. 


Table 25-8 Rating Scale Based on Severity of 4 Clinical Signs

Clinical Signs      0       1                                                        2
General appearance  Normal  Thirsty, restless, lethargic but irritable when touched  Drowsy, limp, cold, comatose(a)
Eye appearance      Normal  Slightly sunken                                          Very sunken
Mucous membranes    Moist   “Sticky”                                                 Dry
Tear presence       Tears   Decreased                                                Absent tears

(a) Children who are comatose automatically fall into this category.


Table 25-9 Likelihood Ratio for Rating Scale at a Threshold of ≥1 for
Dehydration of at Least 3%

                                                               ≥3% Dehydration
Findings                                         Result       Sensitivity  Specificity(a)  LR+ (95% CI)   LR- (95% CI)
General appearance, sunken eyes,                 ≥1 Abnormal  0.85         0.32            1.3 (1.0-1.6)  0.46 (0.19-1.1)
dry tongue, decreased tears

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

(a) Patricia Parkin, MD, provided the specificity data from the results of her original
research (written communication, April 2006).

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS This study used established methodology for
the development of outcome measures and applied them to a
common clinical concern in the care of children. The scale is
easy to use and evaluate in clinical settings.

LIMITATIONS The use of an immediate rehydration weight 
instead of a true well weight to determine the exact degree of 
dehydration is the most important limitation of this study. 
Determining the percentage of dehydration in this manner 
has not been previously validated. 

This meticulously conducted study illustrates how
ineffective clinicians are at accurately identifying dehydration. All
subjects were thought to be dehydrated by pediatric
emergency department specialists, yet 16% of the subjects had no
dehydration and 49% had clinically insignificant (<3%)
degrees of dehydration when it was retrospectively measured
with the diagnostic standard. Unfortunately, the test
characteristics demonstrated by their clinical scale do not further assist
clinicians with the accurate identification of dehydration.

The authors hoped to establish a clinical dehydration scale
whose purpose was to discriminate between degrees of
dehydration, though recent practice guidelines recommend
grouping children into “none,” “some,” or “severe”
dehydration and basing treatment accordingly instead of on
estimates of percentile-based groupings. 1,2 The responsiveness to
change of their clinical scale suggests a potentially important
clinical use; normalization of the scale may signal that a child
is rehydrated and can safely stop therapy. However, this
potential needs to be confirmed in future clinical trials.

Reviewed by Michael J. Steiner, MD 
1. King CK, Glass R, Bresee JS, Duggan C. Managing acute gastroenteritis
2. World Health Organization. The Treatment of Diarrhoea: A Manual for
Physicians and Other Senior Health Workers. Geneva, Switzerland: WHO; 2003.
http://libdoc.who.int/hq/2003/WHO_FCH_CAH_03.7.pdf. Accessed June
Influenza?

Stephanie A. Call, MD, MSPH 
Mark A. Vollenweider, MD, MPH 
Carlton A. Hornung, PhD, MPH 
A 45-year-old eighth-grade math teacher visited your
office in mid-December 2003, complaining of
temperature to 38.6°C (101.5°F), dry cough, sore throat,
myalgias, and malaise. Her symptoms began approximately
24 hours earlier, but she continued to teach through the
end of the school day. A number of children in her
classes were absent because of similar complaints during
the past 2 weeks. Her physical examination results
revealed readily apparent malaise, temperature of 38.5°C
(101°F), mild pharyngeal erythema with no exudates, no
adenopathy, and clear lung fields. She took
acetaminophen and ibuprofen for fever and muscle aches, with
modest relief. Her medical history was notable for
hypertension and gastroesophageal reflux disease, for which
she took hydrochlorothiazide and lansoprazole,
respectively. Aside from 2 normal deliveries more than 10 years
previously and an appendectomy during childhood, she
had never been hospitalized. As in previous years, she
chose not to receive influenza vaccine. She came to you
suspecting that she might have the flu and asking
whether any medication would help her return to the
classroom more quickly.


WHY IS THIS AN IMPORTANT CLINICAL ISSUE? 


Ten percent to 20% of US residents contract influenza
annually, accounting for an average of 36000 deaths throughout
the past decade 1 and 133900 pneumonia and influenza
hospitalizations per year from 1979 to 2001. 2 Given its
propensity for antigenic drifts and shifts, influenza has the capability
to cause periodic epidemics and global pandemics. A
shortfall in production of vaccine because of problems at one
manufacturer’s facilities
(http://www.hhs.gov/news/press/2004pres/20041005.html;
accessed March 28, 2008) created the
potential for increased morbidity and mortality in the
2004-2005 influenza season. The effect on society during major
outbreaks is substantial in terms of both direct medical costs
and indirect costs associated with illness, including missed
workdays and reduced productivity. 3 In 2003, there were
concerns about early season reports of influenza-related
severe illnesses and deaths in the United States. 4 The fixed
number of doses of vaccine (approximately 83 million) and
the increased demand for its use in 2003 led to a
redistribution of vaccine to clinicians caring for individuals with the
greatest immediate need. 4 This situation was compounded
by a vaccine that may have had reduced effectiveness
because of a suboptimal antigenic match. Early in the
2004-2005 season, one of the manufacturers of the trivalent
inactivated vaccine did not provide vaccine to the United States;
consequently, the available vaccine for the nation was only
about half that projected for the year. 5 Under these
circumstances, early diagnosis and intervention were even more
critical.

Two agents, zanamivir and oseltamivir (for either type A 
or type B strains), are currently recommended and reduce 
the duration of clinical illness, 6 but they are expensive and 
must be instituted within 48 hours of symptom onset for 
maximal benefit. Consequently, they should be used only 
when the probability of infection with influenza and the 
expected benefit are both high. 

Influenza-like illness is defined by the Centers for Disease
Control and Prevention (CDC) US Influenza Sentinel
Providers Surveillance Network as temperature higher
than 37.8°C (100°F) plus either cough or sore throat
(http://www.cdc.gov/flu/weekly/; accessed June 1, 2008),
although others sometimes define it differently. It is a
syndrome characterized by other nonspecific symptoms that
may be observed with a variety of upper respiratory tract
infections. The frequency of infections attributable to the
various viral agents that cause influenza-like illness varies
geographically and from week to week throughout the
influenza season. Fortunately, excellent weekly reports are
available that help clinicians understand both the incidence of
influenza-like illness and the current influenza activity
rates applicable to their geographic locations. The CDC
produces weekly influenza reports that are available online
(http://www.cdc.gov/flu/weekly/fluactivity.htm; accessed
June 1, 2008). These reports provide a synopsis of epidemiologic
information, including laboratory surveillance
data, influenza-like illness frequency as reported by US
sentinel providers, and regional variability of outbreaks
(Figure 26-1). Similar reports are available from individual
state health departments, Canada (through Health
Canada), the World Health Organization (WHO) International
Influenza Program, the WHO Flunet, and the European
Influenza Surveillance Scheme (hyperlinks available
at http://www.cdc.gov/flu/weekly/intsurv.htm; accessed June
1, 2008).
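
The surveillance definition quoted above can be written as a short
screening check. The sketch below is illustrative only: the record
type and function name are hypothetical, and the 37.8°C threshold is
the sentinel-network definition cited in the text, which other groups
define differently.

```python
from dataclasses import dataclass

# Threshold from the CDC sentinel-provider definition quoted in the text.
ILI_TEMP_THRESHOLD_C = 37.8


@dataclass
class Presentation:
    """Hypothetical record holding only the findings the definition needs."""
    temperature_c: float
    cough: bool
    sore_throat: bool


def meets_ili_definition(p: Presentation) -> bool:
    """Temperature higher than 37.8 C plus either cough or sore throat."""
    return p.temperature_c > ILI_TEMP_THRESHOLD_C and (p.cough or p.sore_throat)


# The teacher in the opening vignette: 38.5 C in the office, dry cough,
# and sore throat.
teacher = Presentation(temperature_c=38.5, cough=True, sore_throat=True)
print(meets_ili_definition(teacher))
```

Meeting the definition only places a patient in the influenza-like
illness group; it does not by itself distinguish influenza from the
other causes discussed in this section.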

In the 2003-2004 influenza season, the weekly percentage 
of patient visits for influenza-like illness exceeded the 
national baseline of 2.5% for 9 consecutive weeks, with a 
peak of 7.6% in the week ending December 27, 2003. 4 Thus, 
during the peak week of the 2003-2004 outbreak, about 1 of 
every 13 primary care visits in the United States was for an 
influenza-like illness. 

Laboratory surveillance monitoring in the United States
showed that most samples in the 2003-2004 influenza season
tested negative for influenza. Although not specifically
reported, the implication is that these patients often had
other viruses such as rhinoviruses, adenoviruses, and
parainfluenza. Although many of these are relatively benign
and self-limited, others may be serious; for example, early
infection during an epidemic of the coronavirus causing
severe acute respiratory syndrome (SARS) produced
influenza-like illness. 7 Bacterial agents, including Legionella
species, Chlamydia pneumoniae, Mycoplasma pneumoniae, and
Streptococcus pneumoniae, may also be responsible for
influenza-like illnesses.


When faced with a patient with influenza-like illness, a
physician must be able to accurately estimate the probability
of influenza as opposed to other infections. This probability
estimate guides the clinician in further diagnostic testing and
treatment. Appropriate and prompt diagnosis and therapy
affect not only the individual patient but society as well, in
that local outbreaks may be detected and control measures
initiated. Influenza is difficult to diagnose because of
nonspecific symptoms and the host of other diseases that cause
similar symptoms. Our objective in this review was to identify
clinical factors that may be valuable in distinguishing
which patients with influenza-like illness have a higher
probability of truly having influenza.

METHODS 

Search Strategy and Quality Review 

We searched MEDLINE (January 1966 to September 2004)
to identify articles pertaining to the diagnosis of influenza
according to individual clinical signs and symptoms. We
intentionally limited the search to the period before the
SARS epidemic to avoid implying that the same operating
characteristics could be applied during an outbreak with a
highly virulent agent causing similar symptoms. The search
strategy used the following Medical Subject Headings: “EXP
influenza” or “EXP influenza A virus” or “EXP influenza A
virus human” or “EXP influenza B virus.” These terms were
then combined with the Medical Subject Headings and text
words “EXP sensitivity and specificity” or “EXP medical
history taking” or “EXP physical examination” or “EXP
reproducibility of results” or “EXP observer variation” or
“symptoms.mp” or “clinical signs.mp” or “sensitivity.mp”
or “specificity.mp.” We also searched for academic reviews
on influenza (“EXP influenza” or “EXP influenza A virus”
or “EXP influenza B virus,” limited to human, English-language
academic reviews). From this search, we retained only
systematic reviews. We reviewed the references and citations to
identify other relevant articles. We also reviewed the
references in a recent systematic review by Ebell et al. 8
Unpublished primary data were not sought.

Abstracts of the identified articles were reviewed for relevance.
Only articles describing primary studies dealing with
the diagnosis of influenza according to clinical signs and
symptoms were selected for complete review.

Two of the authors (S.A.C., W.P.M.) independently
reviewed the final set of 17 articles for quality. 9-25 Differences
in assessment were discussed and resolved by consensus.
Studies in the final set were excluded from analysis
if they did not meet the following criteria: (1) study
design qualifying as prospective cohort, randomized controlled
trial, or meta-analysis; (2) inclusion of primary
assessment of individual clinical signs and symptoms as
predictors of diagnosis; (3) definition of at least 1 of the
outcomes as influenza type A or B infection that was proven by
(a) culture, (b) 4-fold increase in diagnostic antibody titer,
eg, hemagglutination inhibition, complement fixation, or
enzyme immunoassay from acute to convalescent serum,
[Figure 26-1 Centers for Disease Control and Prevention Weekly
Report: Influenza Summary Update, Week Ending December 6, 2003,
Week 49

Panel A, Laboratory Surveillance: WHO/NREVSS Collaborating
Laboratories, National Summary, 2003-2004.

Panel B, Influenza-Like Illness (ILI) Surveillance: Percentage of
Visits for Influenza-like Illness Reported by Sentinel Providers,
National Summary, 2003-2004. Plotted series: % ILI 2003-2004,
% ILI 1999-2000,* % ILI 2002-2003, and the national baseline.

Panel C, Weekly Influenza Activity Estimates Reported by State and
Territorial Epidemiologists, Week ending December 6, 2003, Week 49.
Map legend: No Report, No Activity, Sporadic, Local Activity,
Regional.

* The 1999-2000 season was selected for comparison because it was
the most recent A(H3N2) season of moderate severity.

Source: http://www.cdc.gov/flu/weekly/weeklyarchives2003-2004/weekly49.htm;
accessed June 1, 2008.]


(c) polymerase chain reaction, or (d) immunofluorescent 
antibody; and (4) study quality graded A or B using the 
scheme appearing previously in The Rational Clinical 
Examination series, adapted from Holleman and Simel 26 
as shown (see Table 1-7 for a summary of Evidence Grades 
and Levels). 


Grade A: Independent blinded comparison of signs or 
symptoms with criterion standard among a large number 
of consecutive patients (>300) who might have influenza. 

Grade B: Independent blinded comparison of signs or symptoms
with criterion standard among a small number of
consecutive patients (<300) who might have influenza.

Grade C1: Independent blinded comparison of signs or
symptoms with criterion standard in nonconsecutive patients
or nonindependent comparison in patients known
to have influenza.

Grade C2: Comparison of signs or symptoms with standard 
of uncertain validity. 

Ten articles met all of the inclusion criteria. 9,12,14,17-20,23-25

Because the interpretation of rapid influenza test results
is tightly coupled to the interpretation of the clinical
examination, we added information to the article about
the usefulness of diagnostic testing. This information was
obtained through an additional MEDLINE database
search (January 1996 to October 2004) for English-language
articles pertaining to rapid diagnostic kits for human
influenza. This strategy was devised to focus on articles
describing the most current and relevant tests available to clinicians
and to find citations in which direct comparisons of the
most recent tests might be available. The search strategy
used the following medical subject headings: “EXP
influenza” and “EXP sensitivity and specificity” and “EXP
reagent kits, diagnostic.” Data from manufacturers were
also sought to establish the products’ range of sensitivity
and specificity. Unpublished primary data were not
sought. Abstracts of identified articles were reviewed for
relevance.

Statistical Methods 

We used data from the identified articles to calculate the
sensitivity, specificity, positive likelihood ratio (LR), and
negative LR, as well as a summary LR and the diagnostic odds
ratio (OR) for individual medical history and physical
examination findings. The positive LR is a measure of how
strongly a positive result increases the odds of disease; the
negative LR is a measure of how well a negative result
decreases the odds of disease. An LR greater than 1.0
increases the likelihood of disease; an LR less than 1.0
decreases the likelihood; an LR close to 1.0 does not change
the likelihood. FastPro (Academic Press, Boston,
Massachusetts) was used for all analyses; P < .05 was used to determine
statistical significance.
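
As a minimal sketch of the calculation described above, the standard
relationships LR+ = sensitivity / (1 - specificity) and
LR- = (1 - sensitivity) / specificity can be applied to a published
sensitivity and specificity pair. The function names are
illustrative, and the example values are the fever findings that
Table 26-2 reports for Govaert et al. Because the published inputs
are rounded, recomputed values can differ from the tabulated ones in
the last digit.

```python
def positive_lr(sensitivity: float, specificity: float) -> float:
    """LR when the finding is present: true-positive rate / false-positive rate."""
    return sensitivity / (1.0 - specificity)


def negative_lr(sensitivity: float, specificity: float) -> float:
    """LR when the finding is absent: false-negative rate / true-negative rate."""
    return (1.0 - sensitivity) / specificity


# Fever in patients aged 60 years or older (Govaert et al, Table 26-2):
# sensitivity 0.34, specificity 0.91.
lr_pos = positive_lr(0.34, 0.91)
lr_neg = negative_lr(0.34, 0.91)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```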

The Diagnostic OR 

The diagnostic OR is a single indicator of diagnostic test
performance, reflecting its accuracy. 27 The diagnostic OR
can also be viewed as the odds of the symptom or finding
among individuals with disease compared with the odds of
the symptom or finding among those not having the disease;
numerically, it equals the positive LR divided by the negative
LR. The diagnostic OR should always be assessed in comparison
with the paired sensitivity and specificity because
the same diagnostic OR can be associated with different
pairs. The value of the diagnostic OR ranges from 0 to
infinity, with higher values indicating better test performance.
Values less than 1 indicate more negative test results
among individuals with disease. The diagnostic OR can also
be used to develop summary estimates in meta-analyses.
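
The relationship between the diagnostic OR and the two LRs can be
illustrated with a short sketch. The function name is illustrative.
The first example uses the sensitivity and specificity that Table
26-2 reports for fever in Carrat et al; the two hypothetical pairs
that follow show why the diagnostic OR should be read alongside
sensitivity and specificity, since very different pairs give the
same value.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Diagnostic OR computed as the positive LR divided by the negative LR."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos / lr_neg


# Fever, no age restriction (Carrat et al, Table 26-2):
# sensitivity 0.84, specificity 0.73.
dor_carrat = diagnostic_odds_ratio(0.84, 0.73)
print(f"Carrat fever DOR = {dor_carrat:.1f}")

# Two hypothetical tests with mirrored sensitivity and specificity.
dor_sensitive = diagnostic_odds_ratio(0.90, 0.50)  # sensitive, not specific
dor_specific = diagnostic_odds_ratio(0.50, 0.90)   # specific, not sensitive
print(f"{dor_sensitive:.2f} vs {dor_specific:.2f}")
```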


We tested the LRs for heterogeneity between studies using 
the Mantel-Haenszel Q-statistic. 28 We used conservative 
random-effects models to describe the summary estimates 
and confidence intervals (CIs), making it easier to discern the 
relative usefulness of symptoms and signs. 29,30 
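
The pooling step can be sketched as follows, under several stated
assumptions. The sketch uses Cochran's inverse-variance Q and the
DerSimonian-Laird random-effects estimator, which are common choices
but are not confirmed to be what the authors' software computed (the
text names a Mantel-Haenszel Q-statistic). Standard errors are
back-calculated from the rounded 95% CIs, and the input values are
the three diagnostic ORs for fever in the unrestricted age group from
Table 26-2, with the upper bound for Monto et al read as 3.7. The
result should land near the published summary of 4.5 (1.8-11) without
matching it exactly.

```python
import math

Z_95 = 1.96  # normal quantile for a 95% confidence interval


def pool_random_effects(estimates):
    """Pool ratio estimates given as (point, ci_low, ci_high) tuples."""
    # Work on the log scale, where ratio measures are roughly normal.
    y = [math.log(point) for point, _, _ in estimates]
    # Back out each standard error from the width of the reported CI.
    se = [(math.log(hi) - math.log(lo)) / (2 * Z_95) for _, lo, hi in estimates]

    # Fixed-effect (inverse-variance) mean, needed for the Q statistic.
    w = [1.0 / s ** 2 for s in se]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    # Cochran Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # DerSimonian-Laird estimate of the between-study variance.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau2 to every study's variance.
    w_re = [1.0 / (s ** 2 + tau2) for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))

    return {
        "estimate": math.exp(pooled),
        "ci_low": math.exp(pooled - Z_95 * se_pooled),
        "ci_high": math.exp(pooled + Z_95 * se_pooled),
        "q": q,
        "df": df,
        "tau2": tau2,
    }


# Fever, no age restriction: Carrat, Monto, and Hulson (Table 26-2).
fever_dor = [(14, 8.8, 23), (3.2, 2.8, 3.7), (1.9, 1.0, 3.4)]
result = pool_random_effects(fever_dor)
print({key: round(value, 2) for key, value in result.items()})
```

A Q value well above its degrees of freedom signals heterogeneity, in
which case the random-effects interval is wider than a fixed-effect
interval would be.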

RESULTS 

The search strategy identified 915 articles (bibliography available
on request). We found only 10 studies that met all the inclusion
criteria. 9,12,14,17-20,23-25 Most of the excluded articles were not
primary studies. We were unable to obtain primary data for 3 of
the 10 studies, 9,18,19 and data from 1 study were included in 2
articles; thus, the final data (Table 26-1) are based on 6 studies and
included 7105 patients. 12,14,20,23,25,26 We identified a recent
systematic review that included several studies for which we were
unable to obtain the primary data. 8 Thus, not all the references
in this systematic review met our inclusion criteria. One additional
study included in this review, but not identified in our
literature search, did meet our inclusion criteria. 25

The second search strategy identified 13 articles dealing
with rapid diagnostic tests for influenza (bibliography available
on request). Only 6 original articles 31-36 describing the
comparison of a commercially available rapid diagnostic test
for influenza vs viral culture as the criterion standard were
selected for complete review. Of these, only 1 article 35
presented direct comparison of results among 4 test kits studied;
the data from this article were evaluated in detail.

Precision of Signs and Symptoms 

None of the studies assessed the precision of signs or symptoms
of influenza. Measurements of objective clinical signs
such as temperature are assumed to have high precision.

Accuracy of Signs and Symptoms 

The studies presented used varying definitions for fever, ranging
from 37.8°C to 38.5°C (Table 26-1). We defined fever as present
or absent according to the individual article’s definition.
Feverishness was reported by the patient and could have been based
on either a temperature taken at home or a subjective sense of
having an elevated temperature. The sensitivity, specificity,
positive LR, negative LR, and diagnostic OR for clinical variables
evaluated in at least 2 of the 5 studies are reported in Tables 26-2
and 26-3. Summary estimates are also presented. Eleven of the
13 clinical factors had heterogeneous diagnostic ORs (all with
P < .05). The patient’s sense of feverishness and vaccination
history provided homogeneous results across studies. Despite the
heterogeneity, the studies we reviewed seem representative of
the universe of patients with influenza, and most of the
differences in estimates created by the statistical heterogeneity were
small. The heterogeneity, expressed in the CIs, never moved a
finding from useless (LR approaching 1) to obviously useful (LR
so different from 1 that it would make influenza extremely likely
or extremely unlikely). Therefore, we present the summary LRs
and diagnostic ORs as an efficient way of conveying the relative
diagnostic effect of the symptoms and signs.
Table 26-1 Studies of the Diagnostic Performance of Clinical Findings in Diagnosing Influenza

| Source, y | Study Period | Location | No. of Patients | Age Range, y | Design | Selection Criteria | Grade of Evidence a | Diagnostic Test | Prevalence of Influenza, % |
|---|---|---|---|---|---|---|---|---|---|
| Nicholson et al, 25 1997 | Winters of 1992-1993 and 1993-1994 | Leicestershire, England | 533 | 60-90 | Prospective cohort | Weekly telephone surveillance for symptoms of upper respiratory tract infection; home visit as soon as possible thereafter if symptoms noted | A | 4-Fold increase in hemagglutination inhibition titer | 8 |
| Govaert et al, 14 1998 | Influenza season, 1991-1992 | The Netherlands | 1838 | >60 | Randomized controlled trial (of influenza vaccine) | Tested all persons in the study. Persons were originally selected from general practice offices, not “high-risk” groups | B | 4-Fold increase in titer (influenza A) | 7 |
| Carrat et al, 12 1999 | Influenza epidemic, 1995-1996 | France | 610 | Included all ages >1 y | Prospective cohort | Sudden onset of >1 of the following: influenza-like illness, upper or lower respiratory tract infectious syndrome, or temperature of >38°C (100.4°F) without any symptoms or signs of other infectious syndromes | A | ELISA, immunofluorescence (influenza A) | 28 |
| Monto et al, 20 2000 | Fall and winters, 1994-1998 | 231 Study centers in North America, Europe, southern hemisphere | 3744 | >12 | Retrospective, pooled analysis of clinical trials | Fever or >2 symptoms (headache, myalgias, cough, sore throat) | B | Positive culture result for influenza A or B or 4-fold increase in titer or PCR or immunofluorescence | 66 |
| Hulson et al, 17 2001 | 3 Consecutive influenza outbreaks, 1999-2000 | Oklahoma | 358 | 10 mo-73 y | Prospective cohort | Any of: fever (temperature >38°C [100.4°F]), cough, sore throat, headache, myalgia | A | Positive culture result for influenza A or B | 67 |
| van Elden et al, 23 2001 | Influenza season, 1997-1998 | The Netherlands | 81 | Included all ages | Prospective cohort | Fever (temperature >38°C [100.4°F]) plus >1 constitutional symptom (malaise, headache, myalgia, chills) plus >1 respiratory symptom (coryza, sneezing, cough, sore throat, hoarseness) | B | PCR (influenza A) | 5 |

Abbreviations: ELISA, enzyme-linked immunosorbent assay; PCR, polymerase chain reaction.
a See Table 1-7 for a summary of Evidence Grades and Levels.


No single clinical finding consistently had a positive LR high
enough to clinically rule in influenza, nor did any single finding
have a negative LR low enough to clinically rule out influenza
(Tables 26-2 and 26-3). However, several patterns do emerge
when the data are evaluated from the multiple studies. Among
studies that enrolled patients without regard to age, no single
finding had a summary LR greater than 2. For decreasing the
likelihood of influenza, the absence of fever (LR, 0.40; 95% CI,
0.25-0.66), cough (LR, 0.42; 95% CI, 0.31-0.57), or nasal
congestion (LR, 0.49; 95% CI, 0.42-0.59) were the only findings
with an LR less than 0.5. Feverishness, myalgia, malaise, sore
throat, and sneezing each had a positive and negative LR that
was indistinguishable from 1.0 and therefore of no diagnostic
value for the patients in studies that evaluated the entire age
spectrum. Among the studies of patients limited to those aged
60 years or older, the strongest univariate indicators of
influenza were fever (LR, 3.8; 95% CI, 2.8-5.0), malaise (LR, 2.6;
95% CI, 2.2-3.1), and chills (LR, 2.6; 95% CI, 2.0-3.2). Among
older patients exclusively, the presence of sneezing reduced the
likelihood of influenza (LR, 0.47; 95% CI, 0.24-0.92).

Two studies, by Govaert et al 14 and Monto et al, 20 assessed 
the diagnostic usefulness of fever with cough in persons aged 
60 years or older and in the unrestricted age group (Table 26-3). 
The LRs when both fever and cough were present were 5.0 
and 1.9, respectively. The addition of a third variable, acute 
onset of symptoms, added minimally to the discriminatory 
accuracy in either study. 
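
To show what LRs of 5.0 and 1.9 mean for an individual patient, an
LR can be applied to a pretest probability by converting the
probability to odds, multiplying by the LR, and converting back. The
sketch below assumes a hypothetical pretest probability of 10%; that
figure is chosen only for illustration and is not taken from the
studies, and in practice it would come from local surveillance data.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability via odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical pretest probability of influenza (an assumption for illustration).
PRETEST = 0.10

# LRs for fever plus cough reported in Table 26-3.
older = posttest_probability(PRETEST, 5.0)     # patients aged 60 years or older
all_ages = posttest_probability(PRETEST, 1.9)  # unrestricted age group
print(f"older: {older:.2f}, all ages: {all_ages:.2f}")
```

Under this assumed pretest probability, the same pair of findings
moves an older patient's probability of influenza considerably
further than it moves that of a patient from the unrestricted age
group.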

The calculation of diagnostic ORs for the individual variables
in each study allows us to compare the diagnostic performance
of the different variables and combinations of variables using a
single measure (Tables 26-2 and 26-3). The 3 studies with the
lowest frequency of influenza tended to have the best overall
accuracy as expressed by the diagnostic OR. In comparison with
the calculated diagnostic ORs for other symptoms, fever (summary
diagnostic OR, 4.5; 95% CI, 1.8-11) and cough (summary
diagnostic OR, 2.8; 95% CI, 2.1-3.7) are the most useful single
Table 26-2 Test Characteristics of Clinical Findings, by Study

| Finding | Age Group | Source, y | Sensitivity | Specificity | LR+ (95% CI) a | LR- (95% CI) a | DOR (95% CI) a |
|---|---|---|---|---|---|---|---|
| Fever | No age restriction | Carrat et al, 12 1999 | 0.84 | 0.73 | 3.1 (2.6-3.7) | 0.21 (0.15-0.31) | 14 (8.8-23) |
| Fever | No age restriction | Monto et al, 20 2000 | 0.68 | 0.60 | 1.7 (1.6-1.8) | 0.53 (0.49-0.57) | 3.2 (2.8-3.7) |
| Fever | No age restriction | Hulson et al, 17 2001 | 0.86 | 0.25 | 1.1 (1.0-1.3) | 0.59 (0.35-0.87) | 1.9 (1.0-3.4) |
| Fever | No age restriction | Summary | | | 1.8 (1.1-2.9) | 0.40 (0.25-0.66) | 4.5 (1.8-11) |
| Fever | Only patients > 60 y | Govaert et al, 14 1998 | 0.34 | 0.91 | 3.8 (2.8-5.0) | 0.72 (0.64-0.82) | 5.2 (3.4-7.9) |
| Feverishness | No age restriction | Monto et al, 20 2000 | | | | | 1.1 (0.89-1.4) |
| Feverishness | No age restriction | van Elden et al, 23 2001 | 0.88 | 0.15 | 1.0 (0.86-1.2) | 0.70 (0.27-2.5) | 1.3 (0.35-4.6) |
| Feverishness | No age restriction | Summary | | | | | 1.1 (0.88-1.4) |
| Feverishness | Only patients > 60 y | Nicholson et al, 25 1997 | 0.47 | 0.78 | 2.1 (1.2-3.7) | 0.68 (0.45-1.0) | 3.1 (1.2-8.1) |
| Cough | No age restriction | Carrat et al, 12 1999 | 0.84 | 0.29 | 1.2 (1.1-1.3) | 0.58 (0.39-0.85) | 2.0 (1.3-3.2) |
| Cough | No age restriction | Monto et al, 20 2000 | 0.93 | 0.20 | 1.2 (1.1-1.2) | 0.35 (0.29-0.42) | 3.3 (2.7-4.1) |
| Cough | No age restriction | Hulson et al, 17 2001 | 0.96 | 0.07 | 1.0 (0.95-1.1) | 0.61 (0.25-1.5) | 1.9 (0.71-5.0) |
| Cough | No age restriction | van Elden et al, 23 2001 | 0.98 | 0.23 | 1.3 (1.1-1.5) | 0.11 (0.01-0.82) | 12 (1.4-97) |
| Cough | No age restriction | Summary | | | 1.1 (1.1-1.2) | 0.42 (0.31-0.57) | 2.8 (2.1-3.7) b |
| Cough | Only patients > 60 y | Nicholson et al, 25 1997 | 0.53 | 0.56 | 1.2 (0.75-1.9) | 0.85 (0.52-1.4) | 1.4 (0.5-3.7) |
| Cough | Only patients > 60 y | Govaert et al, 14 1998 | 0.66 | 0.77 | 2.9 (2.5-3.4) | 0.44 (0.34-0.56) | 6.7 (4.5-10) |
| Cough | Only patients > 60 y | Summary | | | 2.0 (1.1-3.5) | 0.57 (0.37-0.87) | 3.4 (1.2-9.7) |
| Myalgia | No age restriction | Monto et al, 20 2000 | 0.94 | 0.06 | 1.0 (0.98-1.0) | 1.0 (0.76-1.3) | 0.99 (0.75-1.3) |
| Myalgia | No age restriction | Hulson et al, 17 2001 | 0.64 | 0.21 | 0.81 (0.71-0.93) | 1.7 (1.2-2.5) | 0.50 (0.29-0.83) |
| Myalgia | No age restriction | van Elden et al, 23 2001 | 0.60 | 0.38 | 0.97 (0.68-1.4) | 1.0 (0.60-1.8) | 0.94 (0.38-2.3) |
| Myalgia | No age restriction | Summary | | | 0.93 (0.83-1.0) | 1.2 (0.90-1.6) | 0.79 (0.54-1.1) |
| Myalgia | Only patients > 60 y | Nicholson et al, 25 1997 | 0.47 | 0.83 | 2.7 (1.5-5.0) | 0.64 (0.41-0.98) | 4.3 (1.6-12) |
| Myalgia | Only patients > 60 y | Govaert et al, 14 1998 | 0.45 | 0.81 | 2.4 (1.9-3.0) | 0.68 (0.58-0.80) | 3.4 (2.3-5.0) |
| Myalgia | Only patients > 60 y | Summary | | | 2.4 (1.9-2.9) | 0.68 (0.58-0.79) | 3.5 (2.4-5.0) |
| Malaise | No age restriction | van Elden et al, 23 2001 | 0.73 | 0.26 | 0.98 (0.75-1.3) | 1.1 (0.51-2.2) | 0.91 (0.34-2.5) |
| Malaise | Only patients > 60 y | Govaert et al, 14 1998 | 0.57 | 0.78 | 2.6 (2.2-3.1) | 0.55 (0.44-0.67) | 4.9 (3.3-7.1) |
| Headache | No age restriction | Carrat et al, 12 1999 | 0.84 | 0.26 | 1.1 (1.0-1.2) | 0.62 (0.42-0.91) | 1.9 (1.2-3.0) |
| Headache | No age restriction | Monto et al, 20 2000 | 0.91 | 0.11 | 1.0 (0.99-1.0) | 0.81 (0.66-0.99) | 1.3 (1.0-1.6) |
| Headache | No age restriction | Hulson et al, 17 2001 | 0.88 | 0.16 | 1.1 (0.95-1.1) | 0.75 (0.43-1.3) | 1.4 (0.76-2.7) |
| Headache | No age restriction | van Elden et al, 23 2001 | 0.70 | 0.43 | 1.2 (0.87-1.7) | 0.70 (0.38-1.3) | 1.8 (0.76-4.5) |
| Headache | No age restriction | Summary | | | 1.0 (1.0-1.1) | 0.75 (0.63-0.89) | 1.4 (1.2-1.8) b |
| Headache | Only patients > 60 y | Nicholson et al, 25 1997 | 0.68 | 0.57 | 1.6 (1.1-2.3) | 0.56 (0.28-1.1) | 2.8 (1.0-7.8) |
| Headache | Only patients > 60 y | Govaert et al, 14 1998 | 0.44 | 0.79 | 2.1 (1.7-2.6) | 0.71 (0.60-0.83) | 3.0 (2.0-4.4) |
| Headache | Only patients > 60 y | Summary | | | 1.9 (1.6-2.3) | 0.70 (0.60-0.82) | 3.0 (2.1-4.3) b |

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive likelihood ratio; LR-, negative likelihood ratio.
a LR+ is the likelihood ratio when the finding is present; LR- is the likelihood ratio when the finding is absent; DOR is an indicator of the test’s overall accuracy.
b Homogeneous DOR (P > .05). When the DOR was heterogeneous, we assessed for homogeneity separately for the positive and negative LRs.
Table 26-3 Test Characteristics of Clinical Findings, by Study

| Finding | Age Group | Source, y | Sensitivity | Specificity | LR+ (95% CI) a | LR- (95% CI) a | DOR (95% CI) a |
|---|---|---|---|---|---|---|---|
| Sore Throat | No age restriction | Monto et al, 20 2000 | 0.84 | 0.16 | 1.0 (0.97-1.0) | 1.0 (0.85-1.2) | 1.0 (0.8-1.2) |
| Sore Throat | No age restriction | Hulson et al, 17 2001 | 0.75 | 0.28 | 1.0 (0.91-1.2) | 0.89 (0.62-1.3) | 1.2 (0.72-2.0) |
| Sore Throat | No age restriction | van Elden et al, 23 2001 | 0.80 | 0.33 | 1.2 (0.91-1.6) | 0.61 (0.28-1.3) | 1.9 (0.69-5.3) |
| Sore Throat | No age restriction | Summary | | | 1.0 (0.98-1.0) | 0.96 (0.83-1.1) | 1.1 (0.87-1.3) b |
| Sore Throat | Only patients > 60 y | Nicholson et al, 25 1997 | 0.58 | 0.36 | 0.91 (0.61-1.4) | 1.2 (0.66-2.1) | 0.8 (0.3-2.1) |
| Sore Throat | Only patients > 60 y | Govaert et al, 14 1998 | 0.40 | 0.81 | 2.1 (1.7-2.7) | 0.74 (0.64-0.85) | 2.9 (2.0-4.3) |
| Sore Throat | Only patients > 60 y | Summary | | | 1.4 (0.81-2.5) | 0.77 (0.66-0.89) | 1.8 (0.81-4.0) |
| Sneezing | No age restriction | Carrat et al, 12 1999 | 0.50 | 0.59 | 1.2 (1.0-1.5) | 0.85 (0.71-1.0) | 1.4 (1.0-2.1) |
| Sneezing | No age restriction | van Elden et al, 23 2001 | 0.33 | 0.69 | 1.1 (0.55-2.0) | 0.97 (0.71-1.3) | 1.1 (0.42-2.8) |
| Sneezing | No age restriction | Summary | | | 1.2 (1.0-1.5) | 0.87 (0.75-1.0) | 1.3 (0.95-1.9) b |
| Sneezing | Only patients > 60 y | Nicholson et al, 25 1997 | 0.32 | 0.33 | 0.47 (0.24-0.92) | 2.1 (1.4-3.1) | 0.2 (0.1-0.6) |
| Nasal Congestion | No age restriction | Monto et al, 20 2000 | 0.91 | 0.19 | 1.1 (1.1-1.2) | 0.47 (0.40-0.56) | 2.4 (2.0-2.9) |
| Nasal Congestion | No age restriction | van Elden et al, 23 2001 | 0.68 | 0.41 | 1.1 (0.81-1.6) | 0.79 (0.44-1.4) | 1.4 (0.58-3.6) |
| Nasal Congestion | No age restriction | Summary | | | 1.1 (1.1-1.2) | 0.49 (0.42-0.59) | 2.3 (1.9-2.8) b |
| Nasal Congestion | Only patients > 60 y | Nicholson et al, 25 1997 | 0.47 | 0.50 | 0.95 (0.57-1.6) | 1.0 (0.67-1.7) | 0.9 (0.3-2.4) |
| Chills | No age restriction | Carrat et al, 12 1999 | 0.83 | 0.25 | 1.1 (1.0-1.2) | 0.68 (0.46-0.99) | 1.6 (1.0-3.0) |
| Chills | Only patients > 60 y | Govaert et al, 14 1998 | 0.46 | 0.82 | 2.6 (2.0-3.2) | 0.66 (0.55-0.77) | 3.9 (2.7-5.7) |
| Vaccine History | No age restriction | Hulson et al, 17 2001 | 0.12 | 0.83 | 0.71 (0.41-1.2) | 1.1 (0.96-1.2) | 0.69 (0.37-1.3) |
| Vaccine History | No age restriction | van Elden et al, 23 2001 | 0.02 | 0.82 | 0.11 (0.01-1.1) | 1.2 (0.02-1.4) | 0.12 (0.01-1.0) |
| Vaccine History | No age restriction | Summary | | | 0.63 (0.37-1.1) | 1.1 (1.0-1.2) | 0.60 (0.33-1.1) b |
| Fever and Cough | No age restriction | Monto et al, 20 2000 | 0.64 | 0.67 | 1.9 (1.8-2.1) | 0.54 (0.50-0.57) | 3.6 (3.1-4.2) |
| Fever and Cough | Only patients > 60 y | Govaert et al, 14 1998 | 0.30 | 0.94 | 5.0 (3.5-6.9) | 0.75 (0.66-0.84) | 6.6 (4.2-10) |
| Fever and Cough and Acute Onset | No age restriction | Monto et al, 20 2000 | 0.63 | 0.68 | 2.0 (1.8-2.1) | 0.54 (0.51-0.58) | 3.6 (3.1-4.1) |
| Fever and Cough and Acute Onset | Only patients > 60 y | Govaert et al, 14 1998 | 0.27 | 0.95 | 5.4 (3.8-7.7) | 0.77 (0.68-0.85) | 7.1 (4.5-11) |

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive likelihood ratio; LR-, negative likelihood ratio.
a LR+ is the likelihood ratio when the finding is present; LR- is the likelihood ratio when the finding is absent; DOR is an indicator of the test’s overall accuracy.
b Homogeneous DOR (P > .05). When the DOR was heterogeneous, we assessed for homogeneity separately for the positive and negative LRs.
findings for distinguishing patients with influenza from those
without the illness among the unrestricted age group. The
combination of fever and cough, with or without acute onset, had an
intermediate diagnostic OR value. The diagnostic ORs were
somewhat higher for all of these characteristics, particularly the
combined symptoms, among persons aged 60 years or older;
malaise (diagnostic OR, 4.9; 95% CI, 3.3-7.1) also performed
well in this group.

Fever, headache, myalgias, and cough are the classic symptoms
associated with influenza. Unfortunately, these symptoms
are frequently observed in patients presenting with
other infections during influenza season, making the clinical
diagnosis of influenza problematic to the primary care
physician. These data suggest that the strongest predictor of
influenza was the acute onset of both fever and cough in patients
aged 60 years or older.

We included only those studies in which a laboratory
confirmation of the influenza virus was performed for all
patients, thus eliminating verification bias. However, not all
studies used the same criterion standard diagnostic test; one
study used culture data only, without supplementation by
titer increase or polymerase chain reaction. 17 This may have
caused false-negative results and a decreased estimate of
prevalence of disease. In addition, several of the studies
assessed the type of influenza (A vs B), whereas others did
not. One study found that different clinical presentations
were associated with the influenza type, 12 but another study
showed no difference. 20 In the 2 studies that presented data
on both influenza types A and B, the proportion of patients
diagnosed with influenza B was small (<10%). 12,20 Data
presented here reflect all diagnoses of influenza, regardless of
type or subtype; we do not know whether the clinical
presentation of disease varies according to type or subtype.

The patient populations in the 6 studies were very different
but represented a broad spectrum of patients with
influenza-like illnesses. Two of the study populations were
derived from randomized controlled trials of treatment or
vaccine, 14,20 3 were prospective cohorts of patients presenting
to general practitioners, 12,17,23 and 1 was a population-based
cohort surveyed for symptoms weekly by telephone. 25 The
studies were also from several countries: 2 from The
Netherlands, 14,23 1 from France, 12 1 from the United States, 17 and 1
from England; 25 in addition, one was a multinational study 20
including patients from North America, Europe, and the
southern hemisphere. This variability in patient population
may have led to less precision in the assessment of symptoms
because of cultural and language differences. The different
study populations may also have had different clinical
characteristics owing to the pool from which they were drawn.
The studies from Europe were more likely to include patients
from home. It is conceivable that these patients were more or
less ill, had more or fewer symptoms, and had a different
prevalence of influenza compared with the populations from
the United States or other countries.

Including the randomized controlled trials may lead to
spectrum bias because patients who enroll in randomized
controlled trials assessing either treatment or prevention
of influenza may not represent the population of patients
presenting to a primary care office. Spectrum bias may be
a particular issue in the study by Govaert et al, 14 in that
signs and symptoms were assessed in all the patients
enrolling in the vaccine trial, even those without complaints
of illness. 14 This not only leads to spectrum bias
but also is consistent with the 6.6% prevalence in this
study, which is lower than the prevalences in the other
studies (range, 8%-67%).

Other differences in the study populations include the
age range within each study. This may be important
because Cox and Subbarao 37 have observed that influenza
presents differently among various age groups. Although
most of the patients studied in these reports were adults,
several of the studies did include children. Govaert et al 14
and Nicholson et al 25 evaluated only individuals aged 60
years or older. The positive LRs for several of the signs and
symptoms evaluated in these studies are higher than those
in the other studies. One possible explanation for this is
that the clinical findings are more diagnostic of influenza in
the elderly population or in a population with a lower
prevalence of disease.

Although all of the studies recruited patients only during
influenza season, some were specifically undertaken during
epidemics. Thus, the prevalence of disease varies considerably
in the published reports of the clinical findings. It is possible
that clinical characteristics of the disease change between
seasons according to the strain of influenza. All of the studies were
performed before the SARS epidemic.

Monto et al 20 suggested that the positive predictive value of 
clinical signs and symptoms increased with increasing
duration from illness onset. The 6 studies presented in this article
had various durations of symptoms. Data were not available 
from each of the studies to assess whether the other studies 
supported the results of Monto et al. 20 

Approach to Influenza Diagnosis 

The 2003 outbreak of influenza brought the diagnostic 
dilemmas regarding influenza to the forefront. The reduced 
availability of vaccine for 2004-2005 created the potential for 
increased incidence of disease. When faced with a person 
with influenza-like illness, clinicians struggle with the
decision of whether to test or to empirically treat.

There are several laboratory-based procedures available for
diagnosing influenza. Viral culture is the criterion standard
for laboratory diagnosis, but it may take several days to see
cytopathic effects or for virus to be detected by hemadsorption
or hemagglutination. Rapid methods may shorten the
time to identification but at some cost in sensitivity.
Fluorescent antibody staining or other immunoassays are used to
confirm and to type influenza virus in culture and are
frequently used directly on respiratory specimens as part of a
respiratory virus battery. Results from direct immunoassays
may be available within hours. Molecular methods such as
reverse-transcriptase polymerase chain reaction and
hybridization-based arrays are likely to replace culture as the
criterion standard because of their superior sensitivity and rapid
turnaround time. However, the availability of technology is
limited. For diagnostic dilemmas, research studies, and
epidemiologic purposes, influenza infection can also be
detected by a 4-fold or greater increase in a variety of
diagnostic antibody titers (eg, hemagglutination inhibition,
complement fixation, or enzyme immunoassay) between specimens
collected at least 10 days apart. Although these laboratory-based
methods are highly sensitive and specific, clinicians are
increasingly reliant on point-of-care rapid diagnostic tests,
which are easier to handle, are less costly, and provide test
results in fewer than 30 minutes.

A summary of the rapid diagnostic tests for influenza is
provided by the CDC (http://www.cdc.gov/flu/professionals/diagnosis/;
accessed June 1, 2008). These include Directigen
Flu A and Directigen Flu A + B (Becton-Dickinson, Franklin
Lakes, New Jersey), FLU OIA and FLU OIA A/B (Thermo
Electron Corp, Waltham, Massachusetts), XPECT Flu A/B
(Remel, Lenexa, Kansas), NOW Flu A Test and NOW Flu B
Test (Binax Inc, Portland, Maine), QuickVue Influenza Test
and QuickVue Influenza A + B Test (Quidel Corp, San
Diego, California), SAS Influenza A Test and SAS Influenza B
Test (SA Scientific Ltd, San Antonio, Texas), and ZstatFlu
(ZymeTx Inc, Oklahoma City, Oklahoma). The tests require
specimens of throat swabs, nasopharyngeal swabs, nasal
washes, or nasal aspirates. The sensitivity and specificity of
these tests have been reported in manufacturers’ reports to be
between 40% and 100% and between 52% and 100%,
respectively. Given the differences between older and younger
persons in presenting symptoms of influenza, 37 the operating
characteristics of these tests could differ among various age
groups; however, we found no data confirming this. The
QuickVue and ZstatFlu tests have waivers from the Clinical
Laboratory Improvement Amendments and can be used in
any office setting. The QuickVue A + B Test is the only
amendment-waived test that distinguishes between influenza
A and B.

Multiple studies have compared individual test kits vs the
reference standard of viral culture (Table 26-4). 31-36 In 2002,
Rodriguez et al 35 published a study that directly compared 4 of
the most widely used rapid diagnostic test kits in children with
influenza-like illness. During the 1999-2000 epidemic, the
authors had patients provide specimens for viral culture and
direct fluorescent antigen, as well as for testing with Directigen
Flu A, FLU OIA, QuickVue Influenza Test, and ZstatFlu A/B.
Influenza A was found in 49% of the patients; 17% of the cases
were detected by viral culture only. Sensitivity and specificity
of the 4 tests ranged from 72% to 95% and from 76% to 84%,
respectively. For diagnosing influenza, these tests all had
similar LRs (P = .69) when the results were positive, with a
summary LR of 4.7 (95% CI, 3.6-6.2). The ZstatFlu test has a lower
sensitivity than the other tests (P < .001); however, the
remaining tests perform similarly (P > .99) and exceedingly well for
ruling out influenza when the test result is negative, with a
summary LR of 0.06 (95% CI, 0.03-0.12). 35
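
The sensitivity and specificity figures above relate to likelihood ratios through the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The Python sketch below illustrates that conversion. The function name `likelihood_ratios` is hypothetical, and pairing the 95% sensitivity with the 84% specificity is an assumption made only for illustration: the text reports ranges, not per-test pairs, so the output is not the pooled summary LR.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for a test with a positive/negative result."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    if not 0.0 < specificity < 1.0:
        # specificity of 1 would make LR+ undefined; 0 would make LR- undefined
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Illustrative pairing only (assumption): upper ends of the reported ranges.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.95, specificity=0.84)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

A test with these characteristics would shift the odds of influenza far more when negative than when positive, which matches the pattern of the summary LRs reported above.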

Table 26-4 Studies of the Performance of Rapid Diagnostic Tests for Influenza

| Source, y | Study Period | Location | No. of Patients/Specimens | Age Range | Design | Selection Criteria | Diagnostic Test |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Marcante et al, 32 1996 | December 1994-February 1995 | Padova, Italy | 41 | Children and adults^a | Prospective cohort | Pediatric/adult patients with symptoms of influenza-like illness | Directigen Flu A |
| Noyola et al, 33 2000 | December 19, 1997-April 13, 1998 | Houston, TX | 196 | Children^a | Prospective cohort | Children with respiratory illness | Zstat Flu A/B |
| Quach et al, 34 2002 | February and March 2001 | Montreal, Quebec | 300 | Children^a | Prospective cohort | Children with influenza-like symptoms presenting to Children's Hospital, Montreal | QuickVue |
| Rodriguez et al, 35 2002 | December 19, 1999-January 13, 2000 | Virginia | 152 | Ages 3 y to adult^a | Prospective cohort | Symptomatic patients seen in outpatient private practice | Directigen Flu A, Zstat Flu A/B, QuickVue Influenza Test A/B, FLU OIA A/B |
| Bellei et al, 31 2003 | May-October 2000 | Sao Paulo, Brazil | 33 | 18-56 y | Prospective/retrospective cohort | Adult volunteers, 24-h onset of symptoms of influenza-like illness with influenza A confirmation | QuickVue |
| Cazacu et al, 36 2004 | January-April 2003 | Houston, TX; Ft Myers, FL; Syracuse, NY | 400 | Children and adults^a | Prospective cohort | Children and adults with respiratory or influenza-like symptoms | Xpect Flu A/B |

^a Specific ages not stated.

Two recent studies examined the cost-effectiveness of several
influenza management strategies in adults, including several
strategies in which rapid influenza diagnostic tests were
used. 38,39 The estimates used for the sensitivity of the rapid tests
ranged from 59% to 81%; the estimates used for specificity
ranged from 70% to 99%. The prior probability estimate of
influenza was 35% in the analysis by Rothberg et al 38 and 60%
in that by Smith and Roberts. 39 In both analyses, testing
strategies were less effective than empirical treatment because of the
low sensitivity of the tests. These analyses were sensitive to the
probability of influenza infection; the cost-effectiveness of
empirical treatment improved relative to the testing strategy as
the probability increased. In fact, in the study by Rothberg et
al, 38 empirical treatment with a neuraminidase inhibitor in
unvaccinated patients was more cost-effective at any
probability of influenza greater than 14%. Testing was preferred only
between a probability of 5% and 14% (in unvaccinated
patients). The study by Smith and Roberts 39 yielded similar
results, favoring rapid testing only at a lower prevalence of
influenza. These studies highlight the importance of the
physician’s estimate of the likelihood of influenza.
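
The thresholds reported from the Rothberg et al analysis for unvaccinated patients can be written as a simple lookup. The sketch below is only an illustration of that logic, not the authors' decision model. The function name `preferred_strategy`, the constant names, and the label returned below 5% are assumptions; the text states only that testing was preferred between 5% and 14% and that empirical treatment was more cost-effective above 14%.

```python
# Thresholds as reported in the text for unvaccinated patients.
TEST_LOWER_BOUND = 0.05   # testing was preferred from this probability...
TREAT_THRESHOLD = 0.14    # ...up to this one; above it, empirical treatment


def preferred_strategy(pretest_probability: float) -> str:
    """Map a pretest probability of influenza to the favored strategy."""
    if not 0.0 <= pretest_probability <= 1.0:
        raise ValueError("pretest_probability must be between 0 and 1")
    if pretest_probability > TREAT_THRESHOLD:
        return "treat empirically"
    if pretest_probability >= TEST_LOWER_BOUND:
        return "rapid test"
    # Assumption: the text does not name the strategy below 5%; this label
    # stands in for neither testing nor antiviral treatment.
    return "neither test nor antiviral"


for p in (0.02, 0.10, 0.35):
    print(f"{p:.0%}: {preferred_strategy(p)}")
```

The narrow testing band shows why the clinician's estimate of pretest probability matters so much: a small change in that estimate can move a patient from one strategy to another.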

The decision analytic model used by Rothberg et al 38 was
sensitive to vaccination status. In a recent systematic review of the
literature, the estimated reduction in serologically confirmed
cases of influenza A by the live attenuated aerosol vaccines was
48%; the reduction with the use of inactivated parenteral
vaccines was 68%. 40 Vaccine efficacy and effectiveness may be
affected by epidemiologic characteristics such as age and
institutionalization. At least 1 study showed a vaccine efficacy of
58% in older patients who were not institutionalized. 41

From these analyses, if one is able to estimate the probability
of influenza to be greater than 25% to 30%, rapid diagnostic
testing does not add to the overall cost-effectiveness of
treatment. Thus, clinicians must develop a pretest probability
based on clinical signs and symptoms, vaccination history, and
epidemiologic risk factors. During influenza season, the CDC
publishes weekly online updates that contain information
about the prevalence of visits for influenza-like illness, along
with data about influenza outbreaks
(http://www.cdc.gov/flu/weekly/fluactivity.htm; accessed March 28, 2008). The same
information is generally available for each state through its
own surveillance reporting systems. It is important that
physicians understand the information available in the reports. The
percentage of visits to sentinel providers for influenza-like
illness for the week ending December 6, 2003 (week 49), was
high (5.1%), and there was regional variation (Figure 26-1B
and C). Among laboratory respiratory specimens submitted as
part of the CDC surveillance system, 37% tested positive for
influenza during week 49 (Figure 26-1A). At the beginning of
the 2003-2004 influenza season, for the week ending October
4, 2003 (week 40), the percentage of office visits for
influenza-like illness was only 0.9% (Figure 26-1B); only 1.4% of
laboratory respiratory specimens tested positive for influenza during
the same week.

Unfortunately, there is no linkage between the surveillance
systems for monitoring influenza-like illness and laboratory
results. The CDC is careful to note that its surveillance
system is designed to report where, when, and what
influenza viruses are circulating, but the data cannot be used by
the clinician to determine the probability that an individual
patient with an influenza-like illness actually has influenza.
Although the likelihood of influenza may vary, along with
the frequency of influenza-like illness, no data exist for
clinically determining whether the threshold levels of the decision
analytic model have been exceeded. Depending on the acuity
of illness, vaccination status, and presence of comorbid
conditions, some physicians might choose to treat empirically
with medication, whereas some might choose testing.


CLINICAL SCENARIO—RESOLUTION 


The patient came to the office during the usual influenza 
season with classic influenza-like symptoms. She had 
been ill for 24 hours, was not vaccinated, and was 
exposed to many children with influenza-like illnesses. A 
suspicion of influenza forces the decision of whether to 
treat her symptomatically, treat her with an antiviral 
agent, or test for influenza with a rapid test. Because she 
was fewer than 48 hours into the illness, treating her 
could allow her to return to work more quickly if she 
does indeed have influenza. The data most pertinent to 
this patient were released by the CDC on December 11, 2003
(http://www.cdc.gov/flu/weekly/weeklyarchives2003-2004/weekly49.htm;
accessed March 28, 2008). The CDC
data indicated regional outbreaks of influenza in her 
area of the country, with 5.1% of primary care visits for 
influenza-like illnesses (Figure 26-1B). In week 49, 37% 
of specimens submitted had laboratory confirmation of 
influenza (Figure 26-1A). 

Once clinicians have used the symptoms in Tables 26-2
and 26-3 to establish that a patient has an influenza-like
illness, they should use epidemiologic data to determine
whether influenza virus is circulating. Clinicians must
rely on their clinical judgment in deriving a pretest
probability for influenza. If a rapid influenza test is
obtained, strict adherence to the manufacturer’s
protocol is required for accurate interpretation. If the result is
positive, the odds of disease increase almost 5-fold
(summary LR, 4.7). A negative rapid influenza test result
(summary LR, 0.06) decreases the probability of disease
and could effectively rule out influenza if the prior
probability is low. Astute clinicians will recognize that the
decision to use rapid diagnostic testing can vary
throughout the influenza season, depending on the age of their
patient, the setting, and the prevalence of disease in their
community.
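
The way a likelihood ratio updates a pretest probability can be made concrete with a short Python sketch: convert the probability to odds, multiply by the LR, and convert back. The function name `posttest_probability` and the 30% pretest probability are assumptions chosen only for illustration; the summary LRs of 4.7 and 0.06 come from the text.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability via odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


PRETEST = 0.30  # hypothetical clinician estimate, not a value from the text

after_positive = posttest_probability(PRETEST, 4.7)   # summary LR, positive result
after_negative = posttest_probability(PRETEST, 0.06)  # summary LR, negative result
print(f"positive test: {after_positive:.1%}")
print(f"negative test: {after_negative:.1%}")
```

Note that the LR multiplies the odds, not the probability, so an almost 5-fold increase in odds does not translate into a 5-fold increase in probability.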


CLINICAL BOTTOM LINE 

Influenza presents with a constellation of symptoms,
including cough, fever, malaise, myalgias, and headache. We
reviewed the literature regarding signs and symptoms and
their diagnostic accuracy for influenza. Unfortunately, no
specific symptom or combination of symptoms is diagnostic
of this common infection. Despite the variability in
participant nationality, language, culture, and age, as well as in
clinical setting and influenza type/subtype in the studies
reviewed, the data indicate that, although not perfect, the
combination of fever and cough during influenza season
suggests a significantly increased likelihood of influenza among
elderly individuals.

The usefulness of these signs and symptoms follows from
their ability to identify a group with influenza-like illness.
However, the prevalence of disease among this population
varies from week to week and from year to year throughout the
influenza season. Clinicians must pay attention to the
surveillance data to understand where, when, and what influenza
viruses are circulating. As an example, the peak weeks during
the 2002-2003 influenza season for influenza-like illnesses
occurred later in the season and at lower rates than in
2003-2004. The role of rapid influenza tests has not been fully
established, although it seems likely that clinicians will have many
options for testing during future influenza seasons. In a
randomized trial of the usefulness of rapid influenza tests in a
pediatric emergency department, physicians provided with
rapid test results ordered fewer laboratory tests and chest
radiographs, prescribed fewer antibiotics but more antiviral
agents, kept patients in the emergency department for shorter
periods, and generated lower patient charges. 42

Once the clinical criteria are used to establish the
presence of an influenza-like illness, there is little information
other than epidemiologic data that is useful for guiding
diagnostic and therapeutic decision making. During the
current era of rapidly evolving infections with pathogens
unfamiliar to most physicians, we do not know how well
the symptoms, signs, and rapid diagnostic tests would
perform if these newer infections were to become epidemic.
Clinicians in the United States must pay particular
attention to the weekly CDC and state reports regarding regional
influenza patterns during influenza seasons. International
clinicians should use data from the WHO International
Influenza Program, the WHO Flunet, Health Canada, or
the European Influenza Surveillance Scheme. The
hyperlinks for all these sites are available at the CDC Web site
(http://www.cdc.gov/flu/weekly/intsurv.htm; accessed June
1, 2008). It would be useful for clinicians if a formal linkage
could be established between clinical and laboratory
surveillance strategies, such as the collection of influenza virus
cultures from a random sample of persons presenting with
influenza-like illnesses, to allow more precise estimation of
an individual’s likelihood of disease. In the absence of such
a system, physicians may consider point-of-care testing
among patients in their individual practices to gain an
estimate of the prevalence of influenza among their patients
presenting with influenza-like illnesses.
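
One way a practice might summarize its own point-of-care results is as a proportion of positive tests with a confidence interval. The Python sketch below uses the Wilson score interval, which is a choice made here and not a method described in the text. The function name `wilson_interval` and the counts (12 positive results among 40 tested patients) are hypothetical. The result is the apparent prevalence only; it is not corrected for the imperfect sensitivity and specificity of rapid tests.

```python
import math
from typing import Tuple


def wilson_interval(positives: int, tested: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (proportion, lower, upper) using the Wilson score interval."""
    if tested <= 0 or not 0 <= positives <= tested:
        raise ValueError("need tested > 0 and 0 <= positives <= tested")
    p = positives / tested
    denominator = 1.0 + z * z / tested
    center = (p + z * z / (2.0 * tested)) / denominator
    half_width = (
        z * math.sqrt(p * (1.0 - p) / tested + z * z / (4.0 * tested * tested))
        / denominator
    )
    return p, center - half_width, center + half_width


# Hypothetical practice data: 12 positive rapid tests among 40 patients
# presenting with influenza-like illness.
proportion, lower, upper = wilson_interval(positives=12, tested=40)
print(f"apparent prevalence {proportion:.0%} (95% CI, {lower:.0%}-{upper:.0%})")
```

With small samples the interval is wide, so an estimate of this kind would be a rough guide to pretest probability at best.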
Prepared by David L. Simel, MD, MHS
Reviewed by W. Paul McKinney, MD 


CLINICAL SCENARIO 


A 68-year-old woman calls for an urgent appointment
before a midwinter trip to visit with her family and new
grandchildren. She has a slight increase in her temperature
(37.7°C [99.8°F]), but she has a cough and has felt gradually
worse for 2 days. She wants to know whether she should
cancel her trip. Her medical history reveals only
hypertension, for which she takes a single medication.

The week before her call, you had not noticed any 
change in the steady number of 2 to 3 patients a day 
appearing in your office with similar symptoms. However, 
the day she called there were 5 similar calls from patients 
with influenza-like illness. 

UPDATED SUMMARY ON INFLUENZA 

Original Review 

Call SA, Vollenweider MA, Hornung CA, Simel DL, McKinney
WP. Does this patient have influenza? JAMA.
2005;293(8):987-997.

The Update was prepared within 12 months of publication 
of The Rational Clinical Examination article, so the “Make 
the Diagnosis” section summarizes the findings published in 
the original article. 


CLINICAL SCENARIO—RESOLUTION 


This clinical scenario highlights the need to understand
the local epidemiology of influenza and case definition for
influenza-like illness. Once you notice a possible change in
the number of patients with influenza-like illness, you
should first check the Centers for Disease Control and
Prevention (CDC) data to see whether the rate of
influenza-like illness is increasing in your region and whether
influenza has been isolated in reference laboratories.
When you look at the CDC Web site, you notice that there
is a steady increase in influenza-like illness being reported,
along with evidence of influenza being increasingly
isolated in reference laboratories.

Her symptoms do not, however, identify her as having
an influenza-like illness because she does not have a
temperature higher than 38°C (100°F). The CDC definition of
influenza-like illness requires the appropriate
temperature, accompanied by either a cough or sore throat. If you
decided that she may really have had a fever but simply did
not capture it with self-measurement, you should
recognize that only her malaise (likelihood ratio [LR], 2.6)
increases the likelihood that her illness is influenza. A
cough alone, lack of acute onset, and the absence of fever
in a patient older than 60 years do not have LRs
sufficiently different from 1 and therefore do not provide you
enough information. Because of the increasing rate of
influenza and her concern, you ask her to come to the
office for a rapid influenza test.

The problem in deciding to order the rapid influenza
test resides in the difficulty with estimating the prior
probability. Cost-effectiveness studies show that testing is
the appropriate strategy when the prior probability is
between 5% and 14%; however, many clinicians will be
uncertain. If you estimated that the probability of
influenza was at the upper end of the testing strategy (say,
15%), the negative result lowers the likelihood of
influenza to approximately 1%. However, if you had thought
that the probability of influenza was as high as 50% and
ordered a test (rather than treated empirically), the
negative rapid influenza test result (LR, 0.06) decreases the
probability to 5.7%. It is most likely that she has some
other type of viral infection, which could also spread to
her family, but the use of antivirals and advice about
influenza should be based on the low probability estimate from
the rapid influenza test.
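
The two posttest probabilities quoted above follow from the odds form of Bayes' theorem. As a worked check, writing p for the prior probability and using the negative LR of 0.06 (the intermediate values are rounded):

```latex
\[
\text{odds}_{\text{pre}} = \frac{p}{1-p}, \qquad
\text{odds}_{\text{post}} = \text{odds}_{\text{pre}} \times \text{LR}, \qquad
p_{\text{post}} = \frac{\text{odds}_{\text{post}}}{1+\text{odds}_{\text{post}}}
\]
\[
p = 0.15:\quad \frac{0.15}{0.85} \approx 0.176, \qquad
0.176 \times 0.06 \approx 0.0106, \qquad
\frac{0.0106}{1.0106} \approx 0.010 \;(\approx 1\%)
\]
\[
p = 0.50:\quad \frac{0.50}{0.50} = 1, \qquad
1 \times 0.06 = 0.06, \qquad
\frac{0.06}{1.06} \approx 0.057 \;(5.7\%)
\]
```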
INFLUENZA—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

Clinicians must rely on informed clinical judgment to
determine the prior probability of influenza, which requires an
understanding of the weekly epidemiology reports for
influenza that apply to their population. These reports are
available from the CDC (for the United States,
http://www.cdc.gov/flu/weekly/, and for international surveillance,
http://www.cdc.gov/flu/weekly/intsurv.htm; accessed June 1,
2008). For US data, the CDC
reports the frequency of influenza-like illness during influenza
season and whether influenza virus is present in your region.
However, the reports do not indicate whether a given patient with an
influenza-like illness is likely to have influenza.

POPULATION FOR WHOM INFLUENZA 
SHOULD BE CONSIDERED 

The clinical evaluation is used to identify influenza-like
illness among all patients during influenza season.

DETECTING THE LIKELIHOOD OF INFLUENZA 

A few findings can lower the likelihood of influenza among
patients (all adults or children) when they have
influenza-like illness (Table 26-5).


Table 26-5 Likelihood Ratios for Findings Across All Age Groups That
Lower the Probability of Influenza

| Adults or Children | LR (95% CI) |
| --- | --- |
| Absence of fever | 0.40 (0.25-0.66) |
| Absence of cough | 0.42 (0.31-0.57) |
| Presence of nasal congestion | 0.49 (0.42-0.59) |

Abbreviations: CI, confidence interval; LR, likelihood ratio.


For older adults, the presence of a few findings increases the 
likelihood of influenza, whereas the presence of 1 finding 
(sneezing) makes influenza a little less likely (Table 26-6). 


Table 26-6 Likelihood Ratios for Findings That Suggest Influenza in
Older Adults

| Adults > 60 y | LR (95% CI) |
| --- | --- |
| Fever and cough, combined with acute onset | 5.4 (3.8-7.7) |
| Fever | 3.8 (2.8-5.0) |
| Malaise | 2.6 (2.2-3.1) |
| Chills | 2.6 (2.0-3.2) |
| Sneezing | 0.47 (0.24-0.92) |

Abbreviations: CI, confidence interval; LR, likelihood ratio.

During influenza season, a negative result on a
commercially available rapid influenza test greatly decreases the
likelihood that a patient has influenza (Table 26-7). These
studies may be most useful as the rate of influenza-like
illness is increasing on visits to sentinel providers and before
data are available from the CDC that suggest influenza is
circulating in your area. The CDC provides updated guidance
on available rapid influenza tests and their role in screening
for influenza (http://www.cdc.gov/flu/professionals/diagnosis/;
accessed June 1, 2008). We cannot be absolutely certain that
the operating characteristics of these tests will stay constant
from one influenza season to the next.


Table 26-7 Likelihood Ratios for Some Rapid Influenza Tests

|  | LR+ (95% CI) | LR- (95% CI) |
| --- | --- | --- |
| Rapid influenza tests^a | 4.7 (3.6-6.2) | 0.06 (0.03-0.12) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

^a Directigen Flu A (Becton-Dickinson, Franklin Lakes, New Jersey), FLU OIA (Thermo
BioStar, Boulder, Colorado), QuickVue Influenza Test (Quidel Corp, San Diego, California).

REFERENCE STANDARD TESTS 

Viral culture is the reference standard test, often
accompanied by polymerase chain reaction tests. Some studies use a
4-fold increase in viral antibody titers. Both of these
reference standards are suitable only for epidemiologic
surveillance since neither result would be available to help guide
the treatment of an individual patient because of the long
turnaround time required for results.
CASE 1 A 20-year-old man presents to your office
complaining of knee pain after playing basketball. During the
game, as he came down after jumping for a rebound, 
another player fell on the back of his calf. He remembers 
hearing a pop and had pain on standing, preventing him 
from playing in the remainder of the game. While on the 
bench, he noticed that the pain improved, but his knee 
swelled. He iced the knee and was able to put some weight 
on it later that day. Today, putting all of his weight on the 
knee makes it feel as if it will buckle. 

CASE 2 A 72-year-old woman observes that her left knee
swells. She has had pain in the medial aspect of the knee
for several years, but only recently did she notice the
fullness. She fell several weeks ago. The knee hurts her
constantly but feels worse going down stairs, especially early
in the morning and late in the day. She finds
acetaminophen helps, but the pain relief is not adequate for her to be
fully active.


WHY IS THE DIAGNOSIS IMPORTANT? 


Ten percent to 15% of adults in the community report knee
symptoms, with more than 3.3 million new visits made to
physicians annually. 1,2 Overall, knee pain accounts for 3% to
5% of all visits to physicians, and a substantial proportion
results in referrals for diagnostic imaging or specialty care. 3
The history and physical examination can assist the examiner
in determining whether the knee pain is part of a systemic
condition or whether it represents a local musculoskeletal
problem. When the knee pain is part of a local regional
musculoskeletal disorder, the clinician must decide whether the
pain represents a torn meniscal or ligamentous structure and
then whether nonoperative or operative intervention is
indicated. Because torn meniscal or ligamentous structures can
cause significant pain and disability, injuries to these
structures may require expeditious repair. The physical
examination can aid the primary care clinician in assessing the
likelihood of a torn meniscal or ligamentous structure and
whether a referral will be beneficial.

Although musculoskeletal conditions are common and
costly, physicians in training receive little instruction in
musculoskeletal medicine. 4,5 This educational deficit
potentially leads to suboptimal treatment. Several studies have
suggested that the musculoskeletal examination can be
effectively taught through the use of small-group teaching
and trained actors playing the role of the patient-educators. 6,7
The purpose of this review is to analyze the diagnostic
accuracy of the physical examination for meniscal and
ligamentous injuries.


Copyright © 2009 by the American Medical Association.
Anatomy of Meniscal and Ligamentous Knee
Injuries and Their Relationship to Symptoms 

The knee joint is the largest articulation in the body. It is a
modified hinge with an extensive range of motion. The
stability of the joint is provided by the soft tissue structures: the
anterior cruciate ligament (ACL) and the posterior cruciate
ligament (PCL), the medial collateral ligament (MCL) and
the lateral collateral ligament (LCL), the menisci, the capsule,
and the muscles (Figure 27-1). The ACL and PCL add
stability to the joint and aid in proprioception. The subcutaneous
location in a weight-bearing extremity, combined with the
relatively long lever arms exerting forces on the joint, renders
the knee susceptible to injury. All of the structures that
compose the knee joint synchronously function through a
normal, physiologic range of motion. Knee symptoms occur
when any of these structures are altered, potentially creating
interference with normal knee function.

An anatomic description of the knee provides a basis for
understanding the various injury patterns. The ligaments
passively limit the motion of the joint, thus providing stabilization.
The ACL and PCL limit the anterior and posterior
displacement of the tibia on the femur, respectively. Because the intact
ACL prevents anterior motion of the tibia on the femur, an
ACL injury leads to abnormal forward movement of the tibial
plateau. This abnormal motion leads to relative internal
rotation of the tibia during the terminal part of extension. Absence
of a functioning ACL and the related anterolateral rotatory
instability can result in the sensation that the knee is buckling
or giving out. These symptoms occur with normal walking but
may be most prominent during pivoting movements, such as
those that occur with quick changes in direction. In the absence
of knee buckling, patients with ACL disruption may express a
loss of confidence in the stability of their knee, possibly because
of the ACL’s role in proprioception. 8

The PCL provides stability to most forces regardless of
knee position. Isolated disruption of the PCL permits the
tibia to displace posteriorly, decreasing the forces on the
posterior horns of the menisci and increasing the forces directly
on the articular surfaces of the medial compartment and
patellofemoral joint. Although absence of the PCL may have
no associated symptoms, it may result in hyperextension of
the knee, posterior displacement of the tibia during knee
flexion, and varus (bowlegged) and valgus (knock-kneed)
angulation with the knee extended. Knee buckling may
occur, especially during pivoting motions or when
descending stairs. Symptomatic PCL lesions are more common in
patients with chronic tears or with acute tears associated with
other ligament injuries.

The meniscal fibrocartilages are semilunar,
crescentic-shaped structures that are attached to the tibial plateau at the
edge of the articulating surfaces of the femur and tibia. The
menisci are wedge shaped, with a thin free edge at the inner
margin and a wide base attaching to the tibia by the coronary
ligaments. The surface is flat inferiorly and concave
superiorly, providing a congruous surface for the transmission of
50% of the axial forces across the knee joint. 9 The menisci
increase joint stability, facilitate nutrition, and provide
lubrication and shock absorption for the articular cartilage. 10 The
lateral meniscus is larger than the medial meniscus and less
firmly attached to the tibia, resulting in a more mobile
structure. 10 The medial meniscus, firmly attached to the capsule
and MCL, is relatively immobile. Because of its fixed nature,
combined with the greater force transmission across the
medial aspect of the knee, the medial meniscus is more
susceptible to injury. 10 Knee flexion forces the menisci
posteriorly. In extreme flexion, the posterior portion of the
meniscus is firmly compressed between the posterior portion
of the tibial plateau and femoral condyle.

Mechanism of Meniscal and Ligamentous Knee Injuries 

The position of the joint at the time of the traumatic force
dictates which anatomic structures are at risk for injury;
hence, an important aspect of obtaining the patient’s history
for acute injuries is to allow him or her to describe the
position of the knee and direction of forces at the time it was
injured. In full knee extension, the ACL and PCL limit the
anteroposterior motion of the tibia on the femur. The ACL is
often injured during traumatic twisting injuries in which the
tibia moves forward with respect to the femur, often
accompanied by valgus stress. No direct blow to the knee or leg is
required, but the foot is usually planted, and the patient may
remember a popping sensation at the injury. Similar to
ACL injuries, PCL injuries often occur during twisting with a planted
foot, in which the force of the injury is directed posteriorly
against the tibia with the knee flexed.

Figure 27-1 Anatomy of the Knee

Right knee shown. Abbreviations: ACL, anterior cruciate
ligament; LCL, lateral collateral ligament; MCL,
medial collateral ligament; PCL, posterior cruciate
ligament.

The most common collateral ligament injury results from 
an abduction and external rotation force applied on a knee in 
an extended or slightly flexed position. An intact MCL helps 
the ACL prevent posterior motion of the femur. An injury to 
the MCL may allow for anterior subluxation of the tibial
plateau during flexion, especially in an ACL-deficient patient.

Meniscal injuries typically occur through application of 
specific forces while the knee joint is in certain positions. 
During flexion, if the tibia is rotated internally, the posterior 
horn of the medial meniscus is pulled toward the center of 
the joint. This movement can produce a traction injury of 
the medial meniscus, tearing it from its peripheral attach¬ 
ment and producing a longitudinal tear of the substance of 
the meniscus. With aging, the meniscal tissue degenerates 
and can delaminate, thus making it more susceptible to split¬ 
ting from shear stress, resulting in horizontal cleavage tears. 
Without the menisci, the loads on the articular surfaces are 
increased significantly, leading to a greater potential for 
degenerative arthritis. Because the menisci are without pain 
fibers, it is the tearing and bleeding into the peripheral 
attachments, as well as traction on the capsule, that most 
likely produce a patient’s symptoms of pain. In fact, 16% of 
asymptomatic patients have meniscal tears demonstrated on 
magnetic resonance imaging (MRI), with the incidence 
increasing to 36% for patients older than 45 years. 11 Older 
patients are more likely to have degenerative meniscal tears 
with fewer mechanical symptoms and an insidious onset. 

With posterior horn tears, the meniscus can return to its 
anatomic position with extension. If the tear extends ante¬ 
riorly beyond the MCL, creating a bucket-handle tear, then 
the unstable meniscus fragment cannot always move back 
into an anatomic position. Such a meniscal tear can result 
in locking of the knee in a flexed position. Locking of the 
knee is more common in younger patients with meniscal 
tears. The lateral meniscus, being more mobile, is less likely 
to be associated with locking when torn. With walking, 
traction on medial or lateral meniscal tears may create a 
clicking sensation. 

Epidemiology of Meniscal and Ligamentous Knee Injuries 

Injuries to the collateral or cruciate ligaments or to menisci 
are difficult to account for entirely because many are diag¬ 
nosed without imaging or arthroscopic confirmation and 
many go undiagnosed. Data collected through surveys of
athletes participating in organized sports or information
collected at sports medicine clinics provide the most reli¬ 
able data but do not represent the true spectrum of menis¬ 
cal and ligamentous knee injuries. In a 7-year study of 
trainees at the US Naval Academy, women had a relative 
risk of 2.44 for ACL injury compared with men. 12 A similar 
increased risk for ACL injuries was also observed among 
female competitive alpine skiers. 13 A Norwegian study of 
soccer players with verified ACL injuries suggested that 
there were 0.063 injuries per 1000 game-hours; women had 
an almost 2-fold greater incidence of ACL injuries than did 
men. 14 Although a number of other studies exist, there are 
few epidemiologic data regarding other meniscal and liga¬ 
mentous knee injuries. 15,16 

Clinical Examination for Internal Derangement of the Knee 

The purpose of the examination is to make a correct ana¬ 
tomic diagnosis. The patient should be allowed to recite the 
history of the knee discomfort without interruption. After 
the history has been taken, the examiner inspects, palpates, 
and assesses function of the unaffected (or less affected) 
knee. Examining the healthy knee first creates trust that the 
examiner is not trying to cause pain and distracts the patient 
somewhat from the actual maneuvers, allowing greater relax¬ 
ation. The knees should be examined with the patient in a 
position that makes him or her most comfortable. 17 The 
healthy knee must be examined because an essential compo¬ 
nent of interpreting the findings in the affected knee is the 
comparison between knees. The following sections describe 
the cardinal features of a knee examination for a meniscal or 
ligamentous injury. 

Inspection 

After resolution of acute symptoms, a patient’s gait should be 
observed. Patients will usually assume a position that pro¬ 
vides them the most comfort. If the patient is seated on the 
examination table, the affected knee will be flexed and hang¬ 
ing off the edge. The quadriceps and calves should be evalu¬ 
ated for atrophy, often present after ligamentous injuries. 
The knees should be inspected for asymmetry that may indicate
swelling. An early sign of effusion is the loss of the peripatellar
groove on either side of the patella, observed best
with the patient supine. Also, swelling over the medial or lat¬ 
eral aspect of the joint should be recorded and may indicate 
local inflammation over the collateral ligaments. 

Palpation 

Differences in temperature between the knees suggest 
inflammation. With the patient supine, the knees should be 
examined for an effusion and discomfort with patellar 
motion. An effusion can be detected by noticing the loss of 
the peripatellar groove and by palpation of the fluid. Smaller 
effusions may be detected by compressing the medial and 
superior aspects of the knee and then pressing or tapping the 
lateral aspect to create a fluid wave. A perceptible bulge on 
the medial aspect suggests a small effusion; this sign may not 
be present with larger effusions. Ballottement of the patella 
may also be a useful technique for detecting an effusion. The
examiner quickly pushes down on the patella. In the normal
knee joint with minimal free fluid, the patella moves directly 
into the femoral condyle, and there is no tapping sensation 
underneath the examiner’s fingertips. However, in the knee 
with excess fluid, the patella is floating; thus, ballottement 
causes the patella to tap against the femoral condyle. This 
sensation is transmitted to the examiner’s fingertips. Local¬ 
ized swelling over specific knee structures, such as the MCL 
or LCL, can also be observed. Crepitus, a palpable grating 
sensation, may be produced during certain motions in joints 
with cartilage disruption. 18 The maneuvers producing crepitus,
the location of the crepitus, and any pain elicited should
be recorded. Joint line tenderness can also be detected by
palpating medial and lateral to the patella in the groove between
the femoral condyle and the tibia.

Function 

The Lachman test, anterior drawer test, and lateral pivot shift 
test are the 3 physical examination maneuvers commonly 
used to assess the integrity of the ACL (Figure 27-2).
Although the patient may be fearful, these functional tests
should not cause pain with isolated ACL injuries in the sub¬ 
acute setting. 

Lachman test is typically performed while the patient 
lies supine with the knee flexed to 20 to 30 degrees. 19 The 
examiner stands to the side of the patient’s leg, with the 
patient’s heel on the examination table. The femur is 
grasped with one hand just above the knee. While the 
examiner grasps the femur firmly to prohibit motion of the 
upper leg and to relax the hamstrings, the other hand 
grasps the proximal tibia. The lower leg is then given a 
brisk forward tug, and a discrete end point should be felt. 
A positive test result is one in which the end point is not 
discrete or there is increased anterior translation of the 
tibia. The test is more difficult to perform when the exam¬ 
iner has small hands or the patient has large legs, both situ¬ 
ations making it more difficult to completely grasp the 
legs. In this situation, the patient may be placed prone, 
with the knee at the same degree of flexion while the exam¬ 
iner attempts the same motion of the tibia. 



Figure 27-2 Examination Maneuvers 

Right knee shown. Examination maneuvers include the Lachman, anterior drawer, lateral pivot shift, Apley compression, and McMurray tests. Lachman test, 
performed to detect anterior cruciate ligament (ACL) injuries, is conducted with the patient supine and the knee flexed to 20 to 30 degrees. The anterior drawer 
test detects ACL injuries and is performed with the patient supine and the knee in 90 degrees of flexion. The lateral pivot shift test is performed with the patient 
supine, the hip flexed 45 degrees, and the knee in full extension. Internal rotation is applied to the tibia while the knee is flexed to 40 degrees under a valgus 
stress (pushing the outside of the knee medially). The Apley compression test, used to assess meniscal integrity, is performed with the patient prone and the 
examiner’s knee over the patient’s posterior thigh. The tibia is externally rotated while a downward compressive force is applied over the tibia. The McMurray 
test, used to assess meniscal integrity, is performed with the patient supine and the examiner standing on the side of the affected knee.

The anterior drawer test is also performed with the patient
supine and the knee in 90 degrees of flexion. The examiner 
quickly pulls the upper portion of the calf forward, using 
both hands. The tibia must not be rotated, and the ham¬ 
strings must be relaxed to properly assess the ACL. An intact 
ACL abruptly stops the tibia’s forward motion as the ACL 
reaches its maximum length. If the tibia can be moved anteri¬ 
orly without an abrupt stop, referred to as a discrete end 
point, this is considered a positive anterior drawer sign. It is 
often useful to perform this test on the uninjured knee to 
determine whether the amount of anterior translation differs 
between knees. 

The lateral pivot shift test combines a valgus stress (push¬ 
ing the outside of the knee medially) with a twisting force 
while the knee is being flexed (see Figure 27-2). In Losee’s 20 
version of the test, the patient rests on his or her back with 
the knee at 45 degrees’ flexion. The examiner places a hand 
on the lateral aspect of the knee and pushes medially, creat¬ 
ing a valgus strain. At the same time, the examiner’s other 
hand supports and pulls the foot laterally. As the examiner 
slowly extends the knee, the tibia and foot begin to twist 
internally. A positive test result consists of an obvious thud or 
jerk at 10 to 20 degrees’ flexion in the ACL-deficient knee, 
representing anterior subluxation of the tibia on the femur. 

Posterior or PCL stability is generally assessed with the 
posterior drawer test, which is performed with the patient 
supine and the knee flexed to 90 degrees. The alignment of 
the knees is inspected; if the tibia of the affected knee is subluxed
posteriorly (a posterior sag), then applying anterior
pressure will correct the sag. If the subluxation can be cor¬ 
rected, it is considered a positive posterior drawer sign. Oth¬ 
ers consider a posterior drawer test result to be positive if a 
posterior force on the tibia encounters no discrete end point, 
the reverse of the anterior drawer test. A method of assessing 
whether a PCL injury is present in combination with an 
MCL injury is to perform the abduction (or valgus) stress 
test with the knee in 2 positions. First, with the knee in 30 
degrees of flexion, the examiner supports the foot or ankle of 
the leg being examined and places the other hand along the 
lateral aspect of the knee. An inward or medial force is then 
applied to the knee while an opposite force is applied to the 
lower leg. The examiner grades the opening of the medial 
compartment of the knee. If the opening is larger on the 
injured side than on the opposite side, an MCL injury is sug¬ 
gested. The same test is then carried out with the knee held in 
full extension. Normally, the abduction stress test produces 
no opening of the medial compartment when the knee is 
fully extended in a patient with an intact PCL and MCL. If 
the opening of the medial compartment is similar with the 
knee in full extension, a combined PCL and MCL injury is 
suspected. 

Finally, meniscal integrity is assessed with several specific 
examination maneuvers, including the McMurray test, the 
Apley compression test, and the medial-lateral grind test 
(Figure 27-2). The McMurray test is performed with the 
patient supine. The examiner stands on the side of the 
affected knee and places one hand on the heel and another 
along the medial aspect of the knee, providing a valgus force.
The knee is extended from a fully flexed position while the
tibia is rotated internally. The test is repeated while the tibia 
is rotated externally. A positive sign is indicated by a popping 
and sensation of symptoms along the joint line, often accom¬ 
panied by an inability to fully extend the knee. 

The Apley compression test is performed with the patient 
lying in a prone position on a low examination table. The 
examiner applies his or her knee into the posterior thigh of 
the leg to be examined and then flexes and externally rotates 
the tibia while gripping the ankle. The examiner then com¬ 
presses the tibia downward. If this compression produces an 
increase in pain, the test result is considered positive. 

The medial-lateral grind test is performed with the patient 
supine on the examination table. The examiner cradles the 
affected leg’s calf in one hand and places the index finger and 
thumb of the opposite hand over the joint line. Valgus and 
varus stresses are applied to the tibia during flexion and 
extension. If a grinding sensation is palpated by the hand 
placed over the joint line, the medial-lateral grind test result 
is deemed positive. 

METHODS 

Search Strategy 

We conducted MEDLINE and HealthSTAR searches to 
retrieve articles pertaining to the physical examination of 
patients with suspected meniscal or ligamentous injury of 
the knee. The search of MEDLINE and HealthSTAR included 
all years from 1966 and 1975, respectively; both searches 
were extended through December 31, 2000. Keywords for 
searching included “knee,” “physical examination,” “internal 
derangement,” “anterior cruciate ligament,” “posterior cruci¬ 
ate ligament,” “medial collateral ligament,” “lateral collateral 
ligament,” and “meniscus.” Reference lists from relevant arti¬ 
cles were also manually searched. Searching was limited to 
English-language articles describing human studies. 

A total of 88 articles were retrieved. We included 26 articles 
that compared the performance of the physical examination 
of the knee to a reference standard, such as arthroscopy, 
arthrotomy, or MRI. Three articles were subsequently 
excluded because no primary data were reported, only aggre¬ 
gated sensitivities and specificities. Several categories of 
physical examination findings were included: widely avail¬ 
able maneuvers, maneuvers requiring specialized equipment, 
and general knee examination without specific maneuvers. 
We did not include data examining the accuracy of arthrometry
or examination under anesthesia because neither of these
examination techniques is widely available. Two of the
authors, a rheumatologist (D.H.S.) and an orthopedic sur¬ 
geon (J.L.S.), graded each article for its methodologic quality, 
using a standardized scoring system. 21 The scoring system 
included assembly of the study (consecutive or otherwise), 
the relevance of the patients enrolled, the appropriateness and
completeness of the reference standard, and the blinding of 
the examiner. 

Articles contained level 1 evidence if they used an independent,
blind comparison of the examination with the reference
standard among at least 50 consecutive and relevant
patients. Level 2 articles were similar in their methods but 
contained fewer than 50 patients. If patients were not 
recruited consecutively, but the authors conducted an inde¬ 
pendent and blind comparison with the reference standard, 
then the article was considered level 3. Level 4 evidence came 
from articles that compared the examination with a reference 
standard, but patients were not collected consecutively, nor 
was the comparison independent. 
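
The grading rules above can be sketched as a small decision function. This is only an illustrative sketch: the function name and parameters are hypothetical, and the handling of a consecutive series without an independent, blind comparison (assigned level 4 here) is an assumption, because the text does not address that combination explicitly.

```python
def evidence_level(n_patients: int, consecutive: bool, independent_blind: bool) -> int:
    """Return a level of evidence (1-4) for a diagnostic accuracy study.

    Hypothetical helper that mirrors the rules described in the text.
    """
    if not independent_blind:
        # Assumption: without an independent, blind comparison the study
        # is treated as level 4, whether or not patients were consecutive.
        return 4
    if not consecutive:
        # Independent and blind comparison, but nonconsecutive recruitment.
        return 3
    # Independent, blind comparison in consecutive, relevant patients:
    # level 1 with at least 50 patients, otherwise level 2.
    return 1 if n_patients >= 50 else 2


print(evidence_level(118, consecutive=True, independent_blind=True))   # 1
print(evidence_level(41, consecutive=True, independent_blind=True))    # 2
print(evidence_level(101, consecutive=False, independent_blind=True))  # 3
```

Ordering the checks this way keeps each level mutually exclusive, so a study can never satisfy two levels at once.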

We noted whether the patients included in each study had 
acute or chronic knee symptoms and whether the examiner 
was a specialist in musculoskeletal care. However, no studies 
reported data separately for nonspecialist examiners. Data 
were abstracted from each article to allow for calculation of 
the sensitivity and specificity of each physical examination 
finding. Several articles commented on the composite exami¬ 
nation for meniscal or ligamentous injuries. These articles 
did not include data for specific examination maneuvers; 
rather, all aspects of the physical examination were combined 
in an unspecified manner. 

Analysis 

Sensitivity was calculated as the percentage of patients with a 
given lesion on the reference standard (usually arthroscopy 
or arthrotomy) who had an abnormal physical examination 
result; specificity was the percentage of patients without a 
given lesion who had normal results on the physical exami¬ 
nation maneuver. 22 When sensitivity and specificity were 
available, likelihood ratios (LRs) with 95% confidence inter¬ 
vals (CIs) were calculated according to the method of Simel 
et al 23 and Hasselblad and Hedges. 24 The LR for a positive test 
result was calculated as the sensitivity divided by (1 minus 
specificity); for a negative test result, the LR equaled (1 
minus sensitivity) divided by specificity. When several stud¬ 
ies provided data to calculate the LRs for the same examina¬ 
tion maneuver, a summary LR was estimated from a 
random-effects model to provide conservative values. 25 
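
As a rough illustration of the formulas just described, the sketch below computes LRs from sensitivity and specificity, and a 95% CI for a positive LR from a 2x2 table using the usual log-scale standard error. The function names and the 2x2 counts are hypothetical, and the code assumes no empty cells (studies with 100% specificity would need a continuity correction, which is not shown).

```python
import math


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def lr_positive_with_ci(tp: int, fn: int, fp: int, tn: int, z: float = 1.96):
    """Return (LR+, lower, upper) from 2x2 counts; assumes all cells are nonzero."""
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    lr = sensitivity / false_positive_rate
    # Standard error of ln(LR+) for a ratio of two independent proportions.
    se_log = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    return lr, lr * math.exp(-z * se_log), lr * math.exp(z * se_log)


# Sensitivity 65% and specificity 23% (the anterior drawer values reported
# for Hughston et al in Table 27-1) give LRs of about 0.84 and 1.52.
print(likelihood_ratios(0.65, 0.23))

# Hypothetical counts, for illustration only.
print(lr_positive_with_ci(tp=30, fn=10, fp=5, tn=55))
```

Working on the log scale keeps the interval asymmetric around the LR, which is why published intervals for small studies stretch much further above the point estimate than below it.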


RESULTS 

No articles could be identified that adequately examined the 
diagnostic accuracy of the physical examination for MCL or 
LCL lesions. Hence, these structures will not be discussed 
further. 

Anterior Cruciate Ligament Examination 

Three researchers reported on the composite examination
for ACL injuries without giving data for specific examination
maneuvers (Table 27-1). 26-28 These investigators found
that the sensitivity of the examination for ACL injuries was
more than 82% and the specificity was more than 94%. The
summary LRs for these studies were 25 (95% CI, 2.1-306)
for a positive examination result and 0.04 (95% CI, 0.01-0.48)
for a negative examination result. Twelve other
studies 29-40 were included that examined the anterior
drawer, lateral pivot shift, and Lachman tests.
The methodologic quality of these studies was inconsistent;
patients primarily had known ruptured ACLs and underwent
subsequent arthroscopy or arthrotomy. Because only
patients with known lesions were included, calculation of
specificity and LRs in all but 4 studies was precluded.

The specificity of the anterior drawer test for ACL ruptures
ranged from 23% to 100%, with a mean of 67% (SD,
42%). 29,36,38 Likewise, the sensitivity of the anterior drawer test
varied from 9% to 93%, with a mean of 62% (SD,
23%). 29,31,32,35-40 Several of these studies were small, which may
explain the variability in results. The summary LR (Table
27-2) for a positive anterior drawer test result was 3.8 (95%
CI, 0.7-22) and for a negative test result was 0.30 (95% CI,
0.05-1.50). Only 1 study 38 reported on the specificity of the
Lachman test, and it found 100% specificity; however, the authors
reported on a population of patients who underwent MRI
and subsequent arthroscopy, thus limiting the generalizability
of these findings. The sensitivity of the Lachman test ranged
from 60% to 100%, and the mean was 84% (SD, 15%) (Table
27-1). 32,34,35,37-40 According to the one study 38 that reported
both the specificity and sensitivity of the Lachman test, the LR
for a positive test result was 42 (95% CI, 2.7-651) and for a
negative test result was 0.1 (95% CI, 0-0.4) (Table 27-2). The
specificity of the lateral pivot shift test has not been reported.
The sensitivity of this maneuver varied from 27% to 95%,
with a mean of 38% (SD, 28%). 30-32,35,39

Posterior Cruciate Ligament Examination 

Two studies of the composite examination for PCL injuries
reported a mean sensitivity of 91% and specificity of 98%
(Table 27-3). 26,27 The summary LR for a positive general
examination result was 21 (95% CI, 2.1-205) and for a negative
general examination result, 0.05 (95% CI, 0.01-0.50).
Three studies 36,41,42 analyzed the diagnostic accuracy of specific
examination maneuvers. The specificity of the posterior
drawer test was not reported in any study. Two studies 41,42
reported its sensitivity, which ranged from 51% to 86%, with
a mean of 55%. The only other examination maneuver tested
for PCL lesions was the abduction stress test, examined by
the one investigator who originally described the test. 36 This
test had a sensitivity of 94% and a specificity of 100%. The
resulting LR for a positive test result was 94 (95% CI, 6-1487)
and for a negative test result, 0.1 (95% CI, 0-0.4).

Meniscal Examination 

Nine studies investigated the diagnostic accuracy of the
examination for meniscal injuries (Table 27-4); all used
arthroscopy as the reference standard. 27,28,33,43-48 Five of these
studies reported the accuracy of the composite examination;
the mean sensitivity for the composite examination was 77%
(SD, 7%), and the specificity was 91% (SD, 3%). 27,28,43-45 Four
other studies examined specific examination maneuvers. 33,46-48
Joint line tenderness had a mean sensitivity of 79% (SD, 4%)
and a specificity of 15% (SD, 22%). 33,46-48 The summary LR
for a positive test result was 0.9 (95% CI, 0.8-1.0) and for a
negative test result, 1.1 (95% CI, 1.0-1.3) (Table 27-5). The
mean sensitivity of the McMurray test was 53% (SD, 15%),
and the specificity was 59% (SD, 36%). 33,46-48 The summary
LR for a positive test result was 1.3 (95% CI, 0.9-1.7) and for
a negative test result, 0.8 (95% CI, 0.6-1.1). Other maneuvers
were not formally examined in more than 1 study and
included the Apley compression test, the medial-lateral grind
test, and the presence of a joint effusion. The Apley compression
test had a sensitivity of 16%; no patients without meniscal
lesions were tested. 33 The medial-lateral grind test had a
sensitivity of 69% and a specificity of 86%. 46 A joint effusion
was found to have a sensitivity of 35% and specificity of
100%; however, this last study only included patients admitted
for arthroscopy. 47

Table 27-1 Diagnostic Accuracy of the Physical Examination for Anterior Cruciate Ligament Injuries (a)

Source, y | Level of Evidence | No. of Subjects | Patient Population | Reference Standard | Examination Maneuver | Sensitivity, % | Specificity, %
Simonsen et al, 26 1984 | 1 | 118 | Consecutive patients with hemarthrosis; acute | Arthroscopy | General examination | 62 | 56
O’Shea et al, 27 1996 | 1 | 156 | Consecutive patients with chronic knee pain; acute and chronic | Arthroscopy | General examination | 97 | 100
Rose and Gold, 28 1996 | 4 | 154 | Nonconsecutive patients with knee pain; chronic | Arthroscopy | General examination | 100 | 99
Braunstein, 29 1982 | 4 | 29 | Consecutive patients who underwent arthrography and then arthrotomy | Arthrotomy | ADT | 91 | 89
Dahlstedt and Dalen, 30 1989 | 2 | 41 | Consecutive patients with hemarthrosis but no fracture on radiograph; acute | Arthrotomy | LPST | 37 | NA
DeHaven, 31 1980 | 4 | 35 | Consecutive athletes with knee injuries and hemarthrosis; acute | Arthroscopy | ADT | 9 | NA
 |  |  |  |  | LPST | 27 | NA
Donaldson et al, 32 1985 | 4 | 101 | Nonconsecutive patients from sports medicine clinic found at surgery to have ACL tears; acute | Arthroscopy/arthrotomy | ADT | 70 | NA
 |  |  |  |  | LPST | 35 | NA
 |  |  |  |  | Lachman | 99 | NA
Fowler and Lubliner, 33 1989 | 1 | 24 | Consecutive patients with knee pain; chronic | Arthroscopy | JLT | 75 | NA
Gurtler et al, 34 1987 | 1 | 75 | Consecutive patients with ACL tears; acute and chronic | Arthroscopy | Lachman | 100 | NA
Hardaker et al, 35 1990 | 1 | 101 | Consecutive patients with knee injuries and hemarthrosis presenting to sports medicine clinic; acute | Arthroscopy | ADT | 18 | NA
 |  |  |  |  | LPST | 29 | NA
 |  |  |  |  | Lachman | 74 | NA
Hughston et al, 36 1976 | 4 | 68 | Consecutive patients with ruptures of the “medial compartment”; acute | Arthrotomy | ADT | 65 | 23
Jonsson et al, 37 1982 | 4 | 107 | Nonconsecutive patients found at surgery to have a ruptured ACL; acute and chronic | Arthroscopy | ADT | 93 | NA
 |  |  |  |  | Lachman | 60 | NA
Lee et al, 38 1988 | 4 | 41 | Nonconsecutive patients who underwent MRI and then arthroscopy | Arthroscopy | ADT | 78 | 100
 |  |  |  |  | Lachman | 89 | 100
Liu et al, 39 1995 | 4 | 38 | Nonconsecutive patients with proven ACL ruptures; acute | Arthroscopy | ADT | 63 | NA
 |  |  |  |  | LPST | 95 | NA
 |  |  |  |  | Lachman | 72 | NA
Mitsou and Vallianatos, 40 1988 | 4 | 144 | Nonconsecutive patients with proven ACL ruptures; acute and chronic | Arthroscopy/arthrotomy | ADT | 72 | NA
 |  |  |  |  | Lachman | 91 | NA

Abbreviations: ACL, anterior cruciate ligament; ADT, anterior drawer test; JLT, joint line tenderness; LPST, lateral pivot shift test; MRI, magnetic resonance imaging; NA, not applicable (no patients without lesions were included).

(a) Acute patients refers to those treated within 3 months of their injury and chronic refers to beyond 3 months. If no mention is made of acute or chronic, the authors did not specify.

Table 27-2 Selected Physical Examination Maneuvers for Ligamentous Knee Injuries (a)

Source, y | LR Positive (95% CI) | LR Negative (95% CI)

Anterior Drawer Test
Hughston et al, 36 1976 | 0.8 (0.6-1.2) | 1.5 (0.6-3.8)
Braunstein, 29 1982 | 8.2 (2.2-31) | 0.1 (0-0.7)
Lee et al, 38 1988 | 37 (2.3-576) | 0.2 (0.1-0.5)
Summary (b) | 3.8 (0.7-22) | 0.30 (0.05-1.50)

Lachman Test
Lee et al, 38 1988 | 42 (2.7-651) | 0.1 (0-0.4)

Abbreviations: CI, confidence interval; LR, likelihood ratio.

(a) Includes all studies with data supplied to calculate both sensitivity and specificity.
(b) Calculated with a random-effects model.

Table 27-3 Diagnostic Accuracy of the Physical Examination for Posterior Cruciate Ligament Injuries (a)

Source, y | Level of Evidence | No. of Subjects | Patient Population | Reference Standard | Examination Maneuver | Sensitivity, % | Specificity, %
Simonsen et al, 26 1984 | 4 | 118 | Consecutive patients with hemarthrosis; acute | Arthroscopy | General examination | 91 | 80
O’Shea et al, 27 1996 | 1 | 156 | Consecutive patients with chronic knee pain; acute and chronic | Arthroscopy | General examination | 100 | 99
Hughston et al, 36 1976 | 4 | 68 | Consecutive patients with ruptures of the medial compartment; acute | Arthrotomy | AST | 94 | 100
Baker et al, 41 1984 | 4 | 40 | Nonconsecutive patients with known PCL tear; acute | Arthroscopy | PDT | 86 | NA
Reference 42 (author and year illegible in source) | 4 | 59 | Nonconsecutive patients with PCL tear; acute | Arthroscopy/arthrotomy | PDT | 51 | NA

Abbreviations: AST, abduction stress test; NA, not applicable (no patients without lesions were included); PCL, posterior cruciate ligament; PDT, posterior drawer test.

(a) Acute patients refers to those treated within 3 months of their injury and chronic refers to beyond 3 months. If no mention is made of acute or chronic, the authors did not specify.

Table 27-4 Diagnostic Accuracy of the Physical Examination for Meniscal Injuries (a)

Source, y | Level of Evidence | No. of Subjects | Patient Population | Reference Standard | Examination Maneuver | Sensitivity, % | Specificity, %
Daniel, 43 1991 | 4 | 177 | Nonconsecutive patients with suspected meniscal tears | Arthroscopy/arthrotomy | General examination | 82 | 78
Gillies and Seligson, 44 1979 | 4 | 50 | Nonconsecutive patients with known meniscal tears | Arthrotomy | General examination | 64 | NA
Miller, 45 1996 | 4 | 57 | Nonconsecutive patients with known meniscal tears; acute and chronic | Arthroscopy | General examination | 81 | NA
O’Shea et al, 27 1996 | 1 | 156 | Consecutive patients with knee pain; acute and chronic | Arthroscopy | General examination | 73 | 84
Rose and Gold, 28 1996 | 4 | 154 | Nonconsecutive patients with knee pain; chronic | Arthroscopy | General examination | 79 | 79
Fowler and Lubliner, 33 1989 | 1 | 80 | Consecutive patients with knee pain; chronic | Arthroscopy | JLT | 85 | NA
 |  |  |  |  | McMurray | 29 | NA
 |  |  |  |  | Apley | 16 | NA
Anderson and Lipscomb, 46 1986 | 4 | 100 | Consecutive patients suspected of having meniscal tears presenting for arthroscopy; acute and chronic | Arthroscopy/arthrotomy | JLT | 77 | NA
 |  |  |  |  | McMurray | 58 | 29
 |  |  |  |  | MLGT | 69 | 86
Barry et al, 47 1983 | 4 | 44 | Nonconsecutive patients presenting for meniscectomy | Arthroscopy/arthrotomy | JLT | 76 | 43
 |  |  |  |  | McMurray | 56 | 100
 |  |  |  |  | Effusion | 35 | 100
Noble and Erat, 48 1980 | 4 | 200 | Nonconsecutive patients presenting for meniscectomy; acute and chronic | Arthroscopy | JLT | 79 | 11
 |  |  |  |  | McMurray | 63 | 57

Abbreviations: JLT, joint line tenderness; MLGT, medial-lateral grind test; NA, not applicable (no patients without lesions were included).

(a) Acute patients refers to those treated within 3 months of their injury and chronic refers to beyond 3 months. If no mention is made of acute or chronic, the authors did not specify.

Limitations of Data

Given the relative frequency and economic consequences of
meniscal or ligamentous knee injuries, data about the accuracy
of the physical examination were relatively limited. Although no
specific examination maneuver has impressive test performance 
characteristics, the composite examination results for ACL, PCL, 
and meniscal lesions are reported to be reasonably sensitive and 
specific. One possible explanation for this finding is that a con¬ 
stellation of examination findings may be more useful than any 
one finding. No data were available to judge the accuracy of the 
physical examination results of the MCL and LCL. 

The patient population was an important determinant of the 
accuracy of the examination. Some investigators included only 
acute injuries and others, only chronic injuries, whereas some 
did not specify injury type. The chronicity of the injury may 
affect the sensitivity and specificity of examination maneuvers. 
The examination for ACL injuries was less accurate if a hemarthrosis
was present because the increased intra-articular volume
causes pain that is increased with any examination maneuver. 
This is a good illustration of spectrum bias, in which the spec¬ 
trum of patients included in a study affects the diagnostic accu¬ 
racy of a given test, 49 and may have accounted for some of the 
variation in the results reported between articles. 

Another potential source of variation was the experience of the 
examiner and the precise methods used for conducting the phys¬ 
ical examination test. It is commonly believed that the examina¬ 
tion for meniscal and ligamentous injuries is difficult to learn 
and that accuracy may therefore increase with experience. 
Although all of the studies included in this review used orthope¬ 
dic surgeons, the reports did not describe the examiners’ number 
of years in practice. If experience is an important determinant of 
accuracy, the data presented in this review should represent an 
upper limit for less experienced physicians. The definitions of an 
abnormal or positive physical examination result were not always 
clear from the articles. Also, the reproducibility of the physical 
examination was unclear and rarely reported. These sources of 
variation all contribute to heterogeneity between studies, illus¬ 
trated by broad 95% CIs in the summary LRs. 
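
To make the link between heterogeneity and wide summary intervals concrete, the sketch below pools log LRs with the DerSimonian-Laird estimator, one common random-effects approach. The source does not state which estimator was used, so this is an assumption, and all function names are hypothetical. Standard errors are back-calculated from the rounded CIs for the positive anterior drawer LRs in Table 27-2, so the result is not expected to reproduce the published summary exactly.

```python
import math


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Approximate the standard error of ln(LR) from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def pool_random_effects(lrs, ses, z: float = 1.96):
    """DerSimonian-Laird pooling of LRs on the log scale.

    Returns (summary LR, lower, upper, tau squared).
    """
    y = [math.log(lr) for lr in lrs]
    w = [1 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    # Cochran Q measures how far the studies scatter around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    # Between-study variance is added to every study's own variance.
    w_star = [1 / (se ** 2 + tau2) for se in ses]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum_w_star
    se_pooled = math.sqrt(1 / sum_w_star)
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled),
            tau2)


# Positive LRs for the anterior drawer test as (LR, lower, upper), Table 27-2.
studies = [(0.8, 0.6, 1.2), (8.2, 2.2, 31), (37, 2.3, 576)]
lrs = [s[0] for s in studies]
ses = [se_from_ci(s[1], s[2]) for s in studies]
print(pool_random_effects(lrs, ses))
```

When the studies disagree, the between-study variance grows, the weights even out, and the pooled interval widens, which is the behavior the paragraph above describes.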

The physical examination should be preceded by taking a
careful history. Historical findings that may substantially
improve the accuracy of the physical examination include the
angle and force of impact if an injury occurred; whether the
patient heard a pop at the injury; whether the patient has been
experiencing catching, locking, or giving way of the knee; and
whether the patient has noticed swelling around the knee. The
sensitivity and specificity of historical items deserve attention,
but we were unable to find published data regarding the
sensitivity and specificity of commonly asked questions. Our review
suggests that a combination of historical and physical examination
findings may be more useful than any one specific item.
Future studies must pay careful attention to recruiting an
appropriate patient population, including subjects without pathologic
lesions. They should also be careful in describing the physical
examination, explicitly documenting criteria for an abnormal result; in
calculating interobserver and intraobserver reliability; and in
testing the diagnostic accuracy of clinically relevant clusters of
historical and examination items.

How to Improve Your Physical Examination Skills 

Improving your diagnostic skills for meniscal and ligamentous
knee injuries takes practice. The physical examination
can be practiced on healthy patients to develop an examination
routine and gain a mental image of healthy anatomy.
Patients with knee pain should be examined so that you can
describe what you think is the anatomic lesion causing the
pain. If you refer the patient, the referral letter should include
your presumed anatomic diagnosis, which forces the examination
to be more thorough, and it will aid the consultant in
his or her evaluation. If the patient undergoes MRI or surgery,
compare your assessment with the imaging or surgical
findings.


Table 27-5 Selected Physical Examination Maneuvers for Meniscal
Knee Injuries^a

                                           LR (95% CI)
Source, y                           Positive          Negative

McMurray Test
  Noble and Erat, 48 1980           1.5 (1.1-2.1)     0.6 (0.5-0.9)
  Barry et al, 47 1983              8.9 (0.6-132)     0.5 (0.3-0.7)
  Anderson and Lipscomb, 48 1986    0.8 (0.5-1.3)     1.5 (0.4-4.9)
  Summary^b                         1.3 (0.9-1.7)     0.8 (0.6-1.1)

Joint Line Tenderness
  Noble and Erat, 48 1980           0.9 (0.8-1.0)     1.9 (0.8-4.3)
  Barry et al, 47 1983              1.3 (0.7-2.6)     0.6 (0.2-1.6)
  Summary^b                         0.9 (0.8-1.0)     1.1 (1.0-1.3)

Joint Effusion
  Barry et al, 47 1983              5.7 (0.4-86)      0.7 (0.5-0.9)

Medial-Lateral Grind
  Anderson and Lipscomb, 46 1986    4.8 (0.8-30)      0.4 (0.2-0.6)

Abbreviations: CI, confidence interval; LR, likelihood ratio.

^a Includes all studies with data supplied to calculate both sensitivity and specificity.
^b Calculated with a random-effects model.


CLINICAL SCENARIOS—RESOLUTIONS 


The first case describes a young man with a probable ACL
rupture. The angle of injury, the presence of a pop, the
difficulty bearing weight, and the transient swelling support
this diagnosis. He should be counseled about his
prognosis, encouraged to begin a program of quadriceps
strengthening, and given the option of pursuing surgical
reconstruction if the symptoms are functionally limiting.
The second case characterizes a common scenario in primary
care practices, the older patient with degenerative
joint disease and a probable superimposed degenerative
meniscal tear. This patient’s functional limitations need to
be assessed carefully. If she is not too impaired, joint aspiration
of the effusion, nonsteroidal anti-inflammatory
drugs, quadriceps strengthening, and a cane may provide
enough pain relief and mobility to make more invasive
treatment unnecessary. If conservative management fails
and her symptoms include locking or giving way, arthroscopic
partial meniscectomy may be useful. Patients with
substantial impairment and significant degenerative changes 
on weight-bearing radiographs may be candidates for 
total knee replacement.


THE BOTTOM LINE 

According to our review of the literature and clinical experience,
we suggest the medical history and physical examination
for patients with possible meniscal or ligamentous
lesions outlined in Box 27-1. Although there are scant specific
data supporting each element of the medical history
and physical examination we have outlined, these items are
presumed to be part of the composite examination that was
found to be useful in determining whether there is a possible
meniscal or ligamentous injury. The composite examination
for an ACL tear performed by orthopedic physicians
is highly predictive (positive LR, 25; 95% CI, 2.1-306; negative
LR, 0.04; 95% CI, 0.01-0.50), as is the composite examination
for a PCL tear (positive LR, 21; 95% CI, 2.1-205;
negative LR, 0.05; 95% CI, 0.01-0.50). The examination for
meniscal tears is less efficient; the composite examination
confers a positive LR of 2.7 (95% CI, 1.4-5.1) and a negative
LR of 0.4 (95% CI, 0.2-0.7). If the medical history and
physical examination do not allow the determination of a
meniscal or ligamentous injury, consultation with a musculoskeletal
specialist may obviate expensive and unnecessary
diagnostic imaging.


Box 27-1 Recommended Basic Medical History and Physical
Examination for Patients With Suspected Meniscal or
Ligamentous Knee Injuries

HISTORICAL ITEMS

1. Where exactly is the knee pain (point to it with 1 finger)?

2. What is the duration of the pain?

3. Before the pain started, had there been a change in
activities?

4. Was there an injury to the lower extremity; if so, what
was the direction of the forces?

5. Was there a pop at the injury?

6. Was the knee swollen at the injury or anytime since?

7. Is there giving way or buckling of the knee?

8. Is the knee locking or catching in extension or flexion?

9. Is there pain in the hip, thigh, or back?

PHYSICAL EXAMINATION TESTS

1. Alignment: Are the femur, tibia, and patella in normal
alignment during standing and walking?

2. Range of motion: Can the patient actively or passively
flex and extend the knee?

3. Effusion: Is there a fluid wave or does ballottement of
the patella produce a tapping sensation?

4. Joint line tenderness: Is the patient tender along the
medial or lateral joint lines?

5. Lachman test: Is there a discrete end point when the
tibia is anteriorly subluxed on the femur?

6. Anterior drawer test: Is there anterior subluxation of
the tibia on the femur?

7. Posterior drawer test: Is there posterior sag or translation
of the tibia on the femur?

8. Lateral pivot shift: Does the tibia jump anteriorly when
extended or flexed with a valgus stress?

9. McMurray test: Is there a popping at the joint line
when the knee is extended and rotated?
Prepared by Daniel H. Solomon, MD, MPH, Jeff Katz, MD,
David Bates, MD, and Jonathan L. Schaffer, MD, MBA 

Reviewed by Richard Riedel, MD 


CLINICAL SCENARIO 


A 55-year-old man presents with 6 months of knee pain.
He observes that the pain recently intensified after a weekend
of skiing. He denies any specific trauma during his ski
trip. He feels increased pain when squatting or walking
down stairs. An occasional click has been audible when he
is walking.

UPDATED SUMMARY ON THE RATIONAL 
KNEE EXAMINATION 

Original Review 

Solomon DH, Bates DW, Katz JN, Simel DL, Schaffer JL. Does
this patient have a torn meniscus of the knee? the value of the
physical examination in determining whether a patient has a
meniscal or ligamentous injury. JAMA. 2001;286(13):1610-1620.


UPDATED LITERATURE SEARCH 

Our literature search used the parent search strategy for The
Rational Clinical Examination series, combined with the subject
headings “exp knee,” “exp ligament,” and “exp meniscus,”
published in English from 2002 to July 2004. The search
yielded 12 articles. We reviewed all the titles and abstracts,
identifying 4 articles for additional review. None of these
original articles are included in the update. Two did not meet
the quality review criteria (examiner blinded to the criterion
standard or nonselected patient population) that we originally
established. The other 2 did not provide adequate data
for combining with the previous studies.

We did find 1 new nonsystematic review that addresses the 
sensitivity and specificity of some of the key examination 
maneuvers for meniscal and ligamentous injuries of the knee. 1 
The review did not include a methods section for identifying 
the literature or a methodologic assessment of the referenced 
articles. Thus, some conclusions from the recently published 
review may seem clinically sensible but be incorrect because 
the articles included were not all methodologically rigorous. 


CHANGES IN THE REFERENCE STANDARD 

The reference standard is the examination of the knee 
structure of interest (ligament or meniscus) at surgery. 
However, for patients who do not undergo surgery, the 
magnetic resonance imaging (MRI) results are the reference 
standard. 

NEW FINDINGS 

A nonsystematic review found no additional evidence to alter 
the following conclusions. 

• The Lachman test is the best maneuver for detecting anterior
cruciate ligament tears.

• The McMurray test has inadequate sensitivity for ruling 
out a meniscal tear. 


IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

There are no changes in the original data presented on the
rational examination for the meniscus and ligaments of the
knee. A JAMAInteractive displays the anatomy and maneuvers
of the knee examination
(http://jama.ama-assn.org/cgi/content/full/286/13/1610/DC1;
accessed June 1, 2008).

Results of Literature Review 

No data suggest new validated examination items for injuries
to the meniscus or ligaments of the knee. Symptoms
common in patients with meniscal injuries include clicking,
locking, and pain. With anterior cruciate ligament
injuries, patients have pain and giving way of the knee. A
nonsystematic review concluded that the Lachman test is
the best test for anterior cruciate ligamentous injuries.
The anterior drawer test has been studied more frequently
(see Figure 27-2).

Evidence From Guidelines 

No government guidelines explicitly address the diagnosis of 
injuries of the meniscus or ligaments. 
CLINICAL SCENARIO—RESOLUTION


The patient should be asked about key symptoms of meniscal
and ligamentous injuries, including clicking, locking,
and giving way. The examination should include the anterior
drawer and Lachman tests for anterior cruciate ligament
injuries and at least the medial lateral grind test for
meniscal injuries.

Further medical history and evaluation reveal that the
patient has pain with squatting and a positive medial lateral
grind test result (positive likelihood ratio [LR], 4.8).
Both of these findings, together with negative anterior
drawer (negative LR, 0.3) and Lachman test results (negative
LR, 0.1), suggest that his injury is likely meniscal and
not of the anterior cruciate ligament.

If the patient is not functionally disabled, a trial of
anti-inflammatory medicines and physical therapy
should be attempted for 8 to 12 weeks. At that point, further
evaluation will determine the need for more testing
or treatment.


KNEE EXAMINATION—MAKE THE DIAGNOSIS


PRIOR PROBABILITY FOR A LIGAMENTOUS 
OR MENISCAL TEAR 

The physical examination can help in determining which
patients are likely to have meniscal or ligamentous injuries of
the knee. However, no data exist that allow us to establish reliable
prior probability estimates. Among patients with knee
pain referred by primary care providers or rheumatologists to
an orthopedist, the orthopedist will clinically diagnose meniscal
tears in about 25% of patients and ligamentous injuries in
about 10%. 2 We do not know the underlying distribution of
these conditions in patients who do not require referral.
Because the mechanism of an injury predicts the actual anatomic
defect, experts probably can predict (better than
chance) the most likely injury when they either observe the
trauma or get a reliable medical history.
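
As a rough illustration of how a pretest estimate might be combined with the likelihood ratios reported in this update, the sketch below applies the odds form of Bayes' theorem. It assumes, purely for illustration, that the 25% and 10% figures for referred patients can stand in as pretest probabilities and that the 10% figure for ligamentous injuries applies to anterior cruciate ligament tears; as noted above, reliable prior probabilities are not available. The function name `posttest_probability` is hypothetical.

```python
def posttest_probability(pretest_probability, likelihood_ratios):
    """Convert a pretest probability to a posttest probability.

    Uses the odds form of Bayes' theorem:
    posttest odds = pretest odds x LR1 x LR2 x ...
    """
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest_probability must be between 0 and 1")
    odds = pretest_probability / (1 - pretest_probability)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)


# Assumed pretest probability of 25% for a meniscal tear,
# positive medial lateral grind test (positive LR, 4.8).
meniscal = posttest_probability(0.25, [4.8])
print(round(meniscal, 2))  # about 0.62

# Assumed pretest probability of 10% for an ACL tear,
# negative Lachman test (negative LR, 0.1).
acl = posttest_probability(0.10, [0.1])
print(round(acl, 3))  # about 0.011
```

Multiplying several LRs together treats the findings as conditionally independent, which is unlikely to hold for related maneuvers such as the anterior drawer and Lachman tests, so chained results should be read as approximate.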

POPULATION FOR WHOM LIGAMENTOUS OR 
MENISCAL INJURIES OF THE KNEE SHOULD 
BE CONSIDERED 

Adults with knee pain associated with an injury or with 
mechanical symptoms, including clicking, catching, locking, 
or giving way. 

DETECTING THE LIKELIHOOD OF A LIGAMENTOUS OR 
MENISCAL INJURY OF THE KNEE 

The best physical examination maneuvers for ligamentous
tears or meniscal injuries are shown in Table 27-6. A
JAMAInteractive displays the anatomy and some of the
maneuvers described in Table 27-6
(http://jama.ama-assn.org/cgi/content/full/286/13/1610/DC1;
accessed June 1, 2008).


Table 27-6 Physical Examination Maneuvers for Ligamentous and
Meniscal Injuries of the Knee

Symptom (No. of Studies)            LR+ (95% CI)      LR- (95% CI)

Anterior Cruciate Ligament Injuries
  Lachman test (1)                  42 (2.7-651)      0.1 (0.0-0.4)
  Anterior drawer test (3)          3.8 (0.65-22)     0.3 (0.05-1.5)

Meniscal Injuries
  Joint effusion (1)                5.7 (0.4-86)      0.7 (0.5-0.9)
  Medial lateral grind test (1)     4.8 (0.8-30)      0.4 (0.2-0.6)
  McMurray test (3)                 1.3 (0.9-1.7)     0.8 (0.6-1.1)
  Joint line tenderness (2)         0.9 (0.8-1.0)     1.1 (1.0-1.3)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.
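
For readers who want to see how likelihood ratios of the kind tabulated above relate to sensitivity and specificity, the following sketch applies the standard definitions. The sensitivity and specificity values in the example are hypothetical and are not taken from any study in this review; the function name is also an assumption.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity (both as fractions).

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not 0 <= sensitivity <= 1:
        raise ValueError("sensitivity must be between 0 and 1")
    if not 0 < specificity < 1:
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical maneuver with 80% sensitivity and 90% specificity.
lr_pos, lr_neg = likelihood_ratios(0.80, 0.90)
print(round(lr_pos, 1), round(lr_neg, 2))  # about 8.0 and 0.22
```

An LR+ well above 1 raises the probability of injury when the finding is present, and an LR- well below 1 lowers it when the finding is absent; values near 1, such as those for joint line tenderness, change the probability very little.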

REFERENCE STANDARD TESTS

Serial clinical examinations performed by a specialist are a 
pragmatic reference standard. 

MRI is used to rule in or rule out ligamentous tears. 
Arthroscopy may be required to rule out meniscal tears. 


REFERENCES FOR THE UPDATE

1. Malanga GA, Andrus S, Nadler SF, McLean J. Physical examination of the
knee: a review of the original test description and scientific validity of
common orthopedic tests. Arch Phys Med Rehabil. 2003;84(4):592-603.^a

2. Solomon DH, Avorn J, Warsi A, et al. Which patients with knee problems
are likely to benefit from nonarthroplasty surgery? Arch Intern Med.
2004;164(5):509-513.

^a For the Evidence to Support the Update for this topic,
see http://www.JAMAevidence.com.


EVIDENCE TO SUPPORT THE UPDATE:
Knee Ligaments and Menisci



TITLE Physical Examination of the Knee: Review of the
Original Test Description and Scientific Validity of Common
Orthopedic Tests.

AUTHORS Malanga GA, Andrus S, Nadler SF, McLean J. 

CITATION Arch Phys Med Rehabil. 2003;84(4):592-603. 

QUESTION How were the common physical examination
maneuvers for the knee described originally, and
what is their sensitivity and specificity?

DESIGN Qualitative systematic review (ie, systematic 
review without meta-analysis) of articles that examined 
physical examination items for injuries of the meniscus 
and ligaments of the knee. There was no attempt to pool 
data across studies, and there were no explicit criteria for 
which studies were included. 

DATA SOURCES MEDLINE and bibliographies of all 
publications included for review and recent review articles. 

STUDY SELECTION AND ASSESSMENT There 
were no clear selection criteria and no formal assessment 
of the articles included in the review. MEDLINE was 
searched for articles published between 1970 and 2000. 
Search terms included each of the involved knee structures 
and examination maneuvers. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The authors included studies of the anterior cruciate ligament,
posterior cruciate ligament, medial and collateral ligaments,
patellofemoral disorders, and the meniscus. Some
studies allowed examiners to conduct the studies under anesthesia.
The diagnostic standards were many, including magnetic
resonance imaging or arthroscopic findings.

MAIN OUTCOME MEASURES 

Sensitivity and specificity. 

MAIN RESULTS 

For each physical examination maneuver, the authors provide 
the original description of the examination technique. 

CONCLUSION 

LEVEL OF EVIDENCE Narrative review. 

STRENGTHS This review addresses specific examination 
maneuvers and includes the original descriptions of common 
examination maneuvers in detail sometimes not provided in 
original studies. 

LIMITATIONS There was no clear method for selecting the 
included articles and no attempt was made to pool the 
results. 

The review does not add new information to the current
understanding of the physical examination for meniscal or
ligamentous injuries of the knee. The sensitivity and specificity
values reported are not associated with quality scores, so
the clinician cannot confidently apply the results. However,
this narrative is useful in providing descriptions for how each
examination technique is performed.

Reviewed by Daniel H. Solomon, MD, MPH 
CHAPTER 28

Is This Adult Patient Malnourished?

Allan S. Detsky, MD, PhD
Philip S. Smalley, MD
Jose Chang, MD


CLINICAL SCENARIOS


CASE 1 Ten days before being treated, a 65-year-old man
experienced a Wallenberg stroke involving the lateral
medulla, which left him with difficulty swallowing. Since
then, he had been treated with intravenous fluids, as
attempts at eating led to mild aspiration with pneumonia.
In that period, he lost 6% of his usual body weight and
was continuing to lose weight. He felt weak and was able
to ambulate only with difficulty because of his stroke-related
ataxia and generalized weakness. On physical
examination, there was an obvious squared-off appearance
to his shoulders from subcutaneous tissue and muscle
wasting. There was no edema.

CASE 2 A 63-year-old man was admitted to the hospital for 
gastric resection of an obstructing gastric carcinoma. He was 
well until 6 weeks before admission, when he began to notice 
the rapid onset of early satiety. This progressed to the point 
where he began to vomit virtually all food and fluids. He had 
lost 15% of his body weight and was continuing to lose 
weight. He was ambulatory but felt weak and was no longer
able to carry on his usual daily activities because of this
weakness. On physical examination, there was muscle wasting.
There was obvious subcutaneous tissue loss in the triceps and 
thoracic regions, as well as muscle loss in the deltoids. There 
was edema in his ankles but no ascites. 

CASE 3 A 70-year-old man was admitted to the hospital
for resection of his descending colon because of an
adenocarcinoma detected on investigation for bright-red blood
in his bowel movements. Between 6 and 3 months before
admission, he had lost 10% of his body weight for reasons
that he could not explain. However, his weight had stabilized
in the 2 months before admission, and in fact, he
had gained back 4% of his weight. His dietary intake had
been slightly below normal but had recently improved. He
reported no significant gastrointestinal (GI) symptoms
other than the bleeding and a mild change in his bowel
habits. He had his usual level of energy. On physical
examination, there was no evidence of subcutaneous tissue
loss, muscle wasting, edema, or ascites.


WHY PERFORM NUTRITIONAL 
STATUS ASSESSMENT? 


Malnutrition occurs among patients either because of their primary
diseases (eg, malignancy) or because the procedures they
undergo to treat the primary disease prevent them from receiving
adequate nutritional intake for prolonged periods (eg, surgery).

There are 2 components of nutritional status assessment. The
first is body composition analysis, which is the determination of
the mass of body components, such as total body protein and
total body fat. These components are measured by in vivo neutron
activation analysis and the tritiated water dilution technique,
which represent the criterion standard (also known as the gold
standard) for measures of body composition. The second component
is physiologic function, defined by some as changes in
cellular and organ function, measured in a variety of ways, such
as skeletal muscle strength, respiratory function, protein synthesis,
and tissue repair.

Copyright © 2009 by the American Medical Association.

During the past 3 decades, clinicians have become increasingly
aware of the prevalence of malnutrition among hospitalized
patients. 1-4 Clinicians have recognized that malnourished patients
are at a higher risk of developing complications while undergoing
treatment. These complications include death, sepsis, abscess
formation, other infections such as pneumonia, wound healing
difficulties postoperatively, and respiratory failure. Some have
used the term nutrition-associated complications 5,6 to highlight
the relationship between malnutrition and these adverse events.
The increased risk for malnourished patients is thought to be
caused more by functional impairment than changes in body
composition, 7 although in studied subjects there is clearly a
correlation between the 2 components of nutritional status.

Investigations in the 1970s 1,2 estimated that the prevalence of
malnutrition among hospitalized patients was as high as 40%.
Studies 4,8 on patients undergoing general GI surgery showed
that the prevalence of either mild or severe malnutrition was
48% 3 and 31%, respectively. Detsky et al 4 confirmed the
relationship between malnutrition and the risk of nutrition-associated
complications. In their series of 202 patients undergoing
general GI surgery at 2 Toronto (Ontario, Canada) teaching
hospitals, 10% of the total series of patients had major
nutrition-associated complications, including 6 deaths related to
sepsis, 2 nonfatal episodes of sepsis, 3 subphrenic or
intra-abdominal abscesses, 2 anastomotic breakdowns, 2 wound
dehiscences, and 5 major wound abscesses. However, among
those who were assessed to be severely malnourished preoperatively,
this major complication rate was 67%. Windsor and Hill, 7
using a slightly different system of nutritional status assessment
in 102 patients undergoing major GI surgery, also showed that
severely malnourished patients had a higher risk of major
complications than patients designated as having normal nutritional
status. These results confirm the usefulness of nutritional status
assessment as a predictor of high risk for postoperative
complications. Thus, it becomes both a method of assessing prognosis
and a method of diagnosing a particular health state. Furthermore,
assessing nutritional status identifies patients who may
benefit from enteral or parenteral nutritional repletion to reduce
the risk of these complications. 9-11 Although patients with
chronic medical conditions also are thought to be at higher risk
of developing complications, such as respiratory failure or infection,
most of what we know comes from patients undergoing
surgical procedures.


THE ANATOMIC/PHYSIOLOGIC ORIGIN 
OF FINDINGS IN THIS AREA 

Syndromes of undernutrition of calories and protein have
been studied most extensively in children of developing
nations and are not frequently observed in North America.
Two extremes of protein-energy malnutrition have been
defined: marasmus, caused primarily by deficiency of calories,
resulting in stunted growth in children, loss of adipose tissue,
and generalized wasting of lean body mass without edema;
and kwashiorkor, a primary deficiency of protein manifested
by edema but in which adipose tissue is preserved.

Many individuals who are malnourished will have elements
of both protein and calorie deficiencies. The complex metabolic
processes that result from protein-energy malnutrition
are beyond the scope of this overview. However, in North
America, nutritional assessment is used as a predictor of future
complications in patients and therefore may go beyond the
traditional measurement of pure malnutrition resulting from
inadequate intake of protein, calories, or micronutrients.
Nutritional assessment, particularly if it encompasses or
focuses on physiologic function, may be an overall marker of
illness that is not caused solely by inadequate intake or
reversed by nutritional supplementation. This may explain
why the clinical trials of total parenteral nutrition in a variety
of clinical circumstances have in some cases produced
disappointing results in improving outcomes. 12

Nutritional deficiency syndromes involving vitamins and
micronutrients evolve through 3 stages. In the first, because
most micronutrients are stored in tissues, a temporary
reduction in intake is buffered by a reduction in body
stores. The second stage involves metabolic changes without
symptoms, whereas severe depletion results in the
final stage of clinical signs and symptoms. These syndromes
are not discussed in this article.


HOW TO PERFORM NUTRITIONAL ASSESSMENT 

This article primarily describes features of the medical history 
and physical examination for assessing overall nutritional status. 

The relevant features of a patient’s medical history and
physical examination can be elicited by a technique known
as the subjective global assessment (SGA) of nutritional
status. 8 The application of this technique divides patients into
3 classes: class A, well nourished; class B, moderately (or
suspected of being) malnourished; and class C, severely
malnourished. The components of this technique are
described in Table 28-1. There are 4 elements of the medical
history.

1. Weight Loss in the 6 Months Before the 
Examination, Expressed as a Proportionate 
Loss From Previous Weight 

A weight loss of less than 5% is considered small. A weight loss
between 5% and 10% is considered potentially significant, and
a weight loss of more than 10% is considered definitely significant.
In addition to considering the amount of weight loss, it is
important to note the pattern of the weight loss. For example,
suppose a patient lost 12% of his or her weight in the 6 months
to 1 month before the examination and then regained half of
that weight in the subsequent month, resulting in a net loss of
6% for the entire period. This patient would be considered
better nourished than a patient who had lost 6% progressively
in the 6 months, with continued weight loss in the recent
weeks, before the examination. Patients can be considered well
nourished despite significant proportions of weight loss if
there has been a recent stabilization or increase in weight. In
eliciting the history of weight pattern from patients, we recommend
asking the patient what his or her maximum weight was
and what it was 1 year ago, 6 months ago, 1 month ago, and at
present. If patients report substantial weight loss that we cannot
confirm with prior records, we ask for confirming history
of a change in clothing size or whether their clothes now fit
very loosely. Finally, we ask for the pattern of the weight loss
during the past few weeks (continued loss, stabilization, or
gain).


Table 28-1 Features of Subjective Global Assessment^a

Medical History

1. Weight change
     Overall loss in past 6 months: amount = ___ kg; ___ %
     Change in past 2 weeks:  ___ increase
                              ___ no change
                              ___ decrease

2. Dietary intake change (relative to normal)
     ___ no change
     ___ change    duration = ___ weeks
         type:  ___ suboptimal solid diet    ___ full liquid diet
                ___ hypocaloric liquids      ___ starvation

3. Gastrointestinal symptoms (that persisted for >2 weeks)
     ___ none   ___ nausea   ___ vomiting   ___ diarrhea   ___ anorexia

4. Functional capacity
     ___ no dysfunction (eg, full capacity)
     ___ dysfunction    duration = ___ weeks
         type:  ___ working suboptimally
                ___ ambulatory
                ___ bedridden

Physical (for each trait specify: 0 = normal, 1+ = mild, 2+ = moderate, 3+ = severe)
     ___ loss of subcutaneous fat (triceps, chest)
     ___ muscle wasting (quadriceps, deltoids)
     ___ ankle edema
     ___ sacral edema
     ___ ascites

Subjective global assessment rating (select one)^a
     ___ A = well nourished
     ___ B = moderately (or suspected of being) malnourished
     ___ C = severely malnourished

^a Class A indicates individuals with less than 5% weight loss or more than 5% total weight loss but recent gain and improvement in appetite; class B, those with 5%-10% weight
loss without recent stabilization or gain, poor dietary intake, and mild (1+) loss of subcutaneous tissue; and class C, ongoing weight loss of more than 10%, with severe
subcutaneous tissue loss and muscle wasting, often with edema.

Derived from Detsky et al. 8
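
The weight-loss arithmetic described in this section can be sketched in a few lines. This is only an illustration: the function names and the 80-kg starting weight are hypothetical, and treating exactly 5% and exactly 10% as "potentially significant" is an assumption, because the text does not say how the boundary values should be handled.

```python
def percent_weight_loss(previous_kg, current_kg):
    """Weight loss expressed as a percentage of the previous weight."""
    if previous_kg <= 0:
        raise ValueError("previous_kg must be positive")
    return (previous_kg - current_kg) / previous_kg * 100.0


def weight_loss_category(percent_loss):
    """Label a 6-month weight loss using the thresholds given in the text."""
    if percent_loss < 5:
        return "small"
    if percent_loss <= 10:  # boundary handling is an assumption
        return "potentially significant"
    return "definitely significant"


# Worked example from the text: 12% lost, then half of that regained.
usual_kg = 80.0                            # hypothetical usual weight
lowest_kg = usual_kg * (1 - 0.12)          # after losing 12%
regained_kg = (usual_kg - lowest_kg) / 2   # half of the lost weight
current_kg = lowest_kg + regained_kg

net_loss = percent_weight_loss(usual_kg, current_kg)
print(round(net_loss, 1), weight_loss_category(net_loss))
# prints: 6.0 potentially significant
```

The category alone does not assign an SGA class. As the text explains, the recent trend (continued loss, stabilization, or gain) determines whether a 6% net loss points toward a well-nourished or a malnourished patient.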

2. Dietary Intake in Relation to the Patient’s Usual Pattern 

Patients are classified as having either normal or abnormal
(decreased) intake in the weeks to months before the examination.
The duration and degree of abnormality are also
noted (eg, starvation, hypocaloric liquids, full liquid diet, or
suboptimal solid diet). For example, patients with strokes
resulting in swallowing difficulties may have been starved,
simply receiving intravenous or hypocaloric fluids for several
weeks before the examination. Patients with lesions that
obstruct the outflow from the stomach, such as cancer or
severe ulcers, may have been receiving pure liquid diets. In
eliciting this history, we recommend asking patients whether
their eating patterns have changed during the past few weeks
and then asking whether their pattern has changed during the
past few months. Has the amount of food eaten decreased? If so, by
how much? Are there certain kinds of foods that they used to
eat that they can no longer eat? Why are they eating less
(intentional reduction, unintentional reduction, ordered by
clinician)? What happens if they try to eat more? Ask for an
example of a typical breakfast, lunch, and dinner and a comparison
with typical meals 6 to 12 months ago.
3. Presence of Significant Gastrointestinal Symptoms:
Anorexia, Nausea, Vomiting, and Diarrhea

By significant we mean that these symptoms must have persisted
on virtually a daily basis for a period longer than 2
weeks. Short-term diarrhea or intermittent vomiting is not
considered significant. Daily or twice-daily vomiting secondary
to obstruction is considered significant.

4. The Patient’s Functional Capacity or Energy,
Ranging From Full Capacity to Bedridden

Patients who are unable to eat will often complain of fatigue 
and weakness to the point at which they are bedridden. 

There are 3 features of the physical examination that are 
recorded as normal (0), mild (1+), moderate (2+), or 
severe (3+). 

1. Loss of Subcutaneous Fat 

There are several locations where one can look for loss of
subcutaneous fat, and the best are the triceps region of the
arms, the midaxillary line at the costal margin, the
interosseous and palmar areas of the hand, and the deltoid
regions of the shoulder (Figures 28-1 and 28-2). Positive
findings are loss of fullness or 1 or more areas where the
skin fits too loosely over the deeper tissues; this latter sign
may be falsely positive in elderly individuals who may
appear to have lost subcutaneous tissue without being clinically
malnourished.


Figure 28-1 Loss of Subcutaneous Tissue in the Arm and Chest Wall

Figure 28-2 Loss of Subcutaneous Tissue Overlying the Fifth
Metacarpal
Hand with tissue loss (left) vs healthy hand (right).

2. Muscle Wasting 

The best muscles to examine are the quadriceps femoris and
deltoids. In the deltoid region, malnourished patients have a
squared-off appearance to their shoulders from the combination
of muscle and subcutaneous tissue loss (Figure 28-3). In
severe malnutrition, the quadriceps will have loss of bulk and
tone. Obviously, neurologic lesions (that may present with
unilateral wasting) may produce false-positive findings here.

3. Loss of Fluid From the Intravascular to Extravascular 
Space, Namely, Ankle or Sacral Edema and Ascites 

The first 2 signs are best assessed by inspection and then 
by palpation, remembering that some features are best 
inspected from a distance, eg, squared-off shoulders. Edema 
is assessed by pressing the ankle (leg) or sacrum, feeling 
the fluid move out of the subcutaneous tissue, and then 
observing “pitting,” persistent depression of the area pressed 
(more than 5 seconds). 

There is no explicit numeric weighting scheme described
for combining these features of the history and physical
examination into an SGA. Rather, they are combined subjectively
into an overall or global assessment. In the study that
established the precision and accuracy of SGA, 4,8 clinicians
placed greatest importance on the following variables: weight
loss of more than 10%, poor dietary intake, loss of subcutaneous
tissue, and muscle wasting. Patients suspected of being
malnourished or judged to have moderate malnourishment
(class B) had lost at least 5% of their body weight in the
weeks before examination without stabilization or weight
gain, had a definite history of reduction in dietary intake,
and exhibited mild (1+) loss of subcutaneous tissue. When
patients had considerable edema, ascites, or tumor mass, less
attention was paid to the amount of weight loss. The other
historical features helped the clinicians confirm the patient’s
self-report of weight loss or dietary change but received less
weight in the ranking system.

If, on the other hand, a patient had a recent weight gain that
did not appear to be merely fluid retention, clinicians designated
that patient well nourished (class A), even if the net weight loss
was between 5% and 10% and there was mild loss of subcutaneous
tissue. The assignment of a class A rank also should occur in
settings in which the patient has had an improvement in the
other historical features of SGA, such as appetite.

To be classified as severely malnourished (class C), patients 
should demonstrate obvious physical signs of malnutrition, 
such as severe (3+) subcutaneous tissue loss and muscle 
wasting, often with edema, in the presence of a clear and 
convincing pattern of ongoing weight loss of at least 10%. 

By design, this system is less sensitive and more specific.
That is, few well-nourished patients will receive a false-positive
diagnosis of malnourishment, but some patients
with mild degrees of malnutrition may be missed.

Windsor and Hill 7 describe a slightly different system of
nutritional status that focuses more on physiologic function.
Their system has 2 components: weight loss and functional
status. Preoperative percentage of weight loss is defined as
(recalled well weight minus current measured weight) divided
by well weight. A weight loss of more than 10% during the
preceding 3 months was considered significant. Confirmation of
weight loss is sought in the physical examination by palpating
skin folds for loss of fat and muscles in a manner similar to
that just described. Functional impairment is judged from overall
activity levels (by observing the patient on the ward), overall
mood (alertness, ability to concentrate, and irritability), skeletal
muscle function (having the patient squeeze the examiner’s
hand), respiratory function (effort and sound of coughing and
shortness of breath), wound healing (unhealed wounds and
sores or scratches or skin sepsis), and a serum albumin level of
less than 3.2 g/dL. If patients have weight loss of less than 10%,
with no evidence of abnormal physiologic function, then they
are placed in group 1. With weight loss of more than 10% but
no abnormal physiologic function, patients are placed in
group 2, and with both features, they are placed in group 3.
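
The Windsor and Hill grouping can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the text: the function names are hypothetical, the handling of a loss of exactly 10% is an assumption (the text speaks only of "less than" and "more than" 10%), and the combination of weight loss under 10% with abnormal function is not assigned to any group in the text, so the sketch returns None for it.

```python
from typing import Optional


def percent_weight_loss(well_weight_kg: float, current_weight_kg: float) -> float:
    """(Recalled well weight minus current measured weight) / well weight, as a percentage."""
    return 100.0 * (well_weight_kg - current_weight_kg) / well_weight_kg


def windsor_hill_group(weight_loss_pct: float, abnormal_function: bool) -> Optional[int]:
    """Return group 1, 2, or 3, or None when the text does not define a group."""
    # Assumption: exactly 10% is treated as not significant.
    significant_loss = weight_loss_pct > 10.0
    if significant_loss and abnormal_function:
        return 3  # both features present
    if significant_loss:
        return 2  # weight loss without abnormal physiologic function
    if not abnormal_function:
        return 1  # neither feature present
    return None   # loss under 10% with abnormal function: not specified in the text


if __name__ == "__main__":
    loss = percent_weight_loss(well_weight_kg=80.0, current_weight_kg=70.0)
    print(loss, windsor_hill_group(loss, abnormal_function=True))
```

The physiologic function judgment itself remains a bedside assessment; the sketch takes it as a single yes/no input.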

READER PARTICIPATION 

Before you read further, we suggest that you return to the 
patient scenarios that opened this overview and decide 
whether you judge them to be well nourished, moderately 
malnourished, or severely malnourished using SGA. After 
doing so, read on. 

The patient in case 1 was moderately malnourished (class B). 
This ranking was determined by his continuing loss of weight, 
the limitation of nutritional intake to hypocaloric fluids for 2 
weeks, and the mild loss of subcutaneous tissue and muscle. 

The patient in case 2 was severely malnourished (class C). 
This judgment was most influenced by his continuing large 
weight loss, change in dietary intake, and positive physical 
findings. 

The patient in case 3 was well nourished (class A). Although 
he had experienced considerable weight loss at some time 
before admission, his weight had stabilized and increased just 
before admission. 

PRECISION OF THE ASSESSMENT 
OF NUTRITIONAL STATUS 

Investigators at the University of Toronto studied 202
patients at 2 teaching hospitals who were undergoing major
GI surgery. 8 A nurse and 3 residents learned the technique of
nutritional status described herein by examining a series of
patients and reviewing their assessments with those of a
senior clinician. The emphasis was on combining the symptoms
and signs of malnutrition to minimize the false-positive
diagnosis of malnutrition (high specificity) at the expense of
increasing false-negative results (lower sensitivity). After
reviewing several patients together, the nurse and one of the
3 residents performed duplicate, independent assessments of
109 patients. There was perfect agreement in 100 (91%) of
109 patients on the SGA rankings. This was 78% above the
agreement that could be expected by chance alone (the κ
statistic was 0.78, with SE = 0.08 and 95% confidence interval
ranging from 0.62 to 0.94). The κ statistics for the 3 pairings
of the nurse with the individual residents were 0.60, 0.81, and
1.0, respectively, revealing some variation in agreement
between different clinicians. Hirsch et al 13 also documented
79% concordance between SGA rankings of residents and
specialists in clinical nutrition.

Figure 28-3 Loss of Subcutaneous Tissue in the Shoulders, Giving a
Squared-off Appearance
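
The agreement statistics reported for the nurse and resident assessments can be checked with a short calculation. The sketch below assumes the usual normal approximation for the confidence interval (estimate ± 1.96 × SE) and the standard definition κ = (observed − chance) / (1 − chance); the function names are illustrative, and because κ is reported only to 2 decimal places, the chance agreement it implies is approximate.

```python
def normal_ci(estimate: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval from an estimate and its standard error."""
    return estimate - z * se, estimate + z * se


def implied_chance_agreement(observed: float, kappa: float) -> float:
    """Solve kappa = (observed - chance) / (1 - chance) for the chance agreement."""
    return (observed - kappa) / (1.0 - kappa)


if __name__ == "__main__":
    low, high = normal_ci(0.78, 0.08)
    print(f"95% CI for kappa: {low:.2f} to {high:.2f}")

    observed = 100 / 109  # perfect agreement in 100 of 109 patients
    chance = implied_chance_agreement(observed, 0.78)
    print(f"Implied chance agreement: {chance:.2f}")
```

The implied chance agreement is high because most patients fall into a single class, which is why raw percentage agreement overstates reliability and κ is the more informative figure.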


ACCURACY OF NUTRITIONAL ASSESSMENT 

Because there is no criterion standard for the diagnosis of
malnutrition that incorporates body composition and physiologic
function (the in vivo neutron activation analysis and tritiated
water technique are the criterion standards of body composition
alone), studies of the accuracy of techniques of nutritional
status assessment have related it to the development of
complications judged to result from malnutrition. Therefore, patients
are sorted into the columns of the usual 2 × 2 table based on
whether they develop malnutrition-associated complications.

The study by Detsky et al 4 provides useful data on the accuracy
of SGA (Table 28-2). Nineteen patients (10% of the total
studied) were classified as severely malnourished (class C), 44
(21%) were classified as moderately (or suspected of being)
malnourished (class B), and 139 (69%) were classified as well
nourished (class A). The likelihood ratios in this table show that the
SGA is a powerful predictor of postoperative complications. The
likelihood ratio greater than 4 for severely malnourished
patients means that this designation (class C) was more than 4
times as likely to be found in patients with, as opposed to
patients without, postoperative complications. Patients designated
as moderately (or suspected of being) malnourished (class
B) generated a likelihood ratio of close to unity, indicating no
clinically important change between the preexamination and
postexamination probability of postoperative complications
(20/202, or 10%). Finally, well-nourished patients (class A)
generated likelihood ratios of 0.66 for their admission SGA and only
0.38 for their minimum SGA, indicating a lower than average
risk of postoperative complications.
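
How a likelihood ratio moves the preexamination probability can be made concrete with the odds form of Bayes theorem: convert the probability to odds, multiply by the likelihood ratio, and convert back. The sketch below is a generic illustration rather than a calculation taken from the study; the function name is hypothetical, and the results are model-based estimates that need not reproduce the observed complication rates in each class.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


if __name__ == "__main__":
    pretest = 20 / 202  # overall rate of major complications in the cohort
    for label, lr in [("class C", 4.4), ("class B", 0.96), ("class A", 0.66)]:
        print(label, round(post_test_probability(pretest, lr), 3))
```

A likelihood ratio near 1, as for class B, leaves the probability essentially where it started, which is the sense in which that ranking produces no clinically important change.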

SGA performed better than objective measurements of the
physical examination, such as percentage of ideal weight on
admission and percentage of body fat calculated from
anthropometric measurements. The range of likelihood
ratios for these variables displayed considerably less accuracy
than those associated with SGA and the combination of SGA
(Table 28-3).

Table 28-2 Relationship Between Subjective Global Assessment (SGA) and Major Postoperative Complicationsᵃ

SGA Class | Patients Assigned Class on Admission, No. (%) | Major Complicationsᵇ Occurring in This Class, No. (%) | Likelihood Ratio for Admission SGA | Likelihood Ratio for Minimum SGA During Hospitalization
Severely malnourished | 19 (10) | 8 (42) | 4.4 | 4.1
Moderately (or suspected of being) malnourished | 44 (21) | 4 (9) | 0.96 | 0.93
Well nourished | 139 (69) | 8 (5) | 0.66 | 0.38

ᵃDerived from Detsky et al. 4

ᵇOf the 20 complications, there were 6 deaths related to sepsis, 2 nonfatal episodes of sepsis, 3 subphrenic or intra-abdominal abscesses, 2 anastomotic breakdowns, 2 wound
dehiscences, and 5 major wound infections (abscesses).

Table 28-3 Predictive Properties of Unpromising Techniquesᵃ

Measure | Likelihood Ratio
Ideal Weight on Admission, % |
  <79 | 1.5
  80-99 | 0.62
  >100 | 1.2
Admission Body Fat, % |
  <9 | 1.0
  10-14 | 0.83
  >20 | 0.99

ᵃDerived from Detsky et al. 4

Laboratory determination of serum albumin level was also
shown to be an accurate predictor of complications, associated
with a progression of likelihood ratios that is similar to
that of SGA. Moreover, the combination of SGA and albumin
provided slightly improved accuracy compared with
either method alone. However, other objective methods that
are frequently said to be useful techniques of assessing
nutritional status (serum transferrin level, creatinine-height
index, and total lymphocyte count) were not shown to be
accurate predictors of complications. 4

The study by Windsor and Hill 7 provides similar data
demonstrating the predictive validity of their system. Of the 102
patients, 43 (42%) were in group 1 (analogous to SGA class
A), 17 (17%) were in group 2 (analogous to SGA class B),
and 42 (41%) were in group 3 (analogous to SGA class C).
The rate of major complications, septic complications, and
pneumonia in the 3 groups was significantly different, and
the likelihood ratios for predicting major complications
showed a similar progression to SGA of 0.53, 0.69, and 1.8
for groups 1, 2, and 3, respectively.

Finally, the predictive validity of SGA was also reported by
the Veterans Affairs perioperative total parenteral nutrition
randomized trial 11 that enrolled only patients with various
degrees of malnutrition. Among the control patients, those
in SGA class C had higher rates of major infectious complications
and noninfectious complications.

Some have also reported the high correlation between SGA 
and other measures of nutritional status assessments that are 
thought to be more objective, such as anthropometry, 3,13 
albumin level, 3,13 total serum protein level, 3,13 and criterion 
standard measures of body composition. Windsor and Hill 7 
also show good correlations between their system and 
anthropometry, body composition, and objective measures 
of the physiologic functions in their method (eg, grip 
strength and respiratory muscle index). 

ARE THESE SYMPTOMS OR SIGNS EVER NORMAL? 

Many individuals are thin, and this in itself does not constitute
malnutrition. However, we should note that obesity,
defined as an excess of adipose tissue or by the degree to
which a patient’s weight exceeds that which is judged ideal by
some anthropometric formula, is also a common problem in
hospitalized patients. Epidemiologic studies have shown that
a 20% excess over ideal weight imparts a health risk. Similarly,
obesity has been shown to place patients at a high risk
of experiencing surgical complications, such as poor wound
healing and venous thrombosis.

SPECIAL WAYS TO LEARN, TEST YOURSELF,
AND CORRECT DEFICIENCIES IN THE ELICITATION
OF THESE SYMPTOMS AND SIGNS

Clinicians who wish to become competent at nutritional
assessment can do so by applying the following strategies:
First, they should undergo a training period with other
learners, in which they discuss each of the features of the
technique together and review a series of patients for each of
the findings. In particular, the group should review methods
of eliciting the medical history, performing the inspection,
and standardizing terms such as normal, mild, moderate, and
severe. Next, they should rank several patients together and
reach consensus about what constitutes an A, B, or C ranking.
Finally, they should perform their own tests of clinical
reproducibility by rating a series of (perhaps 10) patients
independently and comparing their rankings. To improve the
precision and validity of their elicitation of the individual
features of the SGA, they should consider verification strategies,
such as asking the patient’s spouse about the features of
the history, examining physician records for previous
weights, asking whether the patient’s clothes now fit loosely,
and examining recent and old pictures of the patient.

THE BOTTOM LINE 

Clinicians can learn to perform SGA of nutritional status
with precision. The features of the medical history and physical
examination are shown in Table 28-1. We recommend
the group approach to standardize the definitions of the features
of the history and physical examination contained in
SGA and to gain competency in their application. In doing
so, we recommend that clinicians train themselves to be less
sensitive and more specific in labeling patients as malnourished.
Because there is no criterion standard for malnutrition
that incorporates body composition and physiologic function,
this clinical skill should be used as a prognostic instrument
to identify patients who are at high risk of developing
complications and who may benefit from nutritional repletion
and support. The technique is an accurate predictor of
patients who are at higher risk of developing complications
such as infection or poor wound healing.

CLINICAL SCENARIO


A 68-year-old man with advanced emphysematous lung
changes has been hospitalized 3 times during the past winter
for episodes of dyspnea. His diet is not as good as usual, but
his weight has not changed during the past 2 months. Because
of his breathing difficulty, he spends much of his day in and
out of a reclining chair. You notice that his arms are a bit thin,
with loss of muscle. There is some peripheral edema. Overall,
his weight is down about 2 kg from what he considers his
baseline (a 3% loss). During his last hospitalization, a serum
albumin level was 3.4 g/dL, and you see that his total lymphocyte
count was 1525 cells/µL. Is he appropriately nourished?

UPDATED SUMMARY ON MALNUTRITION IN ADULTS 

Original Review 

Detsky AS, Smalley PS, Chang J. The rational clinical examination:
is this patient malnourished? JAMA. 1994;271(1):54-58.

UPDATED LITERATURE SEARCH 

Our literature search used the parent search strategy for The
Rational Clinical Examination series, combining the subject
headings “malnutrition/di,” “protein-energy malnutrition/di,”
and “nutritional disorders/di,” published in English from 1993
to September 2004. The focus was on macronutrient rather
than micronutrient deficiency (vitamins and minerals). The
search yielded 96 titles for review, of which 39 articles appeared
to have promising abstracts. Two nonsystematic reviews on
malnutrition in the elderly helped us focus on simpler nutritional
screening assessments, performed by physicians. We
reviewed the reference lists from these 2 reviews. 1,2

We reviewed studies of adults with more than 100 study
subjects. We were interested only in original studies that
prospectively assessed adult malnutrition compared with an
appropriate reference standard and that contained data
allowing us to estimate the sensitivity and specificity of clinical
symptoms, signs, or screening instruments. In addition,
we focused on screening instruments that are simple and
require little additional training. We retained only 3 articles
for detailed review. We used a qualitative approach to sum
up the main features of the other identified studies and
nonsystematic reviews. 3

Prepared by David L. Simel, MD, MHS
Reviewed by Alan Detsky, MD, PhD, and Amy Rosenthal, MD

NEW FINDINGS 

Details of the Update 

The majority of studies on adult malnutrition include either the
elderly subject (healthy, hospitalized, or institutionalized) or
patients with malignancy. Some of the specific screening instruments
for the elderly lack generalizability to other populations
because they include questions concerning dementia and deficits
in the activities of daily living that will be less of a concern in
younger patients.

Virtually all screening instruments emphasize the importance
of quantifying weight loss and assessing changes in appetite.
In general, a change in weight of 5% is small, whereas a
change more than 10% is definitely significant. However, clinical
judgment is still required and can be highlighted by 2 simple
examples: (1) a patient undertaking a diet may have more
than 10% weight loss and not be malnourished, or (2) a
patient with cirrhotic ascites may be severely malnourished,
despite a stable weight, when extracellular fluid replaces weight
lost from decreasing muscle mass.

Incorporation bias affected many of the newer studies of adult
malnutrition because the results of the screening tests were also
part of the reference standard. This seems inevitable in nutritional
research because the reference standard requires the combination
of objective findings (medical history, anthropometric
measures, and biochemical measures) and clinical impression.
In retaining articles for specific review, we identified those that
seemed less affected by the bias. For example, one study compared
a discriminant analysis equation using quantitative variables.
Although the variables might have been available for some
patients, it seemed unlikely that the total score from the equation
would have been readily available. 4

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

There are no changes in the performance characteristics of the
recommended subjective global assessment (SGA) for adult
malnutrition. An additional study of observer variability for the
SGA in a different patient population (women with gynecologic
malignancies) found a weighted κ of 0.80 (95% confidence
interval [CI], 0.67-0.92), 5 almost identical to that reported in the
original publication (κ = 0.78). This provides us with a high
degree of confidence in the reliability of the SGA.

CHANGES IN THE REFERENCE STANDARD 

No single test serves adequately as a reference standard for
malnutrition. The assessment of adult malnutrition requires a
combination of the patient medical history, physical examination
results, biochemical and anthropometric measures, and an
expert’s opinion. From a pragmatic viewpoint and from the
viewpoint of a clinical investigator, most physicians would accept
the opinion of an expert (eg, a clinical dietitian or a physician
with expertise in nutritional assessment) who used these variables
as part of their assessment. The reference standard for
determining malnutrition goes beyond identifying the patient
with current protein-energy deficiency. A more relevant issue is
identifying the patient at risk for nutrition-related complications.

Table 28-4 Likelihood Ratio of a Low Albumin Level for Malnutrition

Finding | Reference Standard | LR+ (95% CI) | LR- (95% CI)
Serum albumin < 3.0 g/dL | SGA | 3.3 (1.6-6.9) | 0.88 (0.79-0.95)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio; SGA, subjective global assessment.

In the original Rational Clinical Examination article on
malnutrition, the SGA was proposed as the best screening measure
(Table 28-1). Since publication of that article, newer studies have
used the SGA itself as the reference standard in an attempt to
find other screening approaches that require less expertise, fewer
variables, or less time. This is a reasonable approach in that the
SGA has been validated for its reliability 6 (κ = 0.71 or higher)
and accuracy in predicting outcomes from malnutrition.


RESULTS OF LITERATURE REVIEW 


Table 28-5 Multivariate Findings for Adult Malnutrition

Malnutrition Screening Tool 12,13

Item | Score
1. Have you lost weight without trying? |
  No | 0
  Unsure | 2
  Yes | Use question 2 instead
2. If 1 is yes, use the question, How much weight (kg) have you lost? |
  None | 0
  1-5 | 1
  6-10 | 2
  11-15 | 3
  >15 | 4
  Unsure | 2
3. Have you been eating poorly because of a decreased appetite? |
  No | 0
  Yes | 1
Malnutrition screening score | Sum of above


Table 28-6 Likelihood Ratios of Combinations of Findings for Malnutrition

Combination of Findings | Reference Standard | LR (Factor Present) | LR (Factor Absent)
Malnutrition screening tool (score > 2) (2 studies) 12,13 | SGA | 13 (2.9-61) | 0.27 (0.19-0.39)
LAW criteria (discriminant function using lymphocyte count, albumin, percentage weight loss) (1 study) 4 | Expert assessment by a dietitian | 6.1 (4.0-9.6) | 0.10 (0.03-0.25)

Abbreviations: LAW criteria, lymphocyte count, albumin, percentage weight loss; LR,
likelihood ratio; SGA, subjective global assessment.


The SGA was independently compared to a single biochemical
measure, the serum albumin level, among a population of
hospitalized, older, general medical patients. 7 A serum albumin level
less than 3.0 g/dL increases the likelihood of moderate or severe
malnutrition (likelihood ratio [LR], 3.3; 95% CI, 1.6-6.9),
although other conditions could be associated with
hypoalbuminemia (Table 28-4). However, using a value as extreme as 3.0
g/dL will miss many patients, and the area under the receiver
operating characteristic curve for albumin is only 0.58. These
data support the continued use of the SGA for assessing patients.

A patient-generated subjective global assessment (PG-SGA)
has been developed. 6 This modification assigns values to explicit
items on the physical examination, underlying conditions,
metabolic stress, and amount of weight loss. The hope for the
PG-SGA was that less-experienced observers might be able to use it
because the scores are explicit for the various items. Although
the accuracy was high compared with the SGA, the PG-SGA
requires an independent comparison to the SGA and assessment
of its interobserver variability. Given the large number of items
on the PG-SGA vs the SGA, it may have lower interobserver
variability.

Two shorter instruments have been developed and compared
to the SGA, using the SGA as the reference standard (Tables 28-5
and 28-6). The Malnutrition Screening Tool 12,13 shortens the
SGA to the information collected in its first 3 questions.
Summary likelihood ratios (Table 28-6) suggest that it performs well.
A second approach creates a score from a discriminant model
that combines the percentage weight loss with the serum
albumin and the total lymphocyte count. 4
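
The item scores of the Malnutrition Screening Tool in Table 28-5 can be summed mechanically. The following is a minimal sketch, not a validated implementation: the function name and argument encoding are hypothetical, and the treatment of weights that fall between the printed bands (for example, 5.5 kg) is an assumption, because the table lists only whole-number ranges. The sketch returns the score only and leaves the interpretation against the cut point in Table 28-6 to the reader.

```python
from typing import Optional


def mst_score(lost_weight: str, kg_lost: Optional[float], poor_appetite: bool) -> int:
    """Sum the Table 28-5 item scores.

    lost_weight: "no", "unsure", or "yes" (question 1).
    kg_lost: amount lost in kg when lost_weight is "yes"; None means unsure of the amount.
    poor_appetite: answer to question 3.
    """
    if lost_weight == "no":
        weight_points = 0
    elif lost_weight == "unsure":
        weight_points = 2
    elif lost_weight == "yes":
        # Question 2 replaces question 1 when the answer is yes.
        if kg_lost is None:
            weight_points = 2   # unsure of the amount
        elif kg_lost <= 0:
            weight_points = 0   # "None"
        elif kg_lost <= 5:      # assumption: band boundaries are inclusive upper limits
            weight_points = 1
        elif kg_lost <= 10:
            weight_points = 2
        elif kg_lost <= 15:
            weight_points = 3
        else:
            weight_points = 4
    else:
        raise ValueError("lost_weight must be 'no', 'unsure', or 'yes'")
    return weight_points + (1 if poor_appetite else 0)


if __name__ == "__main__":
    # Patient from the clinical scenario: about 2 kg lost, diet not as good as usual.
    print(mst_score("yes", 2.0, poor_appetite=True))
```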

For elderly patients, the Mini Nutritional Assessment (MNA)
has been validated in a variety of ways and compared to physicians
with expertise in clinical nutrition as the reference
standard, along with dietary changes, anthropometry, and
biochemical measures. 8 The MNA has been applied to healthy,
hospitalized, housebound, and institutionalized elderly patients.
It requires about 10 minutes for an expert to complete the
questionnaire. The screen shows excellent reliability, with a κ of 0.78
at a cut point of 18. 9 The items in the questionnaire limit the
applicability to elderly patients. Before it can be accepted as a
reference standard for elderly patients, additional work needs to
be done. One study, using an independent, blind application of
the MNA to a clinical expert, found it to be only 62% accurate. 10
Clinicians who care primarily for a geriatric population may
find useful a compilation of review articles from a symposium
on the MNA in the elderly. 11

LAW (Lymphocyte Count, Albumin, Percentage Weight
Loss), Discriminant Model 4

0.07242 × (total lymphocyte count, cells/µL)
+ 238.664 × (albumin, g/dL)
- 24.657 × (% weight change, expressed as 15% = 15 rather than 0.15)
= score

Score < 747.2 = positive for malnutrition
Score > 747.2 = negative for malnutrition
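
The LAW discriminant score is simple arithmetic, and a short sketch makes the units and the direction of the cut point explicit. The coefficients and threshold are those printed in the model; the function names are illustrative, and treating a score exactly equal to 747.2 as negative is an assumption, because the model defines only scores below and above the cut point.

```python
LAW_CUT_POINT = 747.2


def law_score(total_lymphocyte_count: float, albumin_g_dl: float,
              weight_loss_pct: float) -> float:
    """LAW score; weight loss is entered as a percentage (15% = 15, not 0.15)."""
    return (0.07242 * total_lymphocyte_count
            + 238.664 * albumin_g_dl
            - 24.657 * weight_loss_pct)


def law_positive(score: float) -> bool:
    """Scores below the cut point are positive for malnutrition."""
    # Assumption: a score exactly at the cut point is treated as negative.
    return score < LAW_CUT_POINT


if __name__ == "__main__":
    # Values from the clinical scenario: 1525 cells/µL, albumin 3.4 g/dL, 3% weight loss.
    score = law_score(1525, 3.4, 3)
    print(round(score), law_positive(score))
```

Because weight loss enters with a negative coefficient, any weight loss lowers the score, including intentional weight loss.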

EVIDENCE FROM GUIDELINES 

No government guidelines address a preferred screen for the
nutritional assessment of adults. The Joint Commission
requires nutritional assessment, when warranted by the
patient’s condition, in all health care settings.


CLINICAL SCENARIO—RESOLUTION 


A major goal of nutritional assessment in adults is not only
diagnosing current protein-energy deficiency but also identifying
the patient at risk of nutrition-associated complications.

You have the data to use the LAW (lymphocyte count,
albumin, percentage weight loss) criteria, but the discriminant
function gives you a value of 848 and does not indicate
moderate or at-risk malnutrition. The single value of albumin
does not change the likelihood of malnutrition much
because an albumin level more than 3.0 g/dL has an LR of
only 0.88. These results expose the fallacy of relying too
much on biochemical measures. The patient has lost weight
attributable to a change in his appetite (malnutrition screening
score of 2), which puts him at risk for moderate malnutrition.
In addition, your physical examination results
suggest the loss of muscle mass that could be quantified
through caliper measurement of his triceps skinfold thickness.
It is appropriate to use the items of the SGA that factor
in his weight loss, change in diet, loss of functional capacity,
and loss of subcutaneous fat in the triceps that together put
him in a category of suspected moderate malnutrition.


ADULT MALNUTRITION— MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

The prior probability for adult malnutrition has a broad 
range. Among hospitalized medical or surgical patients, the 
prevalence is 10% to 40%. The prevalence among healthy 
patients, by definition, will be much lower. 

POPULATION FOR WHOM ADULT MALNUTRITION 
SHOULD BE CONSIDERED 

• Disorders, conditions, or treatments affecting appetite 

• Malignancy 

• Psychiatric illness 

• Gastrointestinal tract illness 

• Conditions requiring a change to a suboptimal solid diet 
(eg, liquid diets, tube diets) 

• Disorders affecting metabolism 

• Elderly patients 

• Patients with unintentional weight loss of more than 5%,
a major category of individuals for whom additional testing
is warranted


IDENTIFYING THE MALNOURISHED ADULT 

Determine whether the patient has lost weight, the amount 
of weight loss, and his or her appetite to get a malnutrition 
score (Table 28-7). 


Table 28-7 Detecting the Likelihood of Adult Malnutrition

Finding | LR+ (95% CI) | LR- (95% CI)
Malnutrition screening toolᵃ (score >2) | 13 (2.9-61) | 0.27 (0.19-0.39)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

ᵃSee Table 28-5 for components of the malnutrition screening tool.

REFERENCE STANDARD TESTS 

• Expert evaluation (dietitian or physician trained in nutritional
care and assessment) using a combination of historical
features, anthropometry, weight change, and
biochemical measures.

• SGA by a trained clinician for identifying patients at risk
of complications related to malnutrition.

EVIDENCE TO SUPPORT THE UPDATE:

Malnourishment, Adult



TITLE Developing an Effective Adult Nutrition Screening
Tool for a Community Hospital.

AUTHORS Elmore MF, Wagner DR, Knoll D, et al.

CITATION J Am Diet Assoc. 1994;94(10):1113-1118.

QUESTION Do 3 variables, identified through discriminant
analysis, predict malnutrition?

DESIGN A 3-variable discriminant model (nutrition 
screening equation [NSEq]) was developed in one hospital 
and then tested prospectively in a second hospital. 

SETTING Community hospital. 

PATIENTS Randomly selected patients (n = 151) from a 
different hospital where the discriminant model was 
developed. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The serum albumin level and total lymphocyte count were 
obtained from the first scheduled blood draw for the patient 
after admission. The percentage of weight loss was by
self-report of the patient. The reference standard was a full
nutritional assessment that incorporated a history, review 
of systems, current status of the patient, and biochemical 
and anthropometric measures by a trained dietitian. For 
some patients, variables included in the model might have 
been available to the clinician. However, the full nutritional 
assessment and screening tests were applied independently 
to develop the model. The patients were assigned to levels of 
not at nutritional risk vs at risk. 

MAIN OUTCOME MEASURE 

Comparison of the discriminant model to the reference standard
diagnosis of malnutrition.


Discriminant model:

238.664 × (albumin, g/dL)
+ 0.07242 × (total lymphocyte count, cells/mm³)
- 24.657 × (% weight change, expressed as 15% = 15 rather than 0.15)
= score

Score < 747.2 = positive for malnutrition
Score > 747.2 = negative for malnutrition

MAIN RESULTS 

The Nutrition Screening Equation (Table 28-8) can be
remembered from the acronym LAW (lymphocyte count,
albumin, percentage weight loss).

Table 28-8 Likelihood Ratios for the Nutrition Screening Equation for
Malnutrition

Test | LR+ | LR- | DOR (95% CI)
NSEq | 6.1 (4.0-9.6) | 0.10 (0.03-0.25) | 64 (18-220)

Abbreviations: Cl, confidence interval; DOR, diagnostic odds ratio; LR+, positive 
likelihood ratio; LR-, negative likelihood ratio; NSEq, nutrition screening equation. 
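
The diagnostic odds ratio in Table 28-8 is related to the two likelihood ratios by DOR = LR+ / LR-. The sketch below, with an illustrative function name, applies that identity to the rounded values printed in the table; the result is close to, but not identical with, the reported 64, presumably because the published likelihood ratios are rounded.

```python
def diagnostic_odds_ratio(lr_positive: float, lr_negative: float) -> float:
    """Diagnostic odds ratio as the ratio of positive to negative likelihood ratio."""
    return lr_positive / lr_negative


if __name__ == "__main__":
    # Rounded values for the nutrition screening equation from Table 28-8.
    print(round(diagnostic_odds_ratio(6.1, 0.10)))
```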

CONCLUSION 

LEVEL OF EVIDENCE Level 4. 

STRENGTHS Score was developed in one setting and then 
validated in another. 

LIMITATIONS The full nutritional assessment by the expert 
clinician included biochemical values. 

Detsky et al 1 found that adding the serum albumin level to the
subjective global assessment (SGA) provided additional information
that was better than the SGA alone or the albumin level alone.
Adding the total lymphocyte count to the SGA was not useful.
The investigators used a reference standard
for malnutrition that most primary care clinicians would accept.
Although the investigators did not use the actual SGA, they used
a multimodal approach that incorporated the medical history,
clinical evaluation, and biochemical and anthropometric measures.
We could not separate out the relative contribution of
percentage of weight loss vs the laboratory parameters.

The actual discriminant model used variables previously 
shown as important in combination and given as the LAW 
criteria. 2 

In the presence of incorporation bias (ie, the biochemi¬ 
cal and weight loss characteristics were used as part of the 
reference standard), what is the value of this information? 
As in the studies by Ferguson et al, 3,4 these results tell us 
more about how the expert clinicians incorporated these 
characteristics into their assessment than the independent 
value of these variables. Some clinicians, especially those 
less versed in assessment of malnutrition, might choose to 
use these results to justify obtaining a serum albumin level 
and total lymphocyte count when they are considering the 
presence of malnutrition in a patient with weight loss. 
However, it can be easily inferred from the model that 
intentional weight loss could lead to false-positive model 
results. For that reason, the clinical variables in the
Malnutrition Screening Tool (MST) developed by Ferguson et
al 3,4 make more sense. Indeed, the higher diagnostic odds
ratio of the MST suggests a greater accuracy and supports 
the need for clinically assessing the context of the patient’s 
weight loss. 

REFERENCES FOR THE EVIDENCE 

1. Detsky AS, Smalley PS, Chang J. Is this patient malnourished? JAMA.
1994;271(1):54-58.

2. Omran ML, Morley JE. Assessment of protein energy malnutrition in
older persons, part I: history, examination, body composition, and
screening tools. Nutrition. 2000;16(1):50-63.

3. Ferguson M, Capra S, Bauer J, Banks M. Development of a valid and
reliable malnutrition screening tool for adult acute hospital patients.
Nutrition. 1999;15(6):458-464.

4. Ferguson ML, Bauer J, Gallagher B, Capra S, Christie DRH, Mason BR.
Validation of a malnutrition screening tool for patients receiving
radiotherapy. Australas Radiol. 1999;43(3):325-327.

Reviewed by David L. Simel, MD, MHS 


TITLE Development of a Valid and Reliable Malnutrition 
Screening Tool for Adult Acute Hospital Patients. 

AUTHORS Ferguson M, Capra S, Bauer J, Banks M. 

CITATION Nutrition. 1999;15(6):458-464. 

QUESTION Can a brief screening tool be developed that 
has high accuracy compared with the subjective global 
assessment (SGA)? 

DESIGN Prospective convenience sample. A variety of 
questions served as candidate variables for a model that 
predicts the SGA. The SGA was not performed indepen¬ 
dently of the potential screening questions. 

SETTING Inpatients at a Brisbane, Australia, hospital. 

PATIENTS Adult, newly admitted, general medical 
patients (n = 408). 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

A multitude of individual questions were asked of the 
patient. The patients self-reported their heights and weights. 
An expert dietitian performed the SGA. 1 Once the model was 
developed, the interrater reliability of the model was tested 
by 2 of 3 observers in a prospective fashion. In addition, the 
results of the reduced model were compared with anthropo¬ 
metric and biochemical variables, along with hospital length 
of stay. The Malnutrition Screening Tool (MST) was com¬ 
pared with the SGA that was categorized into well-nourished 
vs moderately or severely malnourished. 

MAIN OUTCOME MEASURES 

Diagnostic accuracy of the MST compared with the SGA; 17% 
of the patients were moderately or severely malnourished. 


MAIN RESULTS—UNIVARIATE 

The simple questions performed well when interpreted in 
isolation (Table 28-9).


Table 28-9 Likelihood Ratios for Simple Questions Compared With the
Subjective Global Assessment Tool a

Questions                                                Sensitivity  Specificity  LR+   LR-   DOR
Are you eating poorly because of a decreased appetite?   0.87         0.93         12    0.14  89
Has your appetite/food intake been less than usual
  lately?                                                0.83         0.90         8.3   0.19  44
Have you lost weight recently without trying?            0.98         0.83         5.8   0.02  239

Abbreviations: DOR, diagnostic odds ratio; LR+, positive likelihood ratio; LR-,
negative likelihood ratio.

a Confidence intervals not provided. The likelihood ratios represent values
calculated from the sensitivity and specificity. Diagnostic odds ratios are all
highly significantly different from 1, with P < .001.
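
The footnote to Table 28-9 notes that the likelihood ratios were calculated from the sensitivity and specificity. As a minimal sketch of that arithmetic (the function name and dictionary labels are illustrative, not taken from the source), the standard definitions LR+ = sensitivity/(1 - specificity), LR- = (1 - sensitivity)/specificity, and DOR = LR+/LR- can be applied to the printed values:

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) for a dichotomous test.

    Standard definitions; assumes 0 < specificity < 1 and sensitivity < 1.
    """
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    dor = lr_pos / lr_neg
    return lr_pos, lr_neg, dor


# (sensitivity, specificity) pairs as printed in Table 28-9;
# the short labels are hypothetical shorthand for the three questions.
QUESTIONS = {
    "eating poorly, decreased appetite": (0.87, 0.93),
    "appetite/intake less than usual": (0.83, 0.90),
    "lost weight without trying": (0.98, 0.83),
}

if __name__ == "__main__":
    for label, (sens, spec) in QUESTIONS.items():
        lr_pos, lr_neg, dor = likelihood_ratios(sens, spec)
        print(f"{label}: LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}, DOR {dor:.0f}")
```

Rounded to the precision used in the table, the computed diagnostic odds ratios agree with the printed values of 89, 44, and 239.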
When taken together, the simple questions form the MST
(Table 28-10).


Table 28-10 Malnutrition Screening Tool

Item                                                                  Score

1. Have you lost weight without trying?
     No                                                               0
     Unsure                                                           2
     Yes                                                              Use question 2 instead

2. If 1 is yes, use the question, How much weight (kg) have you lost?
     None                                                             0
     1-5                                                              1
     6-10                                                             2
     11-15                                                            3
     >15                                                              4
     Unsure                                                           2

3. Have you been eating poorly because of a decreased appetite?
     No                                                               0
     Yes                                                              1
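
The scoring in Table 28-10 can be sketched as a small function. This is an illustrative reading of the table, not the authors' implementation: the function name, argument names, and the category strings used as dictionary keys are assumptions, and the weight-loss answer is taken as one of the printed categories rather than a measured number of kilograms.

```python
# Points for question 2, keyed by the categories printed in Table 28-10.
WEIGHT_LOSS_POINTS = {
    "none": 0,
    "1-5": 1,
    "6-10": 2,
    "11-15": 3,
    ">15": 4,
    "unsure": 2,
}


def mst_score(lost_weight, kg_lost=None, poor_appetite=False):
    """Return the Malnutrition Screening Tool score.

    lost_weight: "no", "unsure", or "yes" (question 1).
    kg_lost: a key of WEIGHT_LOSS_POINTS; used only when lost_weight is "yes".
    poor_appetite: answer to question 3.
    """
    if lost_weight == "no":
        weight_points = 0
    elif lost_weight == "unsure":
        weight_points = 2
    elif lost_weight == "yes":
        # A "yes" to question 1 is scored with question 2 instead.
        if kg_lost not in WEIGHT_LOSS_POINTS:
            raise ValueError(f"unknown weight-loss category: {kg_lost!r}")
        weight_points = WEIGHT_LOSS_POINTS[kg_lost]
    else:
        raise ValueError(f"unknown answer to question 1: {lost_weight!r}")
    return weight_points + (1 if poor_appetite else 0)


if __name__ == "__main__":
    print(mst_score("yes", kg_lost="6-10", poor_appetite=True))
```

Taking categories as input avoids deciding how to score amounts that fall between the printed ranges, which the table does not specify.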


MAIN RESULTS—MULTIVARIATE 

The MST, a shortened version of the SGA, has high accuracy 
for malnutrition (Table 28-11).

Table 28-11 Malnutrition Screening Tool Compared With the Subjective
Global Assessment Tool

            LR+ (95% CI)   LR- (95% CI)       DOR (95% CI)
MST > 2 a   47 (20-110)    0.28 (0.19-0.38)   168 (63-446)

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive
likelihood ratio; LR-, negative likelihood ratio; MST, Malnutrition Screening Tool.

a Values represent those predicted by the model, using a cut point of > 2 to predict a
subjective global assessment of moderately or severely malnourished. The area
under the curve for overall accuracy for the model was 0.97 (95% CI, 0.95-0.99).

The interobserver agreement was almost perfect, with a κ =
0.88 for the MST. The reduced questionnaire showed statistically
significant correlations with all the anthropometric variables,
all the biochemical variables (except for the total
lymphocyte count), and the hospital length of stay (4.9 days
for well-nourished vs 9.5 days for those at risk of malnutrition;
P < .001).
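
To see what the likelihood ratios in Table 28-11 mean for an individual patient, they can be combined with a pretest probability through the odds form of Bayes theorem. The sketch below assumes, for illustration only, that the 17% prevalence of moderate or severe malnutrition in this study is used as the pretest probability; the function name is hypothetical.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability.

    Uses posttest odds = pretest odds x likelihood ratio.
    """
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    prevalence = 0.17  # study prevalence, used here as the pretest probability
    positive = post_test_probability(prevalence, 47)    # LR+ from Table 28-11
    negative = post_test_probability(prevalence, 0.28)  # LR- from Table 28-11
    print(f"after a positive MST: {positive:.2f}")
    print(f"after a negative MST: {negative:.2f}")
```

Under that assumption, a positive MST raises the probability of malnutrition to roughly 0.91, and a negative MST lowers it to roughly 0.05. A different pretest probability, such as the lower prevalence expected in outpatients, would give different results.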

CONCLUSIONS 

LEVEL OF EVIDENCE Level 4. 

STRENGTHS Large patient population. The interobserver 
variability was assessed, providing confidence that the tool is 
reproducible. The study gives some insight into how the cli¬ 
nician might intuitively weight the variables of the SGA. 

LIMITATIONS The same person collecting the candidate 
nutritional screening questions also did the SGA. 


Some investigators have used the SGA as the reference standard
for malnutrition in adult inpatients. The SGA combines
features of the patient medical history and the physical examination
with the clinical assessment to sort patients into well-nourished,
moderately malnourished, or severely malnourished.
Because it requires training and good judgment, it is reasonable
to assess whether a smaller set of questions might
convey the same answer as the SGA. The multivariate model
had a diagnostic odds ratio that was not as good as the single
question alone about unintended weight loss (239 for single
question vs 168 for the model). The 3 questions of the MST
contain information similar to the first 3 questions of the SGA;
we can infer that the SGA is highly dependent on these questions.
As a data reduction step for less trained clinicians, it makes
obvious sense to ask the adult general medical inpatient whether
he or she has had unintended weight loss and how much, or a
decreased appetite. This simple tool requires validation in adult
outpatients, who should have a lower prevalence of being at risk
of malnutrition than inpatients.

REFERENCE FOR THE EVIDENCE 

1. Detsky AS, Smalley PS, Chang J. Is this patient malnourished? JAMA.
1994;271(1):54-58.

Reviewed by David L. Simel, MD, MHS 


TITLE Validation of a Malnutrition Screening Tool
(MST) for Patients Receiving Radiotherapy.

AUTHORS Ferguson M, Bauer J, Capra S, Christie DRH,
Mason BR.

CITATION Australas Radiol. 1999;43(3):325-327.

QUESTION Does a brief malnutrition screening tool
(MST) compare well with the subjective global assessment
(SGA) for assessing malnourishment?

DESIGN Prospective, independent sample of all patients
on designated study days.

SETTING Radiotherapy center at 2 Australian hospitals.

PATIENTS All adult patients who were undergoing
radiotherapy on designated study days and agreed to
participate (n = 106). The patients had a variety of sites
affected by carcinoma: 32% breast, 19% prostate, 11%
gastrointestinal, 9% head and neck, and the rest a
variety of other sites. Only 14 patients declined participation
(enrollment rate, 88% of potentially eligible patients).

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD

The MST was performed by separate dietitians, independent
of the SGA. The MST developed in this study was compared
with the SGA that categorized patients into well nourished vs
moderately or severely malnourished.

MAIN OUTCOME MEASURE 

Diagnostic accuracy of a clinical prediction model compared 
with the SGA. 

MAIN RESULTS 

The MST, a shortened version of the SGA, has high accuracy 
for malnutrition (Table 28-12).

Table 28-12 Malnutrition Screening Tool Compared With the
Subjective Global Assessment Tool

Test                                     LR+ (95% CI)    LR- (95% CI)   DOR (95% CI)
Malnutrition Screening Tool score > 2    5.2 (3.2-7.5)   0 (0-0.76)     101 (9.6-1074)

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive
likelihood ratio; LR-, negative likelihood ratio.

CONCLUSIONS 

LEVEL OF EVIDENCE Level 2. 


STRENGTHS The MST was applied by a separate dietitian,
independently of the SGA. Lower prevalence of malnourishment
compared with that of adult inpatients.

LIMITATIONS Homogenous patient population (cancer 
patients), although the mix likely included inpatients and 
outpatients. 

This same group of authors developed the MST, and in this 
study, they applied the tool to a different group of patients. 
Compared with the model development study and valida¬ 
tion, this group of patients had a lower prevalence of malnu¬ 
trition (making them more comparable to outpatients rather 
than inpatients). A strength of this study is the independent 
application of the MST and SGA. The results confirm the 
diagnostic accuracy of the MST and suggest that the ques¬ 
tions in the SGA pertaining to the amount of weight loss and 
anorexia carry a large amount of the information. It seems 
intuitive that patients without weight loss and without 
diminished appetite are less likely to be malnourished, 
although these data show that seemingly normal weight and 
appetite do not rule out malnourishment. The results do 
help clarify the magnitude of the information provided by 
these simple questions. The screen requires validation in gen¬ 
eral medical outpatients to assess its generalizability to a less 
sick population. 

Reviewed by David L. Simel, MD, MHS 
A patient presenting to your office informs you that he is
concerned about a mole on his arm. Although he is not 
sure how long the mole has been present, he tells you that 
recently it has enlarged and looks different. As you begin 
the examination, you also notice the presence of several 
other moles and ask yourself, is this lesion a benign mole 
or a malignant melanoma? 



Does This Patient Have a
Mole or a Melanoma?

Epidemiology


The incidence rate of malignant melanoma, once considered a 
rare malignancy, has increased dramatically in recent decades. 
In 1930, the lifetime risk of an individual in the United States 
developing melanoma was 1 in 1500. Estimates placed the life¬ 
time risk in 1996 at 1 in 87, with 1 in 75 by 2000. 1 This increased 
incidence is important because, unlike the more common non¬ 
melanoma skin cancers (basal cell carcinoma and squamous cell 
carcinoma), melanoma is much more likely to cause death. Six 
of 7 skin cancer deaths are from melanoma. 2 Although risk gen¬ 
erally increases with age, melanoma often occurs in young 
adulthood. The median age of onset for superficial spreading 
melanoma, which is by far the most common type of mela¬ 
noma, is 44 years. 3 Thus, a deadly melanoma can strike during 
early adulthood, resulting in decades of potential life lost. 

Early Detection and Prognosis 

Metastatic potential and death from melanoma are related to 
the tumor’s level of invasion. The prognosis of melanoma is 
approximated by relating it to the thickness of the tumor at 
excision. Melanoma that is confined to the epidermis (in situ) 
is greater than 99% curable. 4 Patients with thin lesions (thick¬ 
ness < 0.75 mm) have a 5-year survival rate of greater than 
98%, whereas those with thicker lesions (> 4 mm) have a less 
than 50% survival rate. 5 The prognosis is grim for metastatic 
disease. Nodal metastatic disease has a 36% 5-year survival 
rate, which decreases to only 5% with the presence of distant 
metastases. 6 Thus, the importance of the physical examination 
is clear: If thin melanomas are detected and excised, a cure is 
likely, whereas undetected progression of the tumor markedly 
decreases a patient’s chance of survival. 


ANATOMIC AND PHYSIOLOGIC ORIGINS 
OF THE SIGNS AND SYMPTOMS USED 
TO EXAMINE THE SKIN FOR MELANOMA 


Benign moles and melanoma arise from a cell normally 
present in the basal layer of the epidermis, called a melanocyte. 
The melanocyte produces melanin, which results in skin pig¬
mentation. Alterations in melanin production result in dif¬ 
ferent pigmentary characteristics. 

Whereas melanocytes normally exist as solitary units, in 
moles or nevi, they exist as collections of cells. These include 
junctional nevi, which are grouped collections of epidermal 
melanocytes; compound nevi, which are epidermal and der¬ 
mal collections of melanocytes; and intradermal nevi, which 
are dermal collections of melanocytes. There also exists a 
spectrum of nevi that have various degrees of atypia, termed 
atypical or dysplastic nevi. 

Melanomas lose normal growth controls, change their fea¬ 
tures, and tend to grow in an irregular manner, leading to 
asymmetry, irregular borders, and haphazard coloration. 
This contrasts with benign nevi, which are characteristically 
more stable, more symmetric, have well-defined borders, and 
have even color distribution. However, these features are not 
absolute, and caution is warranted, particularly when change 
has been noted. 

HOW TO EXAMINE THE SKIN FOR MELANOMA 

Historical Feature Assessment 

History plays an important role in the examination of the skin 
for melanoma. Patients should be asked whether they have 
noted any lesions of concern, particularly any new moles or a 
change in size, shape, color, or sensation of a preexisting mole. 
This is critical information because approximately one-half of 
melanomas are initially discovered by the patient. 7 Changes in 
size or color are the 2 most common patient-reported features 
of melanoma. 8-10 Bleeding, tenderness or pain, and itching are
also reported, although these features occur in more invasive 
lesions. 8 Patients should also be asked about a personal or 
family history of melanoma. The results of previous skin biop¬ 
sies and any history of nonmelanoma skin cancer should also 
be assessed. The patient’s tendency to sunburn and a history 
of sunburns may also help assess risk. The presence of focal or 
systemic symptoms or the presence of any lumps or bumps 
under the skin should be addressed, particularly in a patient 
with a history of a cutaneous malignancy. 

Physical Examination Technique 

To examine for melanoma, the entire skin surface should be 
inspected. Melanoma can occur anywhere on the skin and 
may develop in sun-protected areas. Patients who undergo 
complete cutaneous examinations are 6.4 times more likely 
to have a melanoma detected than patients receiving only a 
partial examination. 11 The patient should be examined head 
to toe in a well-lit room. A gown may be used and removed 
incrementally to evaluate various regions of the patient’s 
entire body surface. The examination of the patient’s scalp 
may be aided by sequentially parting the hair or by the use of 
a handheld hair dryer. The oral mucosa, genital area, nails, 
and the skin between the toes should be included in the 
inspection for evidence of pigmented lesions. 


When the patient is examined, it is also important to make 
a global assessment of his or her skin. For example, if your 
patient has multiple nevi, those nevi may have relatively uni¬ 
form characteristics. However, if one of the moles has 
unusual features that are dissimilar to those of other nevi, 
that lesion should be more closely examined. In the same 
manner, a single pigmented lesion occurring in a patient 
without other nevi should be evaluated with an increased 
level of concern. Patients at high risk for melanoma appear to 
be those with numerous nevi, those with nevi with atypical 
features, and particularly those with a family history of
melanoma. 12-14 In patients with an increased risk of melanoma,
regular skin examinations may result in melanoma detection
at an earlier, thinner stage. 15-18

Checklists as a Diagnostic Aid 

In the United States, the ABCD checklist for detecting cuta¬ 
neous melanoma is recommended as a means for distin¬ 
guishing benign lesions from melanoma. 19 The criteria 
making up the ABCD checklist are all physical examination 
features: (1) when the lesion is bisected, half is not identical
to the other half: asymmetry (A); (2) when the border is
uneven or ragged as opposed to smooth and straight: border 
irregularity (B); (3) when more than 1 shade of pigment is 
present: color variegation (C); and (4) when the lesion is 
greater than 6 mm in diameter (D). 
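
The 4 criteria can be written down as a simple check. This is only an illustrative sketch: the class and field names are hypothetical, and the text above does not say how many features make a lesion suspicious, so the function simply lists which criteria are present.

```python
from dataclasses import dataclass


@dataclass
class Lesion:
    """Examination findings for one pigmented lesion (hypothetical structure)."""
    asymmetric: bool         # halves differ when the lesion is bisected
    irregular_border: bool   # uneven or ragged border
    variegated_color: bool   # more than 1 shade of pigment
    diameter_mm: float


def abcd_findings(lesion):
    """Return the ABCD letters whose criteria the lesion meets."""
    findings = []
    if lesion.asymmetric:
        findings.append("A")
    if lesion.irregular_border:
        findings.append("B")
    if lesion.variegated_color:
        findings.append("C")
    if lesion.diameter_mm > 6:  # criterion D: greater than 6 mm
        findings.append("D")
    return findings


if __name__ == "__main__":
    print(abcd_findings(Lesion(True, False, True, 7.0)))
```

Returning the list of findings, rather than a yes or no answer, leaves the decision rule to the examiner.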

Lesions that have these features should raise suspicions of a 
melanoma. Friedman et al 19 state that, although not incorpo¬ 
rated into the ABCD checklist, historical features of a chang¬ 
ing, preexisting pigmented nevus or the development of a 
new pigmented lesion should alert the physician to the possi¬ 
bility of malignant melanoma. An amendment to the check¬ 
list that adds an (E), representing an elevation above the skin 
surface, was proposed in 1988. 20 See the Update to this chap¬ 
ter for additional details. 

A second checklist is the revised 7-point checklist used in 
the United Kingdom. 21 Three major criteria, all historical fea¬ 
tures, and 4 minor criteria, primarily physical examination 
features, are used to evaluate lesions suggestive of melanoma. 
The checklist was developed mainly for use by primary care 
physicians to assist them in making referral decisions. The 
major criteria are change in size, shape, and color; the minor 
criteria are inflammation, crusting or bleeding, sensory 
change, and a diameter 7 mm or greater. 

One interpretation of this guideline states that the major 
criteria are the basis for determining referral decisions. 21 
Any patient with at least 1 major sign should be referred to 
a dermatologist. The revised 7-point checklist has also 
been given a slightly different interpretation, with change 
in shape replaced by irregular shape (or appearance of 
irregularity in an old lesion), change in color replaced by 
irregular color, and a greater importance placed on the 
minor criteria. 22,23 A scoring system assigns 2 points for 
each major criterion and 1 point for each minor criterion. 
If a score of 3 points or more is noted, then a referral for
lesion evaluation is suggested.
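
The scoring interpretation of the revised 7-point checklist can be sketched as follows. The feature names are hypothetical identifiers chosen for this example; the major criteria follow the alternative interpretation described above (change in size, irregular shape, irregular color), and the referral threshold of 3 points is exposed as a parameter.

```python
MAJOR_CRITERIA = ("change_in_size", "irregular_shape", "irregular_color")
MINOR_CRITERIA = (
    "inflammation",
    "crusting_or_bleeding",
    "sensory_change",
    "diameter_7mm_or_more",
)


def seven_point_score(features):
    """Score a lesion: 2 points per major criterion, 1 per minor criterion.

    features: iterable of criterion names that are present.
    """
    present = set(features)
    unknown = present - set(MAJOR_CRITERIA) - set(MINOR_CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    major = sum(1 for name in MAJOR_CRITERIA if name in present)
    minor = sum(1 for name in MINOR_CRITERIA if name in present)
    return 2 * major + minor


def suggests_referral(features, threshold=3):
    """True when the score reaches the referral threshold."""
    return seven_point_score(features) >= threshold


if __name__ == "__main__":
    lesion = {"change_in_size", "inflammation"}
    print(seven_point_score(lesion), suggests_referral(lesion))
```

Under this interpretation a single major criterion (2 points) does not by itself reach the threshold, whereas under the first interpretation any major sign prompts referral.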
Criterion Standard for Diagnosing Melanoma

The criterion standard for the diagnosis of melanoma is the 
histopathologic evaluation of excised tissue. 

METHODS 

Search Strategy and Quality Filter 

A literature search was performed using MEDLINE for 1966 
through 1996. Medical Subject Heading terms “melanoma” 
and “skin neoplasms” were combined with “physical examina¬ 
tion,” “sensitivity,” “specificity,” “observer variation,” “mass 
screening,” and “self-examination,” yielding approximately 713 
citations. In addition, a MEDLINE search was performed with 
the search strategy developed for this series of articles, which 
yielded 659 citations. Titles, abstracts, and relevant articles 
were reviewed in their entirety. Current Contents (Institute for 
Scientific Information) were reviewed with the terms “mela¬ 
noma,” “skin cancer,” and “mass screening” to search for more 
current articles. References for articles found by the search 
strategy and other manuscripts pertaining to melanoma were 
systematically reviewed for additional literature sources. 

The quality of the published articles was evaluated as previ¬ 
ously described. 24 For studies that assessed accuracy, 20 arti¬ 
cles were reviewed. The 95% confidence intervals (CIs) 
reported here, when not reported in the original articles, were 
calculated from the available data when possible for test per¬ 
formance characteristics. Studies were included if the level of 
evidence was graded as C or above. Lack of independence 
between the reference standard and the test, leading to
verification bias, occurs in the existing literature. Another
methodologic issue relates to the nature of the reference standard,
namely, histologic tissue obtained by biopsy. The decision to 
perform a skin biopsy requires clinical judgment because a 
biopsy specimen is not obtained for all patients with skin 
lesions. This requires using follow-up examinations, multiple 
examiners, or even consensus opinion to ascertain the diag¬ 
nosis. No existing studies were given a quality score of A or B; 
thus, all 12 studies graded as C were included. 


RESULTS 

Precision of the Skin Examination for Melanoma 

Two studies evaluated examiners’ precision for specific fea¬ 
tures of benign pigmented lesions, which include 4 of the 
features found in the ABCD(E) checklist. Physicians examin¬ 
ing the most atypical pigmented lesion found on patients 
recently diagnosed as having malignant melanoma displayed 
a moderate level of interobserver agreement. 25 Among 3 
examiners (medical oncologist, internist/epidemiologist, and 
dermatologist/dermatopathologist), the intraclass correla¬ 
tion coefficient was highest for degree of macularity, corre¬ 
sponding with elevation (E) at 0.56, asymmetry (A) at 0.46, 
haphazard color (C) at 0.44, and border irregularity (B) at 
0.40. A second study assessed interobserver and intraob¬ 
server agreement among 3 physicians using photographs of 
melanocytic nevi. 26 After establishing criteria to be assessed 
for each feature, the level of agreement was similar to that 
found in the first study, although less precision was noted for 
rating asymmetry. Interobserver agreement for physician 
pairs, as measured by the κ statistic, ranged from 0.41 to 0.55
for macular vs papular lesions (E), 0.38 to 0.53 for color var¬ 
iegation (C), 0.29 to 0.53 for border irregularity (B), and 0.05 
to 0.26 for contour irregularity (A). The level of intraob¬ 
server agreement was, overall, similar to interobserver agree¬ 
ment. However, intraobserver agreement was measured 
according to a 4-point scale, which graded the degree of each 
feature, rather than the presence or absence of each feature. 
These precision estimates are considered fair to moderate. 27 
Because only benign pigmented lesions were assessed, 
observer agreement for these features found in actual malig¬ 
nant melanoma lesions may be higher than reported in these 
studies. Precision estimates for global assessments of the 
presence or absence of melanoma are not available. 

Accuracy of Skin Examination for Melanoma With 
ABCD(E) and Revised 7-Point Checklists 

Two studies have assessed accuracy of the ABCD(E) checklist 
(Table 29-1). Different features of the checklist were assessed, 


Table 29-1 Operating Characteristics for the ABCD(E) Checklist

Source, y                        Examiners                                     Setting                  No. With Disease/    Sensitivity, %  Specificity, %  LR for a Positive     LR for a Negative
                                                                                                        No. Without Disease  (95% CI)        (95% CI)        Test Result (95% CI)  Test Result (95% CI)
Healsmith et al, 28 1994 a       Dermatologists                                Pigmented lesion clinic  65/0                 92 (82-96)      ... b           ...                   ...
McGovern and Litaker, 29 1992 c  Chart review of dermatologists’ examinations  Dermatology clinic       6/186                100 (54-100)    98 (95-99)      62 (19-170)           0 (0-0.5)

Abbreviations: CI, confidence interval; LR, likelihood ratio.

a A positive test result required the lesion to have 1 or more of the ABCD(E) criteria: A, asymmetry; B, border irregularity; C, irregular color; D, diameter greater than 6 mm; and E,
elevation.

b Ellipses indicate data not available.

c A positive test result required the lesion to have border irregularity (B), color irregularity (C), and diameter greater than 6 mm (D).
and the interpretation of a positive test result was not the
same in both studies. Features of the ABCD(E) checklist were 
prospectively recorded for patients with pigmented lesions 
who had been referred to a clinic for pigmented lesions. 28 A 
total of 65 histologically confirmed melanomas were included 
in the analysis. Only 5 lesions were not identified by the 
ABCD(E) portion of the checklist, resulting in a sensitivity of 
92% (95% CI, 82%-96%). The ABCD(E) checklist was
considered positive when a lesion had 1 or more of the 5
features. Specificity was not reported.

A second study used a retrospective design to assess 3 features
of the ABCD(E) checklist: border irregularity (B), color
variegation (C), and diameter (D). 29 The checklist was 
applied by reviewing charts and pathology reports among 
patients who had undergone biopsies of pigmented lesions 
during a 1-year period. Pigmented lesion biopsy specimens 
were included when the dermatologist’s pathology submis¬ 
sion form indicated clinical diagnoses of dysplasia, lentigo 
maligna, or malignant melanoma. All 6 histologically con¬ 
firmed melanomas had all 3 features on the checklist. The 
sensitivity was therefore 100% (95% CI, 54%-100%). Only 3
lesions that were benign had all 3 features, resulting in a
specificity of 98% (95% CI, 95%-99%).
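
The point estimates in this paragraph follow directly from the counts. As a minimal sketch (the function name and returned keys are illustrative), with 6 melanomas all positive and 3 of 186 benign lesions positive:

```python
def operating_characteristics(tp, fn, fp, tn):
    """Compute sensitivity, specificity, LR+, and LR- from 2x2 counts.

    tp/fn: diseased lesions with a positive/negative test result.
    fp/tn: nondiseased lesions with a positive/negative test result.
    """
    if tp + fn == 0 or fp + tn == 0:
        # Without both groups, one of the two rates cannot be estimated.
        raise ValueError("need lesions both with and without disease")
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    specificity = 1.0 - false_positive_rate
    lr_pos = sensitivity / false_positive_rate if fp else float("inf")
    lr_neg = (1.0 - sensitivity) / specificity if tn else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


if __name__ == "__main__":
    # Counts from the chart review: 6 melanomas, 186 benign lesions.
    result = operating_characteristics(tp=6, fn=0, fp=3, tn=183)
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

These counts give a sensitivity of 100%, a specificity of about 98%, and a positive likelihood ratio of 62. The sketch does not compute the confidence intervals. The explicit error for an empty group mirrors why a study that enrolled only melanomas cannot report specificity.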

In the study by Healsmith et al, 28 all 5 of the melanomas 
that were not identified had a diameter of less than 6 mm, 
although a change in size was observed. 28 Because of concerns 
that requiring lesions to be larger than 6 mm may lower the 
sensitivity of the ABCD(E) checklist, resulting in missed 
lesions, 1150 melanomas that underwent biopsy during a 27- 
month period in Australia were retrospectively analyzed for 
their size. 30 Three hundred fifty-eight (31%) of 1150 of the 
melanomas were 6 mm or less in diameter. This indicates 
that requiring a diameter of greater than 6 mm in this sample 
of lesions would have lowered the sensitivity considerably. 

More data exist for differentiating benign lesions from 
melanoma with the revised 7-point checklist than with the 
ABCD(E) checklist (Table 29-2). The revised 7-point check¬ 
list was also analyzed against the 65 histologically confirmed 
melanomas that were found during a 38-month period in the 
aforementioned prospective analysis by Healsmith et al. 28 


This same checklist was applied to 100 randomly selected 
benign pigmented lesions—68 were considered benign 
according to clinical characteristics and 32 were histologically
confirmed to be benign. The sensitivity of the revised 7-
point checklist was 100% (95% CI, 94%-100%), because all
melanomas had at least 1 major feature. The specificity was
lower, at only 37% (95% CI, 28%-46%).

A second study reported the sensitivity of the revised 7- 
point checklist applied to 100 patients with histologically 
proven malignant melanoma to be 79% (95% CI, 70%-
85%). 22 In this study, the alternative interpretation of the
revised 7-point checklist was used; features of the checklist 
were assigned scores, 2 points for each major feature and 1 
point for each minor feature present. A score of 3 or more 
was considered a lesion that should be referred to a derma¬ 
tologist because of its malignant potential. The checklist was 
prospectively applied to patients presenting to a clinic for 
pigmented lesions with lesions suggestive of melanoma. 

The specificity of the revised 7-point checklist, again using 
the scoring system, was estimated by applying it to a consec¬ 
utive series of 100 histologically benign lesions. 23 Seventy of 
the benign lesions achieved a score indicative of malignancy, 
resulting in a specificity of 30% (95% CI, 21%-39%). This is
the only study that has assessed accuracy of patient
assessments, reporting a specificity of 32% (95% CI, 23%-41%),
comparable to that of the physician evaluations.

Studies assessing accuracy of the checklists have not
applied and interpreted the criterion standard independently
of the checklists, so their results should be interpreted with
caution. Additionally, both the revised 7-point checklist and
the ABCD(E) checklist have been subject to various interpre¬ 
tations of the requirements for positive and negative test 
results. With that in mind, existing evidence suggests that 
both checklists result in a sensitive diagnostic test. A highly 
sensitive test is desirable for a disease such as melanoma, 
which if left undetected can result in death. When the 
ABCD(E) checklist is used, requiring a lesion to be greater 
than 6 mm in diameter may lower the sensitivity, which 
could result in missed lesions. It appears that the checklists’ 
high sensitivity may come at the expense of low specificity, 


Table 29-2 Operating Characteristics for the Revised 7-Point Checklist

Source, y                   Examiners       Setting                  No. With Disease/    Sensitivity, %  Specificity, %  LR for a Positive     LR for a Negative
                                                                     No. Without Disease  (95% CI)        (95% CI)        Test Result (95% CI)  Test Result (95% CI)
Healsmith et al, 28 1994 a  Dermatologists  Pigmented lesion clinic  65/100               100 (94-100)    37 (28-46)      1.6 (1.4-1.9)         0 (0-0.2)
Du Vivier et al, 22 1991 b  Dermatologists  Pigmented lesion clinic  100/0                79 (70-85)      ... c           ...                   ...
Higgins et al, 23 1992 b    Dermatologists  Pigmented lesion clinic  0/100                ...             30 (21-39)      ...                   ...

Abbreviations: CI, confidence interval; LR, likelihood ratio.

a A positive test result required the presence of 1 major feature: change in size, change in color, or change in shape.

b A positive test result required a score of 3 points, with 2 points being assigned to a major criterion (change in size, irregular shape, or irregular color) and 1 point for a minor
criterion (presence of inflammation, diameter 7 mm or greater, crusting or bleeding, and sensory change).

c Ellipses indicate data not available.
especially when the revised 7-point checklist is used. No conclusions
can be drawn about the specificity of the ABCD(E)
checklist from the available data. 

Accuracy for Detecting the Presence 
or Absence of Melanoma 

Accuracy studies of global assessments for detecting melanoma 
use 2 methods of examination: actual patient examination and 
image evaluation through the use of pictures, slides, or digi¬ 
tized images of lesions. Accuracy assessments have 
included dermatologist and nondermatologist examiners. 

Dermatologists have primarily been the examiners in 
studies using patient examinations. Estimates for sensitivity 
range widely from 50% to 97%, whereas specificity esti¬ 
mates have been more consistent, ranging from 96% to 
99% (Table 29-3). 31-35 Data from existing studies generally
do not allow for calculation of likelihood ratios. However, 
the positive predictive value, although influenced by preva¬ 
lence, is often reported. In the largest series of patients fol¬ 
lowed after completion of a screening skin examination, the 
positive predictive value was found to be 17%. 36 Other esti¬ 
mates of the positive predictive value vary greatly, from 
35% to 86%. 32 ' 34,37 
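
The dependence of the positive predictive value on prevalence can be illustrated with a short calculation. The sensitivity (90%) and specificity (97%) below are hypothetical values chosen from within the ranges quoted above, and the prevalences are likewise illustrative; none of these figures comes from a specific study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of disease given a positive examination (Bayes theorem)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    sensitivity, specificity = 0.90, 0.97  # hypothetical examiner performance
    for prevalence in (0.001, 0.01, 0.10):  # hypothetical settings
        ppv = positive_predictive_value(sensitivity, specificity, prevalence)
        print(f"prevalence {prevalence:.3f}: PPV {ppv:.2f}")
```

With the same examiner performance, the positive predictive value rises steeply as prevalence increases, which is one reason estimates from screening populations and from referral clinics can differ so widely.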

Lesions presented as pictures, slides, and digitized computer images rather
than patient evaluations have been an alternative mode of evaluation used to
assess accuracy and have often been used to compare nondermatologists’
examinations to those performed by dermatologists (Tables 29-4 and 29-5). One
study presented melanoma lesions in both a 35-mm slide and a digitized computer
image format to nondermatologists (general internal medicine and family
practice residents) and dermatologists (resident and attending physicians). 38
Nondermatologists provided the correct diagnosis 60% of the time compared with
dermatologists, who correctly diagnosed the lesions 74% of the time, a
difference that was statistically significant. The correct treatment option
(defined as recognition of the need for a biopsy and the type of biopsy
required) was selected by nondermatologists significantly less often (52%) than
by dermatologists (67%). The correct diagnosis and treatment options were
determined by biopsy results and consensus opinion of 2 dermatologists. In
another study that compared nondermatologist examiners (first-year internal
medicine residents and practicing physicians) to dermatologists (third-year
residents and practicing dermatologists), 100% of the dermatologists correctly
identified at least 3 of the 6 melanoma lesions compared with 70% of the
nondermatologists. 39 In an additional study that compared practicing primary
care physicians and internal medicine residents with dermatology faculty, 88%
of the nondermatologists correctly identified melanoma compared with 100% of
the dermatologist examiners. 40 General practitioners in Australia who were
shown pictures, which included 2 early melanoma lesions and 1 late melanoma
lesion, correctly identified all 3 as melanoma 41% of the time. 41 However,
they made the correct decision to perform a biopsy on the lesion 83% of the
time. Another study of similar design from New Zealand found that a high
proportion of correct diagnoses and biopsy decisions was made by general
practitioners. 42 A correct diagnosis was made by general practitioners in 81%
of cases compared with 90% by dermatologists. Recognition of the need for a
biopsy was similar in both groups, with the correct biopsy decision being made
more than 95% of the time.


Table 29-3 Operating Characteristics for Global Assessments of the Presence or Absence of Melanoma

| Source, y | Examiners | Setting | No. With Disease/No. Without Disease | Sensitivity, % (95% CI) | Specificity, % (95% CI) | LR for a Positive Test Result (95% CI) | LR for a Negative Test Result (95% CI) |
|---|---|---|---|---|---|---|---|
| DeCoste and Stern, 31 1993 | Pathology report review of specimens submitted by dermatologists | Dermatology clinic | Unknown | 50 | ...a | ... | ... |
| Grin et al, 32 1990 | Pathology report review of specimens submitted by dermatologists | Oncology section/skin cancer unit | 265/10436 | 81 (75-85) | 99.2 (99.1-99.4) | 107 (85-134) | 0.2 (0.15-0.25) |
| Koh et al, 33 1990 | Dermatologists | Melanoma/skin cancer screening clinic | 9/0 | 97b | ... | ... | ... |
| McMullan and Hubener, 34 1956 | Pathology report review of specimens submitted by dermatologists | Unknown | 87/0 | 51 (40-60) | ... | ... | ... |
| Curley et al, 35 1989 | Physicians experienced in managing melanocytic lesions | Pigmented lesion clinic | 3/114 | ... | 96-99c | ... | ... |

Abbreviations: CI, confidence interval; LR, likelihood ratio.

a Ellipses indicate data not available.

b Calculated from estimated false-negative rates.

c Range of results from 3 examiners.
Table 29-4 Proportion of Correct Diagnoses by Specialty

| Source, y | Format of Lesion Presentation | No. of Unique Melanoma Lesions Reviewed | Nondermatologist Participants | Dermatologist Participants | Correct Diagnosis, %: Nondermatologists | Correct Diagnosis, %: Dermatologists |
|---|---|---|---|---|---|---|
| Gerbert et al, 38 1996 | 35-mm slides and digitized computer images | 12 | Internal medicine and family practice residents | Dermatology residents and attending physicians | 60 | 74 |
| Cassileth et al, 39 1986 | 35-mm slides | 6 | Medical students, internal medicine residents, fellows, and practicing physicians | Dermatology residents and practicing physicians | 70a | 100a |
| Ramsay and Fox, 40 1981 | Slides | 1 | Participants of review courses and internal medicine residents | Dermatology residents and faculty | 88 | 100 |
| Paine et al, 41 1994 | Photographs | 3 | General practitioners (Australia) | ...b | 41 | ... |
| McGee et al, 42 1994 | Photographs | 3 | General practitioners (New Zealand) | Physicians registered as dermatologists | 81 | 90 |

a Proportion correctly identifying at least 3 of 6 lesions.

b Ellipses indicate data not available.



Specificity is high, at least among dermatologist examin¬ 
ers, when patient populations are examined for melanoma. 
The sensitivity of patient examinations for melanoma is less 
clear, and better study designs are needed to provide more 
accurate sensitivity assessments. Studies that have used 
images of lesions rather than patient examinations have indi¬ 
cated that nondermatologists’ examinations are less sensitive 
than examinations performed by dermatologists. Studies 
using examinations on patient populations, with rigorous 
application of the test and classification of the disease state, 
which include dermatologists and nondermatologists as 
examiners, are needed to provide better assessments of oper¬ 
ating characteristics. 

THE BOTTOM LINE 

Returning to the clinical scenario, a concerned patient presents with an
enlarging mole on his arm that has changed in appearance. According to the
existing literature, the usefulness of the ABCD(E) checklist or revised 7-point
checklist to distinguish melanoma from benign lesions is not fully established.
If a positive test result does not require that all 4 features of the ABCD(E)
checklist be present, misdiagnosing a melanoma as a benign lesion appears to be
unlikely. The accuracy of using the ABCD(E) checklist to predict the disease
state when a positive test result requires the presence of all 4 features has
not been described. However, early melanoma lesions may be small (<6 mm in
diameter), and requiring a lesion to be greater than 6 mm in diameter when
using the checklist may cause some early lesions to be falsely classified as
benign. It is unclear how often benign lesions would be considered to have
malignant potential with the ABCD(E) checklist. The patient’s report of the
lesion enlarging and changing in appearance incorporates the primary criteria
used in the revised 7-point checklist. When the revised 7-point checklist is
used, misdiagnosing a melanoma as benign would also be unlikely, although it
appears the checklist may classify many benign lesions as malignant.


Table 29-5 Proportion of Correct Treatment Options by Specialty

| Source, y | Format of Lesion Presentation | No. of Unique Melanoma Lesions Reviewed | Nondermatologist Participants | Dermatologist Participants | Correct Treatment, %: Nondermatologists | Correct Treatment, %: Dermatologists |
|---|---|---|---|---|---|---|
| Gerbert et al, 38 1996 | 35-mm slides and digitized computer images | 12 | Internal medicine and family practice residents | Dermatology residents and attending physicians | 52a | 67a |
| Paine et al, 41 1994 | Photographs | 3 | General practitioners (Australia) | ...b | 83 | ... |
| McGee et al, 42 1994 | Photographs | 3 | General practitioners (New Zealand) | Physicians registered as dermatologists | 96 | 97 |

a Biopsy requirement and type of biopsy considered in correct treatment options.

b Ellipses indicate data not available.

In summary, malignant melanoma is an increasingly common malignancy, with an
incidence rate that is projected to increase. The medical history and physical
examination play a unique role in the secondary prevention of cutaneous
malignant melanoma. They are the sole means of identifying
lesions that require excision for histopathologic evaluation.
Because of the growth characteristics of melanoma, examina¬ 
tions that detect earlier stages of melanoma can result in a 
better prognosis. The utility of the ABCD(E) and revised 7- 
point checklists for distinguishing melanoma from benign 
skin lesions is not conclusively described. The ABCD(E) 
checklist (when a positive test result does not require all 4 
features to be present) and the revised 7-point checklist 
appear to be sensitive diagnostic aids in evaluating individual 
lesions and therefore rarely classify a melanoma as a benign 
lesion. However, the revised 7-point checklist lacks specific¬ 
ity, resulting in benign lesions being classified as potentially 
malignant. The specificity of the ABCD(E) checklist is less 
well described. A change in lesion characteristics is fre¬ 
quently reported by patients with melanoma and is an 
important feature to assess during an examination. Better 
study designs are necessary to define the operating character¬ 
istics of physicians’ examinations for detecting the presence 
or absence of melanoma. Existing evidence suggests that 
examinations are highly specific, at least among dermatolo¬ 
gist examiners, but sensitivity estimates are less clear. Data 
regarding nondermatologists’ examinations suggest that 
their examinations are less sensitive than those performed by 
dermatologists. 
Melanoma


UPDATE: 



Prepared by David L. Simel, MD, MHS, and James M. Grichnik, MD, PhD 

Reviewed by John D. Whited, MD 


CLINICAL SCENARIO 


A 40-year-old non-Hispanic white patient presents to 
your clinic with concern about some skin lesions. He has 
no personal history of dysplastic nevi and no one in his 
family has melanoma. He confesses that he is not partic¬ 
ularly worried about them, but his girlfriend is worried. 
There are 2 lesions on the upper back, neither of which 
the patient can see directly. He can feel one and observes 
that perhaps it has changed in size. As a child, he typi¬ 
cally went without a shirt during much of the summer 
and did not use sunscreen. Sometimes, he sunburned 
with prolonged exposure. He has not experienced sun¬ 
burns since his teenage years. The lesions are shown in 
Figures 29-1 and 29-2. 

UPDATED SUMMARY ON MELANOMA 

Original Review 

Whited JD, Grichnik JM. Does this patient have a mole or a 
melanoma? JAMA. 1998;279(9):696-701.

UPDATED LITERATURE SEARCH 

Details of the Update 

Our literature search used the parent search strategy for The 
Rational Clinical Examination series, combined with the sub¬ 
ject “melanoma/di,” published in English from 1997 to 2004. 
We also crossed the clinical subject headings with “meta¬ 
analysis,” “ROC curve,” and the textwords “ABCDE,” “7- 
point,” and “seven-point” in the MEDLINE database. The 
results yielded 179 titles, for which we reviewed the titles and 
abstracts; 48 were selected for additional review. These arti¬ 
cles were reviewed to identify articles that assessed the sensi¬ 
tivity and specificity of the medical history or physical 
examination features of nevi for melanoma. We required that 
the studies be done on actual patients (as opposed to pic¬ 
tures), involve prospectively collected data, and use basic 
observational skills used by general practitioners as opposed 
to examinations requiring special equipment. Because our
focus was on the actual features of the examination itself
rather than the overall accuracy of the examination, we eliminated
studies without data on individual findings or the use
of standardized checklists. We retained only 1 article on the
ABCDE criteria. We found no additional studies on the 7-point
checklists mentioned in the original publication.

Figure 29-1 Lesion 1

Figure 29-2 Lesion 2









CHAPTER 29 Update 


NEW FINDINGS 

• The “E” of the ABCDE checklist now represents “enlarge¬ 
ment” as reported by the patient, rather than “elevation” as 
determined by the clinician (A = asymmetry in 2 axes, B = 
border irregularity, C = more than 1 color, D = dimension 
> 6 mm). 

• The patient’s report that a lesion has enlarged is the single
most powerful finding of the ABCDE criteria.

• Any single positive finding of the ABCDE criteria may 
justify a biopsy or referral to rule out a melanoma. The 
greater the number of findings, the greater the likeli¬ 
hood of melanoma. Dermatologists can use other tech¬ 
niques to help distinguish melanomas from atypical 
nevi. 


IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

A high quality study of the revised ABCDE criteria supports 
this screening paradigm. 

CHANGES IN THE REFERENCE STANDARD 

There are 3 issues involved in determining a reference standard.
One acceptable reference standard is the result of histopathology.
This standard can apply to all examiners. The
reference standard can be stratified by whether the patient has 
a melanoma or combined to create a composite of melanoma 


Table 29-6 Univariate Findings for Melanoma From the ABCDE Criteria

| Finding | LR+ (95% CI) | LR- (95% CI) |
|---|---|---|
| A (asymmetry) | 2.1 (1.9-2.5) | 0.59 (0.52-0.66) |
| B (border) | 2.1 (1.8-2.4) | 0.59 (0.53-0.67) |
| C (color) | 1.6 (1.5-1.8) | 0.59 (0.52-0.68) |
| D (dimension) | 2.3 (2.1-2.5) | 0.17 (0.13-0.22) |
| E (enlargement) | 11 (8.5-14) | 0.18 (0.15-0.22) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.


Table 29-7 Multivariate Findings for Melanoma

| No. of Positive ABCDE Findings | LR (95% CI) |
|---|---|
| 5 Positive | 98 (31-303) |
| ≥4 | 8.3 (6.2-11) |
| ≥3 | 3.3 (2.8-3.9) |
| ≥2 | 2.6 (2.3-2.9) |
| ≥1 | 1.5 (1.4-1.6) |
| 0 | 0.07 (0.04-0.13) |

Abbreviations: CI, confidence interval; LR, likelihood ratio.


or dysplasia. A second reference standard is not precisely one 
of diagnostic accuracy, but instead assesses whether a primary 
care provider made the right “diagnosis” of a patient’s lesion, 
as evidenced by the decision to refer or biopsy. These studies 
use expert dermatologists who evaluate the patient or photo¬ 
graphs of the lesion against a set of criteria for appropriate¬ 
ness. Finally, the inability to biopsy all lesions on a patient 
means that in a research study, the only patients enrolled are 
those with a suspicious lesion. This creates verification bias 
that could be avoided by following patients who do not 
undergo a biopsy. With a reasonable follow-up period, an 
unchanging lesion evaluated through direct observation and 
serial photographs could be accepted as proof of a nonmalignant
melanocytic lesion.

RESULTS OF LITERATURE REVIEW 

One study, 1 conducted by dermatologists, represents the
largest prospective evaluation of the revised ABCDE criteria
that also uses a group of patients without melanoma (see
Tables 29-6 and 29-7). Although the “E” initially represented
elevation of the lesion, in this study it represented the
patient’s report that the lesion had enlarged. No study has
evaluated these criteria in a large population of patients 
treated initially by primary care providers. Dermatologists 
typically have greater sensitivity for accurately diagnosing 
melanoma than primary care providers, but the specificity 
of primary care providers has not been studied well. 2 We 
infer that dermatologists’ sensitivity for the individual 
ABCDE criteria would be higher than the sensitivity for pri¬ 
mary care physicians. 

A variety of studies on dermoscopy, including a systematic
review, were identified. 3 Dermoscopy, variably called dermatoscopy
or epiluminescence microscopy, involves viewing a
lesion through a handheld microscope that is similar to an
otoscope. Dermoscopes provide ×10 or higher magnification
of the lesion through immersion oil (or cross-polarized
light) to reduce surface reflection and allow the visualization
of colors and patterns not easily seen with the naked eye. In
general, trained dermatologists use this procedure as an 
examination secondary to basic clinical observations. The 
intent of dermoscopy is to provide additional information 
and improve diagnostic skill. Training is required, and the 
utility of dermoscopy for primary care providers remains 
under study and is not deemed sufficiently developed at this 
point for this review. 

EVIDENCE FROM GUIDELINES 

The US Preventive Services Task Force found the benefits of 
screening for melanoma unproven. 4 However, the recom¬ 
mendation addressed screening with a total body skin exami¬ 
nation. The task force recommends that clinicians be aware 
of the ABCD criteria or rapidly changing lesions that become 
apparent for whatever reason and that lesions with 1 or more 
abnormalities be biopsied. The evidence report behind those 
Table 29-8 Skin Type Risk Factors for Melanoma 6

| Skin Type | Do You Burn in the Sun? | Do You Tan After Having Been in the Sun? |
|---|---|---|
| I | Always | Seldom |
| II | Usually | Sometimes |
| III | Sometimes | Usually |
| IV | Seldom | Always |
| V | Naturally brown skin | |
| VI | Naturally black skin | |


recommendations did not address the accuracy of the indi¬ 
vidual criteria. 4 Using the data of the US Preventive Services 
Task Force, the Canadian Task Force on Preventive Health 
concluded that the evidence is conflicting for the total body 
skin examination in the general population. 5 The World 
Health Organization makes no specific recommendations on 
screening. They do suggest stratifying patients by risk factor. 
Patients with skin types I to II are at the highest risk of melanoma,
types III to IV confer intermediate risk, and types V to
VI have the lowest risk (see Table 29-8). 6


CLINICAL SCENARIO—RESOLUTION 


LESION 1 

This lesion is symmetric (A), has a discrete border (B), is 
of one color (C), is small (D), and has not enlarged (E) 
according to the patient. Thus, it has none of the ABCDE 
characteristics and the likelihood ratio (LR) for mela¬ 
noma is low (0.07). 

LESION 2 

This lesion is asymmetric (A), has an irregular border (B), 
is of at least 2 colors (C), is larger than 6 mm (D) 
(although this might not be apparent from the photomi¬ 
crograph), and is the one the patient believes might have 
changed size (E). Only 1 of the ABCDE criteria would 
have justified an appropriate biopsy, but this lesion fulfills 
all criteria and has a LR of 98 for melanoma. 
MELANOMA—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

The lifetime risk of melanoma is 1.7% for US white men and 
1.3% for white women, 7 which means that the probability of 
melanoma at any given point is less than 1%. Thus, for the 
US patient at average risk, the prior probability of mela¬ 
noma can be conservatively taken as 1%, although the true 
point prevalence is likely much lower. Other populations 
have different risks. For example, the incidence of mela¬ 
noma in the United States was 13.3 per 100000 in 1995 vs 55 
per 100000 in Queensland, Australia. 4 
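
The step from a prior probability to a posterior probability can be sketched in a few lines of Python. The example below converts the conservative 1% prior stated above to odds, multiplies by a likelihood ratio, and converts back to a probability. The function name `posterior_probability` is an illustrative assumption. The two likelihood ratios (0.07 for no ABCDE findings and 98 for all 5 findings) are those quoted for the two lesions in the clinical scenario.

```python
def posterior_probability(prior_probability, likelihood_ratio):
    """Convert a prior probability to a posterior probability via odds."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


prior = 0.01  # conservative prior probability given in the text
scenarios = (("no ABCDE findings", 0.07), ("all 5 ABCDE findings", 98))
for label, likelihood_ratio in scenarios:
    posterior = posterior_probability(prior, likelihood_ratio)
    print(f"{label}: posterior probability {posterior:.2%}")
```

Under these assumptions, a lesion with no findings has a posterior probability well under 0.1%, whereas a lesion with all 5 findings has a posterior probability of roughly 50%. Because the true point prevalence is probably lower than 1%, both figures are likely overestimates.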

POPULATION FOR WHOM MELANOMA SCREENING 
SHOULD BE CONSIDERED 

• Atypical (dysplastic) nevi 

• Family history of melanoma 

• Personal history of melanoma 

• Multiple nevi 

• Fair-skinned patients, especially those with sun sensitivity 
or proclivity to sunburns 

• Large congenital nevus 

• Immunosuppression 


DETECTING THE LIKELIHOOD OF MELANOMA 

See Table 29-9. 


Table 29-9 Likelihood Ratios for Any One Finding From the ABCD(E) Criteria and the Individual Findings

| Finding | LR+ (95% CI) | LR- (95% CI) |
|---|---|---|
| Any one finding from the ABCDE criteria a | 1.5 (1.4-1.6) | 0.07 (0.04-0.13) |
| A (asymmetry) | 2.1 (1.9-2.5) | 0.59 (0.52-0.66) |
| B (border) | 2.1 (1.8-2.4) | 0.59 (0.53-0.67) |
| C (color) | 1.6 (1.5-1.8) | 0.59 (0.52-0.68) |
| D (dimension) | 2.3 (2.1-2.5) | 0.17 (0.13-0.22) |
| E (enlargement) | 11 (8.5-14) | 0.18 (0.15-0.22) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a The LR+ increases rapidly with the number of abnormalities. Zero positive findings
has a very low LR.
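
A likelihood ratio pair implies a sensitivity and a specificity, which can make the "any one finding" row easier to interpret. The Python sketch below inverts the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The function name `accuracy_from_likelihood_ratios` is an illustrative assumption, and because the inputs are the rounded values from Table 29-9, the results are approximations only.

```python
def accuracy_from_likelihood_ratios(lr_positive, lr_negative):
    """Recover (sensitivity, specificity) from a pair of likelihood ratios.

    Solves LR+ = sens / (1 - spec) and LR- = (1 - sens) / spec.
    """
    spread = lr_positive - lr_negative
    specificity = (lr_positive - 1.0) / spread
    sensitivity = lr_positive * (1.0 - lr_negative) / spread
    return sensitivity, specificity


# Rounded values for "any one finding" from Table 29-9.
sensitivity, specificity = accuracy_from_likelihood_ratios(1.5, 0.07)
print(f"sensitivity ~{sensitivity:.0%}, specificity ~{specificity:.0%}")
```

With these rounded inputs, the threshold of any one finding corresponds to a sensitivity in the high 90s and a specificity of roughly one-third, which is consistent with a rule that rarely misses melanoma but flags many benign lesions.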

REFERENCE STANDARD TESTS 

Biopsy with histopathology. 
TITLE Semiological Value of the ABCDE Criteria in the
Diagnosis of Cutaneous Pigmented Tumors.

AUTHORS Thomas L, Tranchand P, Berard F, Secchi T, 
Colin C, Moulin G. 

CITATION Dermatology. 1998;197(1):11-17.

QUESTION What is the effect of adding the (E) criteria 
(enlargement) to the traditional ABCD criteria for mela¬ 
noma? 

DESIGN All data were collected prospectively on consec¬ 
utive patients undergoing a biopsy for a pigmented lesion. 

SETTING Dermatology department. All patients were 
examined by dermatologists. 

PATIENTS From the database, all patients with melanoma
and prospectively recorded data (n = 460) were analyzed,
along with 680 patients with the same prospectively
recorded data who were found to have nonmalignant
melanocytic tumors.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

A standardized form with 86 items was recorded for each 
patient. All patients had histopathology done after the exami¬ 
nation was recorded. 

A. Geometrical asymmetry in 2 axes of the pigmented tumor 

B. Irregular borders 

C. At least 2 colors with the exception of darkening in the cen¬ 
tral lesion 

D. Diameter greater than or equal to 6 mm 

E. Enlargement of the surface (not height) as reported by the 
patient 

MAIN OUTCOME MEASURES 

Each item in the ABCDE criteria was assessed independently. 
In addition, a score consisting of the sum of positive results 
was compared with the reference standard. 


MAIN RESULTS 

See Tables 29-10, 29-11, and 29-12.


Table 29-10 Likelihood Ratio of the Components of the ABCDE Scale for Melanoma

| Test | LR+ (95% CI) | LR- (95% CI) | DOR (95% CI) |
|---|---|---|---|
| A (asymmetry) | 2.1 (1.9-2.5) | 0.59 (0.52-0.66) | 3.7 (2.9-4.7) |
| B (border) | 2.1 (1.8-2.4) | 0.59 (0.53-0.67) | 3.5 (2.7-4.4) |
| C (color) | 1.6 (1.5-1.8) | 0.59 (0.52-0.68) | 2.8 (2.2-3.5) |
| D (dimension) | 2.3 (2.1-2.5) | 0.17 (0.13-0.22) | 14 (9.7-19) |
| E (enlargement) | 11 (8.5-14) | 0.18 (0.15-0.22) | 60 (41-88) |

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive
likelihood ratio; LR-, negative likelihood ratio.
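
The diagnostic odds ratio is, by definition, the ratio of the positive to the negative likelihood ratio (DOR = LR+ / LR-). As a rough consistency check, the Python sketch below recomputes the DOR for each component from the rounded likelihood ratios in Table 29-10 and prints it next to the reported value. The dictionary name `components` is an illustrative assumption, and small discrepancies are expected because the published likelihood ratios are rounded.

```python
# (LR+, LR-, reported DOR) for each component, copied from Table 29-10.
components = {
    "A (asymmetry)": (2.1, 0.59, 3.7),
    "B (border)": (2.1, 0.59, 3.5),
    "C (color)": (1.6, 0.59, 2.8),
    "D (dimension)": (2.3, 0.17, 14),
    "E (enlargement)": (11, 0.18, 60),
}

for name, (lr_positive, lr_negative, reported_dor) in components.items():
    # Diagnostic odds ratio recomputed from the rounded likelihood ratios.
    recomputed_dor = lr_positive / lr_negative
    print(f"{name}: recomputed {recomputed_dor:.1f}, reported {reported_dor}")
```

The recomputed values fall within a few percent of the reported ones, and the ordering is unchanged: enlargement (E) and dimension (D) discriminate far better than asymmetry, border, or color.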


Table 29-11 Serial Likelihood Ratios for the Number of Positive Findings From the ABCDE Scale for Melanoma

| No. of Positive Findings From ABCDE Criteria | LR (95% CI) a |
|---|---|
| 5 Positive | 98 (31-303) |
| ≥4 | 8.3 (6.2-11) |
| ≥3 | 3.3 (2.8-3.9) |
| ≥2 | 2.6 (2.3-2.9) |
| ≥1 | 1.5 (1.4-1.6) |
| 0 | 0.07 (0.04-0.13) |

Abbreviations: CI, confidence interval; LR, likelihood ratio.

a The area under the curve for these data is 0.85 (SE, 0.01).


Table 29-12 Likelihood Ratio of the Components of the ABCDE Scale for Melanoma or Atypical Dysplastic Nevus

| Test | LR+ (95% CI) | LR- (95% CI) | DOR (95% CI) |
|---|---|---|---|
| A (asymmetry) | 2.7 (2.3-3.2) | 0.52 (0.47-0.59) | 5.1 (4.0-6.6) |
| B (border) | 2.6 (2.2-3.0) | 0.52 (0.47-0.59) | 4.9 (3.8-6.4) |
| C (color) | 2.0 (1.7-2.2) | 0.49 (0.43-0.57) | 4.0 (3.1-5.1) |
| D (dimension) | 2.5 (2.2-2.8) | 0.17 (0.13-0.22) | 15 (10-20) |
| E (enlargement) | 13 (9.2-17) | 0.25 (0.21-0.29) | 51 (35-76) |

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive
likelihood ratio; LR-, negative likelihood ratio.
CONCLUSIONS

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Large data set with all variables entered pro¬ 
spectively before biopsy results. Clinicians who agreed on 
definitions of criteria tested themselves with pictures before 
study (data on interobserver variability not provided). Data 
are provided for both melanomas and atypical nevi. 

LIMITATIONS All examiners were dermatologists. Data are 
not provided on the numbers of patients who had pigmented 
lesions who did not undergo biopsy. 

The researchers in this study modified the previously sug¬ 
gested “E” criteria from “elevated” to “enlarged.” By defini¬ 
tion, enlargement is self-reported by the patient and is the 
only historical item among the criteria. As single variables, 
the most important are size greater than or equal to 6 mm 
(D) and enlargement (E). 

The analysis of “number” of positive findings is informative.
Clinicians can decide the level that they are willing to
accept to justify a biopsy. A strategy to biopsy all patients
with even 1 positive finding will lead to a diagnosis in 97% of
patients with melanoma, but there will be numerous biopsies for
nonmalignant melanocytic tumors; in many clinical settings, this is
not an acceptable strategy. The data are also interesting in
that they allow inferences about the independence of findings.
Multiplying the positive LRs for each finding gives an
LR of 179 compared with the actual value of 98 for 5 positive
findings. These LRs produce similar effects on the posterior
probability and suggest that the presence of findings confers
independent information. On the other hand, serially multiplying
the negative LRs gives an LR of 0.006; that value on a
logarithmic scale is much lower than the actual LR of 0.07
when all 5 findings are absent. Nonetheless, an LR of 0.07 is
low and will rule out melanoma for many patients.
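
The independence check described above is simple arithmetic and can be reproduced directly. The Python sketch below multiplies the univariate likelihood ratios from Table 29-10 and compares each product with the observed value on a logarithmic scale. The variable names are illustrative assumptions, and `math.prod` requires Python 3.8 or later.

```python
import math

# Univariate likelihood ratios for A, B, C, D, and E from Table 29-10.
positive_lrs = [2.1, 2.1, 1.6, 2.3, 11]
negative_lrs = [0.59, 0.59, 0.59, 0.17, 0.18]

# Products that would apply if the 5 findings were fully independent.
product_positive = math.prod(positive_lrs)
product_negative = math.prod(negative_lrs)

# Observed likelihood ratios for 5 positive findings and for 0 findings.
observed_positive = 98
observed_negative = 0.07

print(f"product of LR+: {product_positive:.0f} vs observed {observed_positive}")
print(f"product of LR-: {product_negative:.3f} vs observed {observed_negative}")

# Distance between product and observed value, in orders of magnitude.
gap_positive = math.log10(product_positive / observed_positive)
gap_negative = math.log10(product_negative / observed_negative)
print(f"log10 gap, positive: {gap_positive:+.2f}")
print(f"log10 gap, negative: {gap_negative:+.2f}")
```

The product of the positive LRs lies within about a quarter of an order of magnitude of the observed value, whereas the product of the negative LRs is roughly a full order of magnitude below it. This matches the conclusion that the presence of findings behaves more nearly independently than their absence.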

Many primary care clinicians would use a more pragmatic 
reference standard than melanoma alone because atypical 
(dysplastic) nevi are markers for melanoma risk. When the 
data are compared with a reference standard that considers 
either melanoma or atypical nevi as “positive,” the operating 
characteristics of almost every finding improve (the positive 
LR increases and the negative LR decreases). Given what 
appears to be independence for the presence of the ABCDE
criteria, this further justifies using the presence of only 1 cri¬ 
terion as an indication for biopsy or referral to a dermatolo¬ 
gist. Because the criteria are not efficient at distinguishing 
melanoma from atypical nevi, this may lead to an increase in 
the removal of nonmalignant atypical nevi. 

Reviewed by David L. Simel, MD, MHS, and James M. 
Grichnik, MD, PhD 
CLINICAL SCENARIOS


Does This Adult 
Patient Have 

Acute Meningitis? 

John Attia, MD, PhD 
Rose Hatala, MD, MSc 
Deborah J. Cook, MD, MSc 
Jeffrey G. Wong, MD 


CASE 1 A 30-year-old man presents to the emergency 
department with a 24-hour history of chills and a stiff 
neck. On clinical examination, he is afebrile and has nor¬ 
mal mental status. He can fully flex his neck, although he 
complains of pain over his cervical spine when doing so. 
Kernig and Brudzinski signs are absent. 

CASE 2 A previously healthy 70-year-old woman pre¬ 
sents to the emergency department with a 3-day history of 
fever, confusion, and lethargy. She is unable to cooperate 
with a full physical examination, but she has neck stiffness 
on neck flexion. The findings from a chest radiograph and 
urinalysis are normal. 


WHY IS CLINICAL EXAMINATION IMPORTANT? 


If, in a fever, the neck be turned awry on a sudden, so that the
sick can hardly swallow, and yet no tumour appear, it is mortal. 

—Hippocrates, Aphorism XXXV 

As early as the fifth century BC, clinicians recognized the 
seriousness of infectious meningitis. 1 In the 20th century, 
the annual incidence of bacterial meningitis ranges from 
approximately 3 per 100000 population in the United 
States 2 to 45.8 per 100000 in Brazil 3 to 500 per 100000 in 
the “meningitis belt” of Africa. 4 In one county in Minne¬ 
sota, there was an incidence rate of viral meningitis of 10.9 
per 100000 person-years from 1950 to 1981, with most 
cases occurring in the summer months. 5 

Despite the availability of antimicrobial therapy, meningitis-related
case fatality rates remain high, with a 17% all-cause
mortality rate between 1980 and 1988 reported for commu¬ 
nity-acquired and nosocomial bacterial meningitis among 
patients aged 16 years and older. 6 Among previously healthy 
patients who survive pneumococcal meningitis, up to 18% 
may experience long-term sequelae, including dizziness, 
excessive fatigue, and gait ataxia. 7 Clinical signs and symp¬ 
toms at presentation may predict prognosis. 8 Thus, early 
clinical recognition of meningitis is imperative to allow clini¬ 
cians to efficiently complete further investigations and ini¬ 
tiate appropriate therapy, with a goal of minimizing these 
adverse outcomes. 

The purpose of this systematic review is to provide clini¬ 
cians with an understanding of the literature from which the 
current clinical approach to meningitis is derived. Optimal 
use of the clinical examination aids physicians in identifying 
patients at sufficient risk for meningitis to require further 
definitive diagnostic testing with a lumbar puncture (LP). 
Patients in whom meningitis is suspected require this inva¬ 
sive procedure to effectively establish or refute the diagnosis. 
In addition, evaluation of the cerebrospinal fluid (CSF) may 
help direct antimicrobial therapy. 9 To avoid unnecessary
invasive procedures, identifying clinical features that could 
distinguish patients at high and low risk of meningitis would 
be useful. Clinical findings with a high specificity will, when present,
assist clinicians in the decision to proceed to LP. Conversely, the
absence of clinical findings with a high sensitivity will aid clinicians
in deciding against invasive investigation, particularly for patients for
whom the clinical suspicion of meningitis is relatively low.
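
The link between these properties and the decision to perform an LP can be made concrete with likelihood ratios. The Python sketch below uses the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The function name `likelihood_ratios` and both sets of accuracy figures are hypothetical, chosen only to contrast a highly specific finding with a highly sensitive one; they are not data from this review.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a finding with the given accuracy."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical findings, for illustration only.
findings = {
    "highly specific finding": (0.30, 0.95),
    "highly sensitive finding": (0.95, 0.30),
}

for name, (sensitivity, specificity) in findings.items():
    lr_positive, lr_negative = likelihood_ratios(sensitivity, specificity)
    print(f"{name}: LR+ {lr_positive:.1f}, LR- {lr_negative:.2f}")
```

In this sketch, the specific finding has a large LR+ (about 6), so its presence raises suspicion substantially, but its LR- is close to 1, so its absence is uninformative. The sensitive finding shows the reverse pattern: an LR- of about 0.17 makes its absence useful for deciding against LP, while its presence adds little.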

This systematic review will focus on the features of history 
taking and physical examination that clinicians use to iden¬ 
tify adult, immunocompetent patients at risk for acute men¬ 
ingitis for whom further diagnostic testing is indicated. We 
use the term meningitis to refer to acute infections of the 
meninges of either bacterial or viral origin. 

Pathophysiology of Meningitis 

The brain is protected from infection by the skull; the pia, 
arachnoid, and dural meninges covering its surface; and the 
blood-brain barrier. When any of these defenses are breached 
by a pathogen, infection of the meninges and subarachnoid 
space results in meningitis. Predisposing factors for the 
development of community-acquired meningitis include 
preexisting diabetes mellitus, otitis media, pneumonia, 
sinusitis, and alcohol abuse. 6 

The clinical features of meningitis are a reflection of the 
underlying pathophysiologic processes (Table 30-1). Sys¬ 
temic infection generates nonspecific findings such as fever, 
myalgia, and rash. Once the blood-brain barrier is breached, 
an inflammatory response within the CSF occurs. The resul¬ 
tant meningeal inflammation and irritation elicit a protective 
reflex to prevent stretching of the inflamed and hypersensi¬ 
tive nerve roots, which is detectable clinically as neck stiffness 
or Kernig or Brudzinski signs. 10,11 The meningeal inflamma¬ 
tion may also cause headache and cranial nerve palsies. Ele¬ 
vated intracranial pressure, altered mental status, vomiting, 
and seizures may ensue. 

Examination for the Signs and Symptoms of Meningitis 

The classic clinical presentation of acute meningitis is the
triad of fever, neck stiffness, and an altered mental state.
However, fewer than two-thirds of patients present with all 3
clinical findings. 6 While taking the patient's history,
clinicians suspecting meningitis will examine for general
symptoms of infection (such as fever, chills, and myalgias),
as well as symptoms suggesting central nervous system
infection (photophobia, headache, nausea and vomiting, focal
neurologic symptoms, or changes in mental status).

Table 30-1 Pathophysiology of Clinical Findings in Meningitis

| Pathophysiology | Clinical Features |
|---|---|
| Systemic infection | Fever, myalgia, rash |
| Meningeal inflammation | Neck stiffness, Kernig sign, Brudzinski sign, jolt accentuation of headache, cranial nerve palsies |
| Cerebral vasculitis secondary to meningeal inflammation | Focal neurologic abnormalities, seizures |
| Elevated intracranial pressure secondary to meningeal inflammation and cerebral edema | Change in mental status, headache, cranial nerve palsies, seizures |

The physical examination must include checking the vital
signs and a brief mental status examination. General
inspection may reveal a rash. Patients with severe meningeal
irritation may spontaneously assume the tripod position (also
called Amoss sign or Hoyne sign): sitting on the edge of the
bed with the knees and hips flexed, the back arched
lordotically, the neck extended, and the arms brought back to
support the thorax. 12

Physical examination specifically for meningitis includes
assessing neck stiffness, testing for Kernig and Brudzinski
signs, and assessing jolt accentuation of headache. Neck
stiffness is assessed with the patient supine, by gently
flexing the neck forward and checking for rigidity.

Like neck stiffness, Kernig and Brudzinski signs also
indicate meningeal irritation. Vladimir Kernig, a Russian
physician, first published the description of the sign that
bears his name in 1884, 10,13 although the sign had been
previously described by Lazarevic in 1880 and by Forst in
1881. 12 In Kernig's original description, when patients sat
on the edge of a bed with their legs dangling, an attempt to
extend the knee joint more than 135 degrees, or in severe
cases more than 90 degrees, elicited spasm of the extremity
that disappeared when the patients lay supine or stood up.
Today, the maneuver is most commonly performed with the
patient lying supine and the hip flexed at 90 degrees. A
positive sign is present when extension of the knee from this
position elicits resistance or pain in the lower back or
posterior thigh.

In 1909, Josef Brudzinski, a Polish physician, described
many meningeal signs in children. 10,14 His best-known "nape
of the neck" sign (Brudzinski sign) is present when passive
neck flexion in a supine patient results in flexion of the
knees and hips. A separate sign, the contralateral reflex, is
present if passive flexion of the hip and knee on one side
causes flexion of the contralateral leg.

An additional maneuver in assessing for meningitis is to 
elicit jolt accentuation of the patient’s headache by asking the 
patient to turn his or her head horizontally at a frequency of 
2 to 3 rotations per second. Worsening of a baseline headache 
represents a positive sign. 15 

A complete neurologic examination follows these more
specific tests for meningitis, including examination of the
cranial nerves, the motor and sensory systems, and reflexes
and testing for Babinski reflex. A general examination
follows, with an emphasis on the ears, sinuses, and
respiratory system.

METHODS 

Literature Search and Selection 

We searched MEDLINE for articles published from 1966 to
July 1997, using a structured search strategy (available from
the authors on request) to retrieve English- and
French-language articles describing the precision and
accuracy of the clinical examination in the diagnosis of
meningitis. This search strategy yielded 139 abstracts, which
were reviewed by one of us (J.A.) for relevance. Full-text
articles were retrieved for abstracts that potentially met
the inclusion criteria. Additional references were identified
by searching the reference lists of pertinent articles.

Explicit inclusion and exclusion criteria were applied to
the retrieved articles. We included articles that were
original studies describing the accuracy or precision of the
clinical examination in the diagnosis of meningitis in which
most patients had objectively confirmed bacterial or viral
meningitis. We excluded studies that enrolled only children
or immunocompromised adults, described mixed patient
populations from which adult data could not be extracted, or
focused only on metastatic meningitis or meningitis of a
single specific microbial origin (ie, Listeria monocytogenes
or Mycobacterium tuberculosis). Tuberculous meningitis was
also excluded on the grounds that this infection is more
prevalent in patients with human immunodeficiency virus
infection 16 and in children, neither of which represents our
target population. However, we retained in our analyses 2
studies 15,17 in which there were insufficient data to
separate the patients with tuberculous meningitis
(Table 30-2).

Study Characteristics 

This systematic review differs from previous Rational
Clinical Examination articles in that all but 1 article 17 of
the 9 articles 6,17-24 that met our inclusion criteria were
retrospective chart reviews. These studies assessed the
clinical presentation of a total of 845 patient episodes (824
patients), in patients aged 16 to 95 years, with meningitis
confirmed by LP or autopsy (Table 30-2).

Table 30-2 Studies Assessing Clinical Presentation of Patients

| Source, y | Clinical Setting, Years | No. of Patients | Age, y, Mean (Range) | Type of Meningitis a | Patient Identification | Clinical Findings Defined |
|---|---|---|---|---|---|---|
| Sigurdardottir et al, 18 1997 | All hospitals in Iceland, 1975-1994 | 119 | 44% >45 | Bacterial | All patients with bacterial isolates from cerebrospinal fluid or meningococcemia, processed at national central laboratory, complete hospital records for 119 of 132 patient episodes | No |
| Durand et al, 6 1993 | University hospital, 1962-1988 | 259 | 56% >50 (16-88) b | Bacterial | Hospital diagnosis of acute bacterial meningitis, including transferred patients | No |
| Uchihara and Tsukagoshi, 15 1991 c | General hospital, dates not specified | 34 | 38.6 (15-71) | Aseptic (n = 28), bacterial/tuberculous (n = 1), other d | Patients presenting to outpatient or emergency department with headache and fever | Yes |
| Genton and Berger, 19 1988 | University hospital, 1977-1982 | 112 | Women, 41; men, 40 (16-89) | Bacterial | Patients admitted and discharged with a diagnosis of meningitis | No |
| Gorse et al, 20 1984 e | University and Veterans Affairs hospitals, 1970-1982 | 54 | 64 (50-95) | Bacterial | Patients with a discharge diagnosis of meningitis | No |
| Gorse et al, 20 1984 e | University hospital, 1970-1982 | 32 | (15-49) f | Bacterial | Patients with a discharge diagnosis of meningitis | No |
| Massanari, 21 1977 | University hospital, 1965-1975 | 17 | >65 g | Bacterial | Patients with a chart diagnosis of meningitis | No |
| Magnussen, 22 1980 | Community hospital, 1969-1978 | 59 | 39 h | Aseptic (n = 34), bacterial | Patients with a discharge diagnosis of acute meningitis | No |
| Domingo et al, 23 1990 | Hospital, 1974-1988 | 59 | 71 (65-87) | Bacterial | Not indicated | No |
| Behrman et al, 24 1989 | University hospital, 1970-1985 | 31 | 72 (65-89) | Aseptic (n = 4), bacterial | Patients with a discharge diagnosis of meningitis, subdural empyema, brain abscess, or epidural abscess | Yes |
| Rasmussen et al, 17 1992 | Community hospitals, 1976-1988 | 48 | 69 (60-88) i | Tuberculous (n = 6), bacterial | Computer search of hospital records for patients with a diagnosis of acute bacterial meningitis | No |

a Infections included in calculations of sensitivities for clinical findings.
b Community-acquired meningitis.
c Prospective study design, assessing clinical findings compared with cerebrospinal fluid pleocytosis in patients presenting with headache and fever.
d Predominantly aseptic meningitis (28/54 patients). Other includes subarachnoid hemorrhage (n = 2), acute monocytic leukemia (n = 1), Sjogren syndrome (n = 1), upper respiratory tract infection (n = 11), infectious diarrhea (n = 3), edentulous (n = 2), glaucoma (n = 1), and not specified (n = 3).
e Two patient groups were included in this study: 54 patients older than age 50 years and 32 patients aged between 15 and 49 years. Each age group is reported separately.
f Mean age not reported.
g Mean age and range not reported.
h Mean age calculated from data in study; range not reported.
i Median age and range.

Because no quality grading system for chart reviews has
been widely established, we assessed the validity of these
studies by critically appraising several components of the
study design (Table 30-2). These components included an
assessment of the reference standard used to diagnose
meningitis (LP or autopsy), the completeness of patient
ascertainment, and whether the clinical examination was
described in sufficient detail to be reproducible. The major
limitation common to all these studies was the lack of a
control population, which means that only sensitivities
were available for most of the clinical findings. In addition,
the reported sensitivities may overestimate the true
sensitivities (as could be established in a prospective study)
because the clinical examinations recorded in the charts
could have been performed with knowledge of the LP
results.

The single prospective study included 54 inpatients and
outpatients presenting with fever and headache to a Japanese
center (Table 30-2). 15 A standardized clinical examination
was performed by an examiner before LP was undertaken,
and clinical findings were compared with those of CSF
pleocytosis.

Data Analysis 

Clinical examination findings that differ between viral and
bacterial causes are explicitly indicated. Sensitivities for the
various signs and symptoms of meningitis were calculated
from the data in each study. Pooled sensitivities were
calculated for each feature of the clinical examination, using
a random-effects model. 25
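
To make the pooling step concrete, the following Python sketch applies one common random-effects method, the DerSimonian-Laird estimator, to untransformed proportions. This is an assumption made for illustration: the text does not state which estimator or transformation was used. The function name is hypothetical, and the headache counts are back-calculated from the rounded percentages in Table 30-3, so the output need not reproduce the published pooled estimates.

```python
import math


def pooled_sensitivity_dl(studies, z=1.96):
    """Pool proportions with a DerSimonian-Laird random-effects model.

    studies: list of (events, n) pairs, one per study.
    Returns (pooled, lower, upper, tau2), where tau2 is the estimated
    between-study variance.
    """
    props, variances = [], []
    for events, n in studies:
        if events == 0 or events == n:
            # Continuity correction so the within-study variance is not zero.
            p = (events + 0.5) / (n + 1.0)
        else:
            p = events / n
        props.append(p)
        variances.append(p * (1.0 - p) / n)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * pi for wi, pi in zip(w, props)) / sum(w)
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, props))

    # Method-of-moments estimate of between-study variance, floored at 0.
    df = len(studies) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * pi for wi, pi in zip(w_re, props)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, max(0.0, pooled - z * se), min(1.0, pooled + z * se), tau2


# Approximate headache counts (events, n) reconstructed from Table 30-3.
headache = [(9, 34), (23, 54), (7, 17), (46, 59), (48, 59), (10, 32), (22, 48)]
print(pooled_sensitivity_dl(headache))
```

A positive between-study variance widens the confidence interval relative to a fixed-effect analysis, which is the reason to prefer a random-effects model when study populations and definitions differ as much as they do here.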

Because control groups of patients without meningitis
were not included in the 9 retrospective studies, specificities
for many features of the clinical examination were
unavailable. For the findings assessed in the prospective
study, specificities and likelihood ratios (LRs) were
calculated and included. 15

RESULTS 

Precision of Symptoms and Signs of Meningitis 

Data on the precision of the clinical examination for
meningitis were not available from the retrospective studies.
In the prospective study, a single clinician completed all
clinical examinations. 15

Accuracy of the Clinical History 
in the Diagnosis of Meningitis 

The individual components of the clinical history have low
sensitivity for the diagnosis of meningitis, as indicated in
Table 30-3. In addition to symptoms of headache and nausea
and vomiting, neck pain was reported to have a sensitivity of
28% among patients with meningitis. 20 Data from the
prospective study suggest that the clinical history also lacks
specificity for the diagnosis of meningitis, with reported
specificities of 15% for a nonpulsatile headache, 50% for a
generalized headache, and 60% for nausea and vomiting. 15
Thus, clinical history alone is not useful in establishing a
diagnosis of meningitis. The inaccuracy of the clinical history
may relate to the frequently impaired mental status of
patients with meningitis (pooled sensitivity, 67%; 95%
confidence interval [CI], 52%-82%) (Table 30-4), who are
relatively incapable of providing an accurate clinical
history. 21,22

Accuracy of the Physical Examination 
in the Diagnosis of Meningitis 

In contrast to the clinical history, elements of the physical
examination have sensitivities that are clinically useful. The
frequency with which patients presented with the classic
clinical triad of fever, neck stiffness, and a change in mental
status (or headache 24) was assessed in 3 studies. Although the
pooled sensitivity for the presence of all 3 symptoms was low
(Table 30-4), 95% of patients had 2 or more symptoms, 24 and
2 studies reported that between 99% and 100% of patients
had at least 1 of these clinical findings. 6,18 Thus, the
diagnosis of meningitis may be effectively eliminated in adult
patients presenting without any of the symptoms of fever,
neck stiffness, or a change in mental status.

Table 30-3 Sensitivity of Clinical History in the Diagnosis of Meningitis

| Source, y | No. of Patient Episodes | Headache, % | Nausea and Vomiting, % | Neck Pain, % |
|---|---|---|---|---|
| Uchihara and Tsukagoshi, 15 1991 a | 34 | 27 | 32 | NA |
| Gorse et al, 20 1984 b | 54 | 43 | 30 | 28 |
| Massanari, 21 1977 | 17 | 41 | NA | NA |
| Magnussen, 22 1980 | 59 | 78 | NA | NA |
| Domingo et al, 23 1990 | 59 | 81 | NA | NA |
| Behrman et al, 24 1989 | 32 c | 31 | NA | NA |
| Rasmussen et al, 17 1992 | 48 | 46 | 29 | NA |
| Pooled sensitivity (95% confidence interval) | | 50 (32-68) [n = 303] d | 30 (22-38) [n = 136] d | NA |

Abbreviation: NA, not assessed.
a Only study patients with pleocytosis were included in the calculation of sensitivity.
b Data reported only for patients older than 50 years.
c Thirty-one patients with 32 patient episodes.
d Number in brackets is patients included in calculation of sensitivity.

Table 30-4 Sensitivity of the Physical Examination in the Diagnosis of Meningitis a

| Source, y | No. of Patient Episodes | Fever | Neck Stiffness | Altered Mental Status | Fever, Neck Stiffness, and Altered Mental Status | Focal Neurologic Findings b | Rash | Kernig Sign | Jolt Accentuation of Headache |
|---|---|---|---|---|---|---|---|---|---|
| Sigurdardottir et al, 18 1997 | 119 | 97 | 82 | 66 | 51 | 10 | 52 | NA | NA |
| Durand et al, 6 1993 | 279 c | 95 | 88 | 78 | 66 | 29 | 11 | NA | NA |
| Uchihara and Tsukagoshi, 15 1991 d | 34 | 71 | 15 | NA | NA | NA | NA | 9 e | 97 f |
| Genton and Berger, 19 1988 | 112 | NA | NA | 32 | NA | 10 | NA | NA | NA |
| Gorse et al, 20 1984 g | 54 | 91 | 81 | 89 | NA | 39 | NA | NA | NA |
| Gorse et al, 20 1984 g | 32 | 75 | 66 | 53 | NA | 22 | NA | NA | NA |
| Massanari, 21 1977 | 17 | 88 | 76 | 88 | NA | NA | NA | NA | NA |
| Magnussen, 22 1980 | 59 | 42 | 81 | 20 h | NA | 10 | NA | NA | NA |
| Domingo et al, 23 1990 | 59 | 95 | 92 i | 88 | NA | 37 | NA | NA | NA |
| Behrman et al, 24 1989 | 32 j | 94 | 59 | 88 | 18 k | 38 | NA | NA | NA |
| Rasmussen et al, 17 1992 | 48 | 79 | 54 | 69 | NA | 21 | 4 | NA | NA |
| Pooled sensitivity (95% confidence interval) | | 85 (78-91) (n = 733) | 70 (58-82) (n = 733) | 67 (52-82) (n = 811) | 46 (22-69) (n = 426) | 23 (15-31) (n = 794) | 22 (1-43) (n = 446) | | |

Abbreviation: NA, not assessed.
a All data are presented as percentage unless otherwise noted.
b Focal neurologic findings include bilateral Babinski reflexes, pupillary abnormalities, hemiparesis, cranial nerve abnormalities, nystagmus, convulsion or seizure, and tremor.
c There were 279 patient episodes in 259 patients.
d Only study patients with pleocytosis were included in the calculation of sensitivity.
e Specificity of 100%; Brudzinski sign was not assessed.
f Specificity of 60%.
g Two patient groups were included in this study: 54 patients older than 50 years and 32 patients aged between 15 and 49 years. Sensitivities were calculated separately for each age group.
h Moderate or severe alteration in mental status.
i Authors refer to this clinical finding as meningeal signs.
j Thirty-two patient episodes in 31 patients.
k For this triad, assessed only in patients (n = 28) with bacterial meningitis. The authors of this study described the triad of symptoms as fever, neck stiffness, and headache.

As indicated in Table 30-4, documentation of fever has a
pooled sensitivity of 85% (95% CI, 78%-91%) for the
diagnosis of meningitis. As would be expected of a single
physical finding common to many disorders, fever has a low
specificity of 45%. 15 Normal body temperature may
significantly decrease the likelihood that a patient has
meningitis, although the presence of a fever does not
definitively establish the disease. The relationship between
body temperature and meningitis may be U-shaped because
hypothermic patients with sepsis are more likely to be
severely ill than normothermic patients. 26


Neck stiffness is also a relatively useful clinical finding,
with a pooled sensitivity of 70% (95% CI, 58%-82%) (Table
30-4). Other signs of meningeal irritation, namely, Kernig
and Brudzinski signs, have not been well studied, although in
Brudzinski's original description of 42 cases of meningitis
(including 21 cases of tuberculous meningitis), Kernig sign
had a sensitivity of 57%, whereas Brudzinski's nape of the
neck sign had a sensitivity of 97% and the contralateral reflex
sign had a sensitivity of 66%. 10 Brudzinski himself claimed to
confirm the specificity of his nape of the neck sign by
attempting (and failing) to elicit it in children with other
neurologic conditions. 10 The Uchihara and Tsukagoshi 15
prospective study of younger adult patients (mean age, 39
years) reported a sensitivity of 9% and a specificity of 100%
for the Kernig sign, whereas neck stiffness had a sensitivity
of 15% and a specificity of 100%. Because this study enrolled
patients presenting with fever and headache and excluded
those with mental status abnormalities or focal neurologic
findings, the low reported sensitivities may result from
excluding patients with the highest likelihood of having
meningeal signs.

Although these signs of meningeal irritation have been in
use for almost a century, assessment of their accuracy has
been limited. Indirect evidence of poor specificity comes
from a case series of 74 acute-care and 287 geriatric patients
(hospitalized patients in the acute-care or rehabilitation
geriatric wards) aged 17 to 92 years. 27 Puxty et al 27 found
that 13% of the acute-care patients and 35% of the geriatric
patients had nuchal rigidity despite the absence of
meningitis. Kernig sign was present in 1.5% of the acute-care
and 12% of the geriatric populations. The low specificity of
the meningeal signs may be caused by the frequent presence
of cervical arthritis and spondylosis among older patients.
Clearly, a well-designed prospective study of patients
suspected of having meningitis is necessary to definitively
establish the accuracy of meningeal signs.

Alterations in mental status, ranging from confusion to
coma, have a pooled sensitivity of 67% (95% CI, 52%-82%)
(Table 30-4), indicating that normal mental status may be
helpful in ruling out meningitis in low-risk patients. One
study directly comparing aseptic with bacterial meningitis
reported that moderate to severe mental status abnormalities
were more common in patients with bacterial meningitis
than with aseptic meningitis (44% vs 3%, respectively). 22
Similarly, a second study reported that all patients with
bacterial meningitis had a change in mental status, whereas
none of the aseptic meningitis patients did. 24

One of the most sensitive maneuvers in the diagnosis of
meningitis is jolt accentuation of headache, as described by
Uchihara and Tsukagoshi. 15 Of 34 patients with pleocytosis in
this study, 30 had meningitis and 4 had other conditions. Jolt
accentuation of headache was present in 33 of these patients
compared with 8 of 20 patients without pleocytosis, yielding
a sensitivity of 97% and a specificity of 60%. The associated
positive likelihood ratio (LR+) was 2.4, and the negative
likelihood ratio (LR-) was 0.05. If we calculate the LRs
specifically for those patients with meningitis, we obtain a
sensitivity of 100%, a specificity of 54%, an LR+ of 2.2, and
an LR- of 0. In patients presenting with fever and headache,
a lack of jolt accentuation of headache on physical
examination may essentially exclude meningitis. The main
limitation to widespread application of these results is the
small sample of patients assessed in this study.
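
The arithmetic behind these figures can be retraced from a 2 x 2 table. The Python sketch below is illustrative only; the function name is hypothetical. The first call uses the counts stated above with pleocytosis as the target condition. The second treats meningitis as the target and assumes, as the reported 100% sensitivity implies, that all 30 patients with meningitis had a positive sign, so that 3 of the 4 patients with pleocytosis from other causes and 8 of the 20 without pleocytosis were the positives among 24 patients without meningitis.

```python
def diagnostic_accuracy(tp, fn, fp, tn):
    """Return (sensitivity, specificity, LR+, LR-) from 2 x 2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ = sensitivity / (1 - specificity); undefined when specificity is 1.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    # LR- = (1 - sensitivity) / specificity; undefined when specificity is 0.
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return sensitivity, specificity, lr_pos, lr_neg


# Target condition: CSF pleocytosis (33 of 34 positive; 8 of 20 positive).
pleocytosis = diagnostic_accuracy(tp=33, fn=1, fp=8, tn=12)

# Target condition: meningitis (30 of 30 positive; 11 of 24 positive, inferred).
meningitis = diagnostic_accuracy(tp=30, fn=0, fp=11, tn=13)

for label, result in (("pleocytosis", pleocytosis), ("meningitis", meningitis)):
    sens, spec, lr_pos, lr_neg = result
    print(f"{label}: sens={sens:.2f} spec={spec:.2f} LR+={lr_pos:.1f} LR-={lr_neg:.2f}")
```

Rounded, the two calls reproduce the values quoted in the text: 97%, 60%, 2.4, and 0.05 for pleocytosis, and 100%, 54%, 2.2, and 0 for meningitis.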

Rashes occurred most frequently in the presentation of
meningitis due to Neisseria meningitidis, with prevalences of
63% 6 and 80%. 17 A petechial rash occurred in 73% of
patients with meningococcemia, whereas purpura was
described in only 20% of these patients. 6 Petechial, purpuric,
and ecchymotic rashes also occurred, with lower frequency,
in infections caused by Haemophilus influenzae,
Streptococcus pneumoniae, and L monocytogenes. Because the
overall incidence of N meningitidis among patients with
community-acquired bacterial meningitis was low (14% in 1
series 6), the pooled sensitivity of a rash for the diagnosis of
meningitis was poor (Table 30-4).

One or more focal neurologic abnormalities were
described in many of the case series, including bilateral
Babinski reflexes, pupillary abnormalities, hemiparesis,
cranial nerve abnormalities, nystagmus, convulsion or seizure,
and tremor. As summarized in Table 30-4, the pooled
sensitivity for these signs is low, and they are not clinically
useful in ruling out meningitis.


CLINICAL SCENARIOS—RESOLUTIONS 


The first scenario described a 30-year-old man with chills,
who complained of a stiff neck but had no fever or
meningeal signs on examination. We would ask the patient
about a headache, and, if present, assess for jolt
accentuation. His lack of fever, normal mental status, and
lack of jolt accentuation would be sufficient to assure us that
this patient does not have meningitis.

In the second scenario, a 70-year-old woman presented
with fever, confusion, and neck stiffness. Although we do
not know the specificity of these findings, their presence
causes us to suspect that she may have meningitis. To
establish or refute the diagnosis in this scenario, we would
proceed to definitive testing by LP.


THE BOTTOM LINE 

Assessment of the accuracy of the clinical examination in the
diagnosis of meningitis is severely limited by the paucity of
prospective data on this topic. Despite classic descriptions of
meningeal signs and sweeping statements about clinical
presentations in generations of textbooks, the signs and
symptoms of meningitis have been inadequately studied, and
the main conclusion of this systematic review is that more
prospective research is required. According to the limited
studies included in this systematic review, we suggest the
following to make optimal use of the clinical examination.

1. The absence of all 3 signs of the classic triad of fever, neck 
stiffness, and an altered mental status virtually eliminates 
a diagnosis of meningitis. 

2. Fever is the most sensitive of the classic triad of signs of
meningitis and occurs in a majority of patients, with neck
stiffness the next most sensitive sign. Alterations in mental
status also have a relatively high sensitivity, indicating that
normal mental status helps to exclude meningitis in
low-risk patients. Changes in mental status are more common
in bacterial than viral meningitis.

3. Among the signs of meningeal irritation, Kernig and 
Brudzinski signs appear to have low sensitivity but high 
specificity. 

4. Jolt accentuation of headache may be a useful adjunctive 
maneuver for patients with fever and headache. In 
patients at sufficient risk of meningitis, a positive test 
result may aid in the decision to proceed to LP, whereas a 
negative test result essentially excludes meningitis. 

CLINICAL SCENARIO


A previously healthy 41-year-old woman presents to the
emergency department with fever and a headache. She has
had symptoms of an upper respiratory tract infection for
the last week. During the previous 24 hours, she has
developed a fever and a frontal headache. In the emergency
department, her temperature is 38.5°C (101.3°F). On
examination, she has neck stiffness and jolt accentuation
of her headache but no neurologic abnormalities.

UPDATED SUMMARY ON MENINGITIS 

Original Review 

Attia J, Hatala R, Cook DJ, Wong JG. Does this adult patient
have acute meningitis? JAMA. 1999;282(2):175-181.

UPDATED LITERATURE SEARCH 

We replicated the original search strategy to identify articles
on the diagnosis of meningitis. We searched MEDLINE for
articles from 1996 to November 2004, written in English or
French, that described the precision and accuracy of the
clinical examination in the diagnosis of meningitis. Search
terms included "meningitis" combined with "physical
examination," "medical history taking," or "professional
competence," in addition to combining "meningitis" with
"sensitivity and specificity" or "reproducibility of results."
Additional references were identified by searching the
reference lists of pertinent articles.


NEW FINDINGS 

• Additional studies, both prospective and retrospective,
confirm that no single classic item of medical history or
physical examination is sufficiently accurate to rule in or
rule out meningitis.

• The absence of all items in the classic triad of fever, neck
stiffness, and altered mental status is not sufficiently
sensitive to rule out meningitis.

• Additional prospective studies are necessary to establish
the accuracy of history and physical examination,
including jolt accentuation of headache, in patients with
suspected meningitis. Assessment of combinations of
clinical findings may be more helpful than any individual
item. However, more retrospective research will be
of minimal value because such studies contain no
specificity data.

• Patients with suspected meningitis may safely undergo
lumbar puncture (LP) without previous CT head scan
unless they have a decreased level of consciousness or
focal neurologic findings, recent seizures or a history of
central nervous system (CNS) disease, immunocompromised
status, or age greater than 60 years.

Details of the Update 

There continues to be a paucity of high-quality studies
assessing the accuracy of clinical examination for the
diagnosis of meningitis. One new higher-quality prospective
article 1 was identified from 235 potentially relevant articles
regarding the clinical examination for meningitis. Five
additional retrospective cohort study articles 2-6 were
identified with similar design flaws to the previous literature
on this topic. These 5 articles have not been individually
summarized, but their data have been included in the updated
summary tables. One new article 7 was also identified from
41 potentially relevant articles regarding the safety of LP
before computed tomography (CT) head scan in patients
suspected of having meningitis. In updating the earlier data,
which included 9 retrospective studies and 1 prospective
study, we removed the one previous prospective study 8 and
separately combined its results in a discussion with the
newer prospective study.


IMPROVEMENTS IN THE DATA PRESENTED IN THE 
ORIGINAL PUBLICATION 

In updating the review, we were able to reconfigure the data
so that the retrospective studies (sensitivity only, Table 30-5)
are displayed separately from the prospectively collected
data (Table 30-6). One new prospective study 1 provides
specificity estimates that allowed us to calculate likelihood
ratios (LRs). Five additional retrospective studies 2-6
narrowed the confidence intervals (CIs) around the pooled
sensitivity estimates. These data confirmed that no single
item of medical history or physical examination has
sufficient sensitivity to rule out a diagnosis of meningitis. The
new data, both retrospective and prospective, changed our
previous conclusion regarding the classic triad of fever,
neck stiffness, and headache to clarify that this triad is not
sufficiently sensitive to rule out meningitis.

Table 30-5 Sensitivity of Findings for Meningitis in Adults, Retrospective Studies

| Finding (No. of Combined Studies) | Sensitivity (95% CI) |
|---|---|
| Medical history | |
| Headache (11) | 0.68 (0.55-0.79) |
| Nausea and vomiting (5) | 0.52 (0.34-0.71) |
| Physical examination | |
| Fever (14) | 0.87 (0.79-0.92) |
| Neck stiffness (13) | 0.80 (0.74-0.85) |
| Altered mental status (15) | 0.69 (0.57-0.79) |
| Classic triad (fever, neck stiffness, headache) (4) | 0.46 (0.28-0.64) |
| Focal neurologic findings (12) | 0.21 (0.15-0.29) |
| Rash (6) | 0.13 (0.04-0.27) |

Abbreviation: CI, confidence interval.

Table 30-6 Likelihood Ratios for Findings for Meningitis in Adults, Prospective Studies 1,8

| Finding | Sensitivity (95% CI) | LR+ (95% CI) | LR- (95% CI) |
|---|---|---|---|
| Historical Findings | | | |
| Headache | 0.92 (0.84-0.96) | 1.1 (1.0-1.3) | 0.43 (0.19-0.96) |
| Nausea/vomiting: Thomas et al 1 | 0.70 (0.59-0.79) | 1.3 (1.1-1.6) | 0.64 (0.44-0.92) |
| Nausea/vomiting: Uchihara and Tsukagoshi 8 | 0.32 (0.18-0.48) | 0.81 (0.39-1.7) | 1.1 (0.74-1.7) |
| Neck stiffness 1 | | 1.1 (0.82-1.4) | 0.95 (0.74-1.2) |
| Physical Examination | | | |
| Fever 1 | 0.43 (0.32-0.53) | 0.82 (0.62-1.1) | 1.2 (0.94-1.5) |
| Kernig sign: Thomas et al 1 | 0.05 (0.02-0.13) | 0.97 (0.27-3.6) | 1.0 (0.94-1.1) |
| Kernig sign: Uchihara and Tsukagoshi 8 | 0.09 (0.02-0.21) | 4.2 (0.23-77) | 0.92 (0.81-1.0) |
| Brudzinski sign 1 | 0.05 (0.02-0.13) | 0.97 (0.26-3.5) | 1.0 (0.94-1.1) |
| Neck stiffness: Thomas et al 1 | 0.30 (0.21-0.41) | 0.94 (0.64-1.4) | 1.0 (0.87-1.2) |
| Neck stiffness: Uchihara and Tsukagoshi 8 | 0.15 (0.06-0.28) | 6.6 (0.38-113) | 0.83 (0.74-1.0) |
| Jolt accentuation 8 | 0.97 (0.83-0.99) | 2.4 (1.4-4.2) | 0.05 (0.01-0.35) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.
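
For readers who want to see how a confidence interval around an LR can be obtained, the Python sketch below uses the widely taught log method, in which the standard error is computed on the logarithmic scale. That this method was used for Table 30-6 is an assumption; the text does not say. The function name is hypothetical, and the counts are those of the original review for jolt accentuation (33 of 34 patients with pleocytosis positive, 8 of 20 without pleocytosis positive).

```python
import math


def lr_with_ci(tp, fn, fp, tn, z=1.96):
    """Return ((LR+, lower, upper), (LR-, lower, upper)) by the log method.

    All four cells must be nonzero; a zero cell would need a continuity
    correction, which this sketch does not apply.
    """
    n_dis = tp + fn  # patients with the target condition
    n_non = fp + tn  # patients without the target condition

    lr_pos = (tp / n_dis) / (fp / n_non)
    se_pos = math.sqrt(1 / tp - 1 / n_dis + 1 / fp - 1 / n_non)

    lr_neg = (fn / n_dis) / (tn / n_non)
    se_neg = math.sqrt(1 / fn - 1 / n_dis + 1 / tn - 1 / n_non)

    def interval(lr, se):
        # Symmetric on the log scale, hence asymmetric around the LR itself.
        return lr, lr * math.exp(-z * se), lr * math.exp(z * se)

    return interval(lr_pos, se_pos), interval(lr_neg, se_neg)


pos, neg = lr_with_ci(tp=33, fn=1, fp=8, tn=12)
print("LR+ %.1f (%.1f-%.1f)" % pos)
print("LR- %.2f (%.2f-%.2f)" % neg)
```

After rounding, the output agrees with the jolt accentuation row of Table 30-6. The wide interval around the negative LR comes from the single false-negative result, which illustrates why the small sample limits how firmly this finding can be said to rule out meningitis.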


CHANGES IN THE REFERENCE STANDARD 

The reference standard remains microbiologic culture. 

RESULTS OF LITERATURE REVIEW 

Retrospective Studies 

Although the previous pooled sensitivities have been updated
to include the 5 additional retrospective studies, most of the
updated sensitivities did not have clinically important
changes. As in our previous review, the major limitation
common to these studies is the lack of a control population,
such that only sensitivities are available. 1 There is significant
heterogeneity among the individual study results that may
be, in part, caused by varying definitions of meningitis (viral
vs bacterial, positive cerebrospinal fluid [CSF] culture result
vs absolute CSF white blood cell [WBC] count). Overall, these
studies confirm that no single finding is of adequate
sensitivity that its absence rules out meningitis. The addition
of the newer studies to the previous pooled estimates has also
clarified that the absence of the classic triad of fever, neck
stiffness, and headache is not sufficiently sensitive to rule out
meningitis, a conclusion that is different from that of our
previous review. With the relatively narrow CIs around these
pooled estimates, additional studies of retrospectively
collected data on patients with meningitis are unlikely to
change these conclusions.

Prospective Studies 

Because the 2 prospective studies are the most rigorous to
date, we believe the estimates from these studies are the most
accurate. Uchihara and Tsukagoshi 8 enrolled 54 patients
(inpatients and outpatients) with fever and headache who
were examined by 1 investigator before LP. Because fever and
headache were inclusion criteria, they are not summarized in
the table. In addition, neck stiffness and Kernig sign had a
specificity of 100% (n = 20 patients), a finding that was not
replicated in the larger study (n = 297 patients) by Thomas
et al. 1

The study by Thomas et al 1 included patients presenting to
the emergency department with "clinically suspected
meningitis." Unfortunately, the physical examination
technique of the examining physicians was not standardized,
a design flaw common in our previous review, so not all
patients underwent all aspects of the clinical examination.

There are significant differences in the sensitivities calcu¬ 
lated for the pooled retrospective studies compared with the 
prospective data. This largely reflects the inherent difficulty 
with the retrospective design wherein the clinician’s assess¬ 
ment or recording of the patient’s clinical findings may have 
occurred after receiving the LP results. However, the essential 
conclusion for this update remains: no single classic item of 
medical history or physical examination is sufficiently accu¬ 
rate to rule in or rule out the diagnosis of meningitis. 
Whereas previously the triad of fever, neck stiffness, and 
altered mental status appeared helpful in ruling out meningi¬ 
tis in low-risk patients, the LRs from the prospective studies 
associated with the absence of fever and neck stiffness on
physical examination all approach 1 and suggest that the
triad will not be helpful in ruling out meningitis. The 
patient’s symptoms might be more important than the signs. 
The absence of headache or nausea/vomiting has summary 
LRs that decrease a patient’s pretest probability of meningitis 
but would not definitively rule it out. Although they report 
weak positive LRs for combinations of positive physical find¬ 
ings, Thomas et al 1 did not evaluate whether the combined 
absence of headache and nausea/vomiting provides more 
information than the individual findings. Jolt accentuation 
of headache (positive LR = 2.4; negative LR = 0.05), previ¬ 
ously found to be helpful in the diagnosis of meningitis, was 
not assessed in the study by Thomas et al. 1 
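
As a rough illustration of what the summary LRs for jolt accentuation would mean in practice, the sketch below converts a pretest probability to a posttest probability through odds. The function name `posttest_probability` and the 10% and 50% pretest values are hypothetical choices made only for comparison. The 27% value is the prevalence of meningitis reported for the emergency department cohort of Thomas et al, in which jolt accentuation itself was not assessed, so the combination is an assumption rather than a reported result.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Summary LRs for jolt accentuation of headache quoted in the text.
JOLT_LR_POSITIVE = 2.4
JOLT_LR_NEGATIVE = 0.05

# Assumed pretest probabilities: 0.27 is the cohort prevalence; the other
# two are hypothetical values included only to show the trend.
for pretest in (0.10, 0.27, 0.50):
    if_present = posttest_probability(pretest, JOLT_LR_POSITIVE)
    if_absent = posttest_probability(pretest, JOLT_LR_NEGATIVE)
    print(f"pretest {pretest:.0%}: present {if_present:.1%}, absent {if_absent:.1%}")
```

At a pretest probability of 27%, a positive result raises the probability to roughly 47% and a negative result lowers it to roughly 2%, which is why the finding looked promising but still needs validation.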

Future research evaluating the diagnostic value of clinical fea¬ 
tures suggesting meningitis should require prospective collec¬ 
tion of data on consecutive patients suspected of having 
meningitis, with an adequate gold standard in all patients. 
Assessment of combinations of clinical findings, rather than 
individual historical and physical examination features, is more 
likely to lead to useful results. Because meningitis is a disease 
with potentially serious clinical consequences if missed, it is 
especially important to identify clinical examination findings 
(either alone or in combination) with near-perfect sensitivity 
and very low negative LRs that clinically rule out the disease. To 
date, studies have not identified any single finding or combina¬ 
tion of clinical findings that fulfills this criterion. 

Patients suspected of having meningitis require LP. A new 
study, using the same patient population as the prospective 
clinical examination study, assessed the necessity of a CT 
head scan before LP. 7 The study demonstrated that for 
patients lacking specific baseline characteristics, it appears 
that LP can be safely performed without a CT head scan. The 
baseline characteristics associated with any abnormality on
CT head scan included age greater than 60 years, immunocompromised
state, history of CNS disease, seizure within 1
week of presentation, and neurologic abnormality. The neu¬ 
rologic abnormalities were a decreased level of conscious¬ 
ness, inability to answer questions and follow commands, 
gaze palsy, abnormal visual fields, facial palsy, abnormal 
motor function, and abnormal language. 

EVIDENCE FROM GUIDELINES 

No federal guidelines address the diagnostic approach to 
meningitis for immunocompetent adults. The Centers for 
Disease Control and Prevention recommends routine menin¬ 
gococcal vaccination beginning in the pre-high school years 
(http://www.cdc.gov/vaccines/vpd-vac/mening/vac-mening-fs.htm;
accessed June 3, 2008). Patients with meningitis symptoms
should be asked whether they have been vaccinated.
However, because the efficacy of the vaccine is less than 100% 
(although high) and does not cover all meningococcal strains, 
patients with symptoms should still be appropriately evalu¬ 
ated for meningitis. 


CLINICAL SCENARIO—RESOLUTION 


This patient has fever, headache, and neck stiffness. Her 
symptoms alone raise the possibility of meningitis, and 
her probability of meningitis is increased by the positive 
jolt accentuation of her headache. You decide to proceed 
to LP. She has none of the baseline characteristics associ¬ 
ated with an abnormal CT head scan result, so you under¬ 
take her LP directly without obtaining a CT scan. 
MENINGITIS: MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

Meningitis can occur sporadically or in outbreaks. It is impossi¬ 
ble to come up with a single prior probability estimate for all 
patients with symptoms compatible with meningitis. Among 
patients presenting to the emergency department at a single US 
hospital with a clinical suspicion of meningitis who underwent 
LP, the prevalence of meningitis (CSF WBC ≥ 6/mL) was 27%. 1
Among the patients in this study, 1 the prevalence of bacterial 
meningitis as defined by a positive CSF culture result was 1%. 
The rates of meningococcal meningitis are low (approximately 1
case/100000 persons each year; http://www.cdc.gov/meningitis/tech-clinical.htm;
accessed June 3, 2008).

POPULATION FOR WHOM MENINGITIS 
SHOULD BE CONSIDERED 

Among immunocompetent patients, meningitis should be 
considered for patients presenting with combinations of 
findings that include fever, headache, altered mental status, 
neck stiffness, or photophobia. 

DETECTING THE LIKELIHOOD OF MENINGITIS 

The most common symptoms associated with meningitis 
are not particularly useful when interpreted in isolation
(Table 30-7).

Table 30-7 Likelihood Ratios of Headache and Nausea/Vomiting Are Not Highly Useful

                    LR+ (95% CI)        LR- (95% CI)

Headache            1.1 (1.0-1.3) 1     0.43 (0.19-0.96) 1
Nausea/vomiting     1.3 (1.1-1.6) 1     0.64 (0.44-0.92) 1

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

Once meningitis is considered, clinicians should determine 
whether a patient requires cranial imaging before LP. Items 
from the medical history and physical examination that 
should be used include age greater than 60 years, immuno¬ 
compromised state, history of CNS disease, seizure within 1 
week of presentation, and neurologic abnormality (decreased 
level of consciousness, inability to answer questions and fol¬ 
low commands, gaze palsy, abnormal visual fields, facial palsy, 
abnormal motor function, and abnormal language). 

Prospective studies have failed to identify individual findings
that are accurate enough to diagnose meningitis. Jolt accentuation
of the headache might be useful but requires validation in
more studies. The absence of the classic triad of fever, neck stiffness,
and headache does not rule out meningitis.

REFERENCE STANDARD TEST 

Microbiologic culture. 
TITLE Computed Tomography of the Head Before Lumbar
Puncture in Adults With Suspected Meningitis.

AUTHORS Hasbun R, Abrahams J, Jekel J, Quagliarello VJ. 

CITATION N Engl J Med. 2001;345(24):1727-1733.

QUESTION Can the absence of certain clinical features 
at baseline be used to identify adults with suspected men¬ 
ingitis who are unlikely to have abnormal findings on 
computed tomography (CT) head scan, particularly mass 
effect? 

DESIGN Prospective cohort study. 

SETTING Emergency department of Yale-New Haven 
Hospital, New Haven, Connecticut. 

PATIENTS Of 511 adults (>16 years) with clinically sus¬ 
pected meningitis potentially eligible between July 1995 
and June 1999, 301 were enrolled in the study. The 
remainder were excluded mainly because they were identi¬ 
fied too late, that is, after the CT or after discharge. The 
average patient was young (median age, 40 years), white 
(52%), and immunocompetent (75%). 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

A clinician or study investigator collected standardized base¬ 
line clinical characteristics before the lumbar puncture (LP) 
and CT. CT scans were interpreted blindly by 2 independent 
neuroradiologists; disagreements were resolved by a third 
neuroradiologist. Scans were categorized as normal, focal, or 
nonfocal abnormality and with or without mass effect. 

MAIN RESULTS 

Of the 301 patients, 235 underwent CT before LP. Fifty-six
patients (24%) had a CT abnormality, of which only 11 (5%)
had evidence of mass effect. The baseline characteristics associated
with any abnormality on CT head scan included being
older than 60 years, immunocompromised state, history of
central nervous system disease, seizure within 1 week of presentation,
and neurologic abnormality. The neurologic
abnormalities were a decreased level of consciousness, inability
to answer questions and follow commands, gaze palsy,
abnormal visual fields, facial palsy, abnormal motor function,
and abnormal language.

The accuracy of any of the above baseline characteristics
for detecting any abnormality on CT is shown in Table 30-8.
Table 30-9 presents the accuracy of any of the significant
baseline characteristics to detect mass effect on CT head scan.

Table 30-8 Accuracy of Baseline Characteristic to Detect Any
Abnormality on Computed Tomographic Head Scan

                              Sensitivity, %   Specificity, %   LR+             LR-
                              (95% CI)         (95% CI)         (95% CI)        (95% CI)

Any baseline characteristic   95 (85-98)       52 (45-59)       2.0 (1.7-2.3)   0.10 (0.03-0.31)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.
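
The likelihood ratios reported for these baseline characteristics follow directly from the sensitivity and specificity. The Python sketch below shows the arithmetic using the point estimates reported in Tables 30-8 and 30-9; the function name `likelihood_ratios` is illustrative and is not taken from the study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Any baseline characteristic vs any CT abnormality (sensitivity 95%, specificity 52%).
any_abnormality = likelihood_ratios(0.95, 0.52)

# Any baseline characteristic vs mass effect on CT (sensitivity 91%, specificity 42%).
mass_effect = likelihood_ratios(0.91, 0.42)

for label, (lr_pos, lr_neg) in (
    ("Any CT abnormality", any_abnormality),
    ("Mass effect", mass_effect),
):
    print(f"{label}: LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

Because the inputs are rounded percentages, the computed ratios can differ slightly in the last digit from values derived from the raw patient counts.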

CONCLUSION 

LEVEL OF EVIDENCE Level 2. 

STRENGTHS Prospective data collection with appropriate 
blinding on consecutive patients. 

LIMITATIONS Most (78%), but not all, patients had CT 
before LP. 

Table 30-9 Accuracy of Baseline Characteristic to Detect Mass Effect
on Computed Tomographic Head Scan

                              Sensitivity, %   Specificity, %   LR+             LR-
                              (95% CI)         (95% CI)         (95% CI)        (95% CI)

Any baseline characteristic   91 (62-96)       42 (36-49)       1.6 (1.2-2.0)   0.21 (0.03-1.4)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

For patients with suspected meningitis, it is common practice
in some centers to order a CT before proceeding to LP to
detect any mass effect and avoid causing transtentorial herniation
with an LP. A previous review 1 suggested that

1. the risk of herniation in the setting of increased intracranial
pressure without obstruction to cerebrospinal fluid
flow had been overstated; and

2. CT was probably not necessary before proceeding to LP in
patients without neurologic abnormalities or atypical features
(such as being immunocompromised).

First, the results indicate that the absence of any significant 
baseline characteristic (detailed above) is a reasonably strong 
indicator of a lack of mass effect on CT head scan. Only 1 of 11 
(9%) patients with mass effect was missed with these criteria, 
although the confidence interval (CI) indicates that the true
value may be as high as 38%. Given a pretest probability of mass
effect in this study population of 5%, the absence of these characteristics
reduces the posttest probability to 1.0% (95% CI,
0.16%-6.9%).
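
The posttest probability quoted above can be reproduced by converting the pretest probability to odds, multiplying by the negative likelihood ratio, and converting back. The sketch below is illustrative only: the function name is hypothetical, and taking the pretest probability as 11 of 235 scanned patients is our reading of the reported counts.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability using odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


NEGATIVE_LR_MASS_EFFECT = 0.21  # LR- for mass effect (Table 30-9)

# Pretest probability from the counts (11 of 235 scans showed mass effect).
from_counts = posttest_probability(11 / 235, NEGATIVE_LR_MASS_EFFECT)

# Pretest probability from the rounded 5% figure quoted in the text.
from_rounded = posttest_probability(0.05, NEGATIVE_LR_MASS_EFFECT)

print(f"from counts:  {from_counts:.1%}")
print(f"from rounded: {from_rounded:.1%}")
```

The count-based pretest probability gives about 1.0%, matching the text; the rounded 5% figure gives about 1.1%, a difference that comes only from rounding.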

Second, the risk of herniation after LP, even in the presence 
of mass effect on CT, is low. Of the 11 patients with mass 
effect, 7 went on to have LP anyway (including the 1 patient 
missed with the baseline criteria) and none had herniation at 
clinical follow-up 1 week later. 

Overall, the evidence suggests that the absence of specific 
findings on clinical history and neurologic examination can 
reasonably safely identify those who do not need CT before LP. 

REFERENCE FOR THE EVIDENCE 

1. Archer BD. Computed tomography before lumbar puncture in acute men¬ 
ingitis: a review of the risks and benefits. CMAJ. 1993;148(6):961-985. 
TITLE The Diagnostic Accuracy of Kernig’s Sign, Brudzinski’s
Sign, and Nuchal Rigidity in Adults With Suspected
Meningitis.

AUTHORS Thomas KE, Hasbun R, Jekel J, Quagliarello 
VJ. 

CITATION Clin Infect Dis. 2002;35(1):46-62.

QUESTION What is the accuracy of Kernig sign, Brudzinski
sign, and nuchal rigidity in adults with suspected
meningitis?

DESIGN Prospective cohort study. 

SETTING Emergency department of Yale-New Haven 
Hospital 

PATIENTS Two hundred ninety-seven patients present¬ 
ing to the emergency department between July 1995 and 
June 1999 with suspected meningitis (clinical symptoms 
compatible with meningitis) who underwent lumbar punc¬ 
ture. This study population was also used to evaluate the 
safety of lumbar puncture without computed tomographic 
(CT) head scan. 1 Of 301 patients who were enrolled, 4 were 
excluded because of mass effect on CT head scan. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

An emergency department physician recorded the clinical 
history and physical examination results before lumbar 
puncture. A patient was considered to have meningitis if the 
cerebrospinal fluid white blood cell count was greater than or 
equal to 6 cells/mL. 

MAIN RESULTS 

Eighty patients had meningitis and 217 did not. Seventeen 
percent of the entire cohort were immunocompromised. 

None of the history items were helpful in ruling in meningitis,
although the absence of a headache or nausea and
vomiting would decrease the probability of meningitis
(Table 30-10). Neither fever nor any of the physical examination
maneuvers (Kernig sign, Brudzinski sign, or nuchal rigidity)
was accurate (Table 30-11).

CONCLUSION 

LEVEL OF EVIDENCE Level 2. 

STRENGTHS Prospective, consecutive patients for whom 
the clinicians had a suspicion of meningitis. The physical 
examination was always done blinded to the results of the 
lumbar puncture. 


Table 30-10 History Items

                              Sensitivity, %   Specificity, %   LR+                LR-
Finding (No.)                 (95% CI)         (95% CI)         (95% CI)           (95% CI)

Headache (282)                92 (84-96)       19 (14-25)       1.1 (1.0-1.3)      0.43 (0.19-0.96)
Nausea and vomiting (290)     70 (59-79)       47 (40-54)       1.3 (1.1-1.6)      0.64 (0.44-0.92)
Neck stiffness (296)          48 (37-59)       55 (49-62)       1.1 (0.82-1.4)     0.95 (0.74-1.2)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

Table 30-11 Physical Examination

                              Sensitivity, %   Specificity, %   LR+                LR-
Finding (No.)                 (95% CI)         (95% CI)         (95% CI)           (95% CI)

Fever (297)                   43 (32-53)       48 (41-55)       0.82 (0.62-1.1)    1.2 (0.94-1.5)
Kernig sign (237)             5 (1.6-13)       95 (91-98)       0.97 (0.27-3.6)    1.0 (0.94-1.1)
Brudzinski sign (236)         5 (1.6-13)       95 (91-98)       0.97 (0.26-3.5)    1.0 (0.94-1.1)
Neck stiffness (297)          30 (21-41)       68 (62-74)       0.94 (0.64-1.4)    1.0 (0.87-1.2)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.
LIMITATIONS The manner in which the emergency department
physicians performed the physical examination was not
standardized, and not all patients were assessed for each 
physical examination maneuver. 

This study is of better quality than most others on the 
physical examination in meningitis. Unfortunately, the 
authors did not standardize the techniques for performing 
the physical examination maneuvers. As a result, almost 20% 
of the patients were not examined for Kernig or Brudzinski 
signs. The small proportion of patients who were immuno¬ 
compromised may have contributed to the lower accuracies 
of these findings because some of these patients may have 
been unable to mount an inflammatory response to central 
nervous system infection. 2 

Although the classic triad of fever, neck stiffness, and
altered mental status was not directly addressed in this study,
the very weak negative likelihood ratios associated with the 3
findings individually (1.2, 1.0, and 0.97, respectively) cast
doubt on our previous assertion in the original Rational
Clinical Examination article that the absence of this triad virtually
eliminates a diagnosis of meningitis.

Reviewed by Rose Hatala, MD
CHAPTER

Is This Woman Perimenopausal?

Lori A. Bastian, MD, MPH
Crystal M. Smith, MD
Kavita Nanda, MD, MHS

CLINICAL SCENARIOS

Are These Women Perimenopausal?

For each of the following cases, the clinician may need to 
determine the probability that the patient is perimenopausal. 

CASE 1 A 45-year-old woman who had a hysterectomy at 
age 42 years for uterine fibroids reports that she has hot 
flashes and has felt irritable for the past month. 

CASE 2 A 41-year-old woman tells her physician that she 
thinks she is starting menopause. She smokes 1 pack of 
cigarettes a day, as she has for the past 20 years. 

CASE 3 A 47-year-old woman who has been taking oral 
contraceptives for the past 25 years requests information 
about her menopausal status. She is sexually active and 
wants to know whether she needs to continue taking birth 
control medication. 


WHY IS THE DIAGNOSIS IMPORTANT? 


The question, “Is this woman perimenopausal?” is impor¬ 
tant for clinicians because patients ask and want to know 
whether they are undergoing a physical and emotional 
change and whether they are experiencing the menopausal 
transition. Physicians need information to identify peri¬ 
menopausal women, to be able to reply to women’s ques¬ 
tions about the changes they may be experiencing, and to 
offer counseling on symptom relief, contraception, and 
disease prevention. As women begin the perimenopausal 
years, clinicians should counsel them on strategies to pre¬ 
vent osteoporosis, as well as on evidence-based treatment 
options for climacteric symptoms such as hot flashes and 
night sweats. Clinicians commonly identify perimeno¬ 
pausal women by their ages, by inquiring about their men¬ 
strual histories and symptoms, and by ordering laboratory 
tests to examine hormone levels, such as follicle-stimulating 
hormone (FSH) and estradiol levels, to confirm their clinical 
suspicions. It would be useful to know how age, self-assessment, 
family and medical history, symptoms, physical signs, and 
laboratory tests affect the probability that the woman is 
perimenopausal. 

In this article, we intend to answer the following questions: 
What is the value of asking a woman whether she thinks she 
is starting menopause? How accurate are symptoms and 
signs in detecting perimenopause? Is there any value in ask¬ 
ing about family and medical history in determining meno¬ 
pausal status? Are laboratory tests more useful than clinical 
examination in diagnosing perimenopause? 


Copyright © 2009 by the American Medical Association.
PHYSIOLOGY AND DEFINITIONS

Climacteric is a general term referring to the entire transition 
from the reproductive to the postreproductive interval in a 
woman’s life. 1 Thus, it includes immediate premenopausal, 
perimenopausal, and postmenopausal women. All women 
do not go through the same transition of regular menses to 
irregular menses to amenorrhea as they approach meno¬ 
pause. In 2001, a panel of experts (from the Stages of Repro¬ 
ductive Aging Workshop) met to discuss a staging system to 
classify reproductive aging. 2 This proposed new classification 
of the transition from reproductive to postmenopausal 
includes 7 stages, which are based on menstrual cycles and 
plasma FSH levels. The panel noted that this system is a
work in progress and that it has not been validated in
research settings.

The World Health Organization 3 defines natural meno¬ 
pause as “the permanent cessation of menstruation, determined 
retrospectively after 12 consecutive months of amenorrhea 
without any other pathological or physiological cause.” 3 
Menstruation ceases as ovarian follicle stores are depleted 
and ovarian function is diminished, leading to eventual 
decreased production of estrogen by the ovary and 
decreased stimulation of the endometrial lining. 4 Analysis 
of longitudinal data of women at all ages shows a proba¬ 
bility of less than 2% for spontaneous menstruation after 
12 months of amenorrhea. 5 The accurate diagnosis of perimenopause
allows patients and physicians to predict the
onset of menopause.

Perimenopause refers to the year before the final menstrual
period through the first year after the final menstrual period. 3,6
During perimenopause, ovulation occurs irregularly because of 
fluctuations in the hormones of the hypothalamic-pituitary- 
ovarian axis. 6 For example, in early perimenopause, inhibin B 
levels decline, resulting in an increase in FSH levels, with no sig¬ 
nificant change in inhibin A or estradiol levels. FSH levels may 
increase during some cycles but return to premenopausal levels 
in subsequent cycles. Further complicating the determination 
of FSH concentration is the pulsatile pattern of secretion. Simi¬ 
larly, concentrations of estradiol also may decrease or even 
increase during perimenopause. 5 This hormonal variability cre¬ 
ates difficulties in interpreting a single laboratory test value. 
Figure 31-1 Prevalence of Perimenopause and Postmenopause by Age

Median age at perimenopause is 47.5 years, and median age at postmeno¬ 
pause is 51.3 years. Adapted with permission from McKinlay et al. 9 Massa¬ 
chusetts Women’s Health Study (n = 5547). 


According to longitudinal data of women’s menstrual 
cycles, Brambilla et al 7 and Dudley et al 8 further refined the 
definition of perimenopause by considering a woman peri¬ 
menopausal if she has not had a period within the previous 3 
to 11 months or if she has experienced changes in menstrual 
regularity (either shortening or lengthening of time between 
menses) during the past 12 months. In a 5-year population- 
based study, Brambilla et al 7 found that 3 to 11 months of 
amenorrhea or irregular periods among women aged 45 to 
55 years were most predictive of menopause within the fol¬ 
lowing 3 years (sensitivity, 72%; specificity, 76%). Dudley et 
al 8 validated this definition, finding that these 2 characteris¬ 
tics are the best predictors of menopause 4 years after base¬ 
line (sensitivity, 32%; specificity, 99%). The perimenopausal 
definition by Brambilla et al 7 was used as our reference stan¬ 
dard for this systematic review. 
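
To make the trade-off between these two sets of estimates concrete, the sketch below converts each reported sensitivity and specificity pair into likelihood ratios. The likelihood ratios are derived here for illustration and are not reported in the source studies; the class name `Accuracy` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accuracy:
    """Sensitivity and specificity (as fractions) for one study estimate."""

    label: str
    sensitivity: float
    specificity: float

    @property
    def lr_positive(self):
        return self.sensitivity / (1.0 - self.specificity)

    @property
    def lr_negative(self):
        return (1.0 - self.sensitivity) / self.specificity


# Estimates quoted in the text for predicting menopause from 3 to 11 months
# of amenorrhea or irregular periods.
brambilla = Accuracy("Brambilla et al (within 3 years)", 0.72, 0.76)
dudley = Accuracy("Dudley et al (4 years after baseline)", 0.32, 0.99)

for study in (brambilla, dudley):
    print(f"{study.label}: LR+ = {study.lr_positive:.1f}, LR- = {study.lr_negative:.2f}")
```

Under these assumptions, the higher specificity in the data of Dudley et al yields a much larger positive likelihood ratio (about 32 vs about 3), whereas the higher sensitivity in the data of Brambilla et al yields the lower negative likelihood ratio (about 0.37 vs about 0.69).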

ESTIMATING THE PRETEST PROBABILITY 
OF PERIMENOPAUSE 

To determine a woman’s likelihood of perimenopause, clini¬ 
cians must first estimate the pretest probability of perimeno¬ 
pause. This estimate should be based primarily on the 
patient’s age, although certain aspects of the medical and 
family history also may be useful. 

In a 30-year study that enrolled college women and followed 
them throughout their lifetime until menopause, Treloar et al 5 
reported the mean age of onset of perimenopause as 45.5 years, 
with a mean duration of 6.2 years. According to 5-year follow¬ 
up data from a population-based study of 5547 women aged 45 
to 55 years, McKinlay et al, 9 in the Massachusetts Women’s 
Health Study (1992), reported the median age of onset of peri¬ 
menopause as 47.5 years, with a mean duration of 3.8 years. 
Figure 31-1 shows the prevalence of perimenopause and postmenopause
according to age, based on the data of McKinlay et al. 9 Unfortunately,
estimating the time of onset of perimenopause is
difficult, and data were not available from the literature on the 
prevalence of perimenopause among women younger than 45 
years. McKinlay et al 9 reported that by age 45 years, 40% of all 
women have started or completed the menopause transition 
(32% are perimenopausal and 8% are postmenopausal). By age 
50 years, 75% of women have started or completed the transi¬ 
tion (38% perimenopausal and 37% postmenopausal). By age 
55 years, only 2% of women are premenopausal. 

EVALUATION OF PERIMENOPAUSE 

This evaluation can be divided into 5 basic categories: self- 
assessment, symptoms, family and medical history, physical 
signs, and laboratory tests. 

Self-Assessment 

Clinicians can ask a woman whether she thinks she is starting 
menopause. Women may base their perceptions of their meno¬ 
pausal status on awareness of the subtle changes taking place in 
their bodies. 10,11 In a cross-sectional study by Garamszegi et al, 10 
self-reported menopausal status was more strongly correlated with
symptoms than with menstrual cycle characteristics.

Symptoms 

Climacteric symptoms typically include vasomotor com¬ 
plaints, such as hot flashes and night sweats. Other symptoms 
associated with perimenopause in cross-sectional studies are 
thought to be associated with fluctuating levels of estrogen and 
progesterone. These include vaginal dryness, variable sexual 
interest, urinary incontinence, depressed mood, nervous ten¬ 
sion and irritability, and sleep disturbances. 1 

Hot Flashes 

Hot flashes are sudden sensations of heat, sweating, and 
flushing that most often occur in the face, head, neck, and 
chest. Chills, clamminess, and anxiety also may accompany 
hot flashes. They generally last 1 to 5 minutes, though 6% of 
women experience hot flashes lasting longer than 6 minutes. 12 
Most North American, European, and Australian women 
report that they experience hot flashes (50%-85%) 9,12-14 and 
that they occur periodically during a span of 1 to 5 years. 15,16 
There appear to be cultural differences in the reporting or 
experiencing of hot flashes. For example, only 10% to 20% of 
Indonesian women 17 and 10% to 25% of Chinese women 18 
report experiencing them. The mechanism triggering these 
episodes is thought to be a combination of fluctuating estra¬ 
diol levels and a narrowing of the thermoneutral zone. 19 

Night Sweats 

Night sweats are hot flashes that occur at night, usually while 
the woman is sleeping. Often, she will awake drenched in 
sweat. If night sweats interfere with sleeping patterns, this may 
explain reports of insomnia, fatigue, and irritability among 
climacteric women. 

Vaginal Dryness 

Vaginal dryness is sometimes experienced as a result of 
decreasing estrogen production during perimenopause. This 
can lead to urogenital atrophy and changes in the quantity or 
composition of vaginal secretions. Estimates of the preva¬ 
lence of vaginal dryness among late perimenopausal women 
range from 18% 20 to 21%. 21 

Variable Sexual Interest 

Dennerstein et al 22 report in a study of Australian women 
that although most indicated no change in sexual interest 
during menopause, 31% experienced a decrease and 7% 
reported an increase in sexual interest. Only 6% of those 
reporting a decrease indicated menopause as a reason for the 
decline in interest. 22 This decrease may be caused by physio¬ 
logic factors making sexual relations more difficult (eg, vagi¬ 
nal dryness, hot flashes, urinary incontinence) or social and 
environmental factors. Several studies have found that 
menopausal symptoms are but one of many factors affecting 
sexual interest among women in midlife and later. 23,24 

Urinary Incontinence 

Urinary incontinence affects between 26% 25 and 55% 26 of 
middle-aged women from western countries and may be 
caused or exacerbated by declining estrogen levels. Lower 


estrogen levels can lead to atrophy of the urethral mucosa 
and the trigone, the muscle controlling urination, resulting 
in less urinary control. 6 Some studies have found an associa¬ 
tion between increased prevalence of urinary incontinence 
and menopause, 25 whereas others have not. 27,28 

Depressed Mood 

Avis et al 29 classified 10% of 45- to 55-year-old women partici¬ 
pating in a population-based longitudinal study of women from 
Massachusetts as experiencing clinical depression. Many studies 
do not find an association of menopause with depression or find 
that it can be explained by other menopausal symptoms. 29-34 Evidence
from North American 29 and British 35 cohorts found high
rates of depression among perimenopausal women with a his¬ 
tory of depression, supporting the theory that women with 
previous affective disorders may be at an increased risk for 
recurrent depression. Conclusions from other reports have sug¬ 
gested that depression could be increased because of declines in 
estrogen levels, 36 changes in social circumstances, 37 and changes 
in self-concept as women lose reproductive function. 38 

Nervous Tension and Irritability 

Many symptom checklists for menopause symptoms used in epidemiologic
studies include nervous tension and irritability. 21,39-42
Although the relevance of these symptoms is unclear, they could
be caused by lack of sleep because of menopausal symptoms, illness,
or stressful life events. Some authors suggest that they could
result from changes in hormone levels, which also occur during
the 10- to 14-day luteal phase of the menstrual cycle. 43

Family and Medical History 

Age of Mother’s Menopause 

Genetic factors seem to predispose women to menopause at an 
earlier age. 44,45 Torgerson et al 44 reported that women with premature
(<40 years) and early (<45 years) menopause reported significantly
younger maternal menopausal ages than did women
with normal menopausal ages. In a case-control study of women
from the greater Boston area, Cramer et al 45 found that women 
with a family history (eg, mother, sister, aunt, grandmother) of 
menopause before age 46 years had a higher risk of early menopause
(odds ratio, 6.1; 95% confidence interval [CI], 3.9-9.4).

Cigarette Use 

Approximately 23% of US adult women smoke cigarettes regu¬ 
larly. 46 Evidence indicates that women who smoke experience 
menopause 1 to 2 years earlier than do nonsmokers. 2,47-54 Cigarette
smoking reduces bioavailable estrogen by increasing
hepatic metabolism of estrogen, 55,56 decreasing production of 
estrogen, 57,58 or increasing circulation of androgens. 59 Several 
studies support the assertion that quitting smoking can signifi¬ 
cantly delay menopause. 48,49 Other evidence suggests that the 
median age of menopause is not statistically different between 
women who have never smoked and ex-smokers. 60,61 Neverthe¬ 
less, a majority of research on cigarette smoking and menopause 
does indicate a dose-response relationship between number of 
cigarettes currently smoked and age at menopause. 48,49,53,62 Fur¬ 
thermore, Gold et al 20,54 observed that “past smoking and current 
smoking were positively associated with prevalence of vasomo¬ 
tor symptoms,” in agreement with most previous data. 54 
Hysterectomy Status

It is often assumed that women who have had a hysterec¬ 
tomy with conservation of the ovaries should not experi¬ 
ence menopausal symptoms earlier or more severely 
because of their hysterectomy. Nonetheless, evidence 
shows that women with ovarian conservation after hys¬ 
terectomy report more vasomotor complaints, vaginal 
dryness, and other complaints than do women of similar 
age who did not have a hysterectomy. 63,64 In developed
countries, hysterectomy is one of the most frequent oper¬ 
ations in adult women 63 ; one-third of US women will 
have had a hysterectomy by age 65 years. 65 Hysterectomy 
may inhibit blood circulation in ovaries, decreasing ovar¬ 
ian function 64 and causing more frequent or severe meno¬ 
pausal symptoms. 

Physical Signs 

Maturation Index 

One proposed assessment of vaginal estrogen deficiency is an 
evaluation of the vaginal epithelium maturation index. This 
procedure involves obtaining cells from the junction of the 
upper and middle third of the lateral vaginal wall with a brush. 
These cells are prepared on a slide with the Papanicolaou tech¬ 
nique, and the percentages of parabasal, intermediate, and 
superficial cells are counted. 66 Although the maturation index 
changes significantly after estrogen replacement therapy, diag¬ 
nostic studies have not compared the maturation index with 
menstrual cycle characteristics. 

Vaginal pH 

Some investigators suggest that an increased vaginal pH (6.0-7.5)
in the absence of potentially pathogenic bacteria may be
a reasonable marker of decreased estradiol serum levels. 67
This test is performed by directly applying pH paper to the 
lateral vaginal wall at the outer third of the vagina. Changes 
in pH can alter the composition of vaginal secretions that 
accompany atrophy. 

Skin Thickness 

Estrogen stimulates the epidermal growth rate and pro¬ 
motes the formation of collagen and hyaluronic acid, 
which increase the turgor and vascularization of the skin. 68 
During climacteric, declining estrogen levels result in the 
thinning and atrophy of the epidermis. 68 Investigators 
have proposed measuring skin thickness with ultrasonog¬ 
raphy at the greater trochanter area to estimate meno¬ 
pausal status, but this procedure has not been supported 
by research to date. 68 

Laboratory Tests 

Follicle-Stimulating Hormone 

Measurement of FSH plasma levels has been used to try to identify
perimenopausal and postmenopausal women. High FSH levels indicate that
menopausal changes are occurring in the ovary. As the ovary becomes less
responsive to stimulation by FSH from the pituitary gland (and produces
less estrogen), the pituitary gland increases production of FSH to try
to stimulate the ovary to produce more estrogen (Figure 31-1). However,
some clinicians and researchers doubt the clinical value of FSH
measurements in perimenopausal women because FSH levels fluctuate
considerably each month, depending on whether ovulation has
occurred. 2,69,70

Estradiol 

Recent longitudinal studies have reported that early peri¬ 
menopausal (change in cycle frequency) women maintained 
premenopausal estradiol levels, whereas late perimeno¬ 
pausal (no menses in previous 3-11 months) and postmeno¬ 
pausal women experienced significant declines in estradiol 
levels. 71 Estradiol can be measured using plasma, urine, and 
saliva. Like FSH, estradiol levels are highly variable during 
perimenopause. 1 

Inhibins 

Inhibin A and inhibin B are secreted by the ovaries and, like 
estradiol, exert negative feedback on the pituitary gland, 
reducing FSH and luteinizing hormone secretion (Figure 
31-1). Loss of inhibin contributes to the increase in FSH 
that occurs with ovarian senescence. A recent longitudinal 
study of hormone levels throughout the menopause transi¬ 
tion reported that inhibin B levels decline as women 
progress through perimenopause, whereas inhibin A levels 
remain unchanged. Inhibin A levels did decrease at approx¬ 
imately the final menstrual period. 71 Inhibin levels are usu¬ 
ally measured in plasma. The ovaries produce less inhibin B 
as fewer follicles proceed to maturation, and the number of 
follicles declines with age. 72 

METHODS 

Search Strategy and Quality Review 

We searched the MEDLINE database for English-language
articles concerning the diagnosis of menopause that were
published between 1966 and 2001. The key words used
included “menopause,” “perimenopause,” “premenopause,”
“climacteric,” “sensitivity and specificity,” “diagnosis,”
“prospective/cross-sectional studies,” “health status,” and “hormones
of the hypothalamic-pituitary-ovarian axis.” We included
articles that used the diagnosis of perimenopause based on 
menstrual irregularity or 3 to 11 months of amenorrhea, 
included a premenopausal control group, and presented 
data that could be extracted to calculate both sensitivity and 
specificity rates. We included articles on laboratory tests that 
are available to clinicians for 2 reasons. First, women may 
ask for laboratory tests to assess their menopausal status. 
Second, the results of the tests must be coupled closely with 
the clinical examination for proper interpretation. We 
excluded reviews and articles that included men, hormone 
replacement therapy (HRT), cancer, or osteoporosis as 
major foci of the papers. We developed the search strategy 
with a medical librarian, and this is available from the 
authors on request. Two authors (L.A.B. and C.M.S.) sys¬ 
tematically reviewed and identified titles and abstracts for 
content and quality. Articles using a definition of perimenopause
different from 3 to 11 months of amenorrhea or irregular
periods, those lacking a control group (a remote
premenopausal group), and studies for which data could not 
be classified into contingency tables were excluded. Articles 
using a young control group (ie, 20-year-old women) or an 
older postmenopausal group (ie, 60- to 70-year-old women) 
or including women receiving HRT also were excluded. Two 
authors (L.A.B. and C.M.S.) abstracted the articles with a 
standardized abstraction form. Each publication was given a
grade of A, B, or C according to the study design and level of
evidence (see Table 1-7 for a summary of Evidence Grades
and Levels). 73 Discrepancies about quality were resolved by a
third author (K.N.).

The MEDLINE search identified 1221 articles, and from the references
cited in these and other publications known to us, another 25 articles
were added to the review pool. Sixteen articles
10,11,20-22,26,28,39,42,54,71,74-78 met all the inclusion criteria
described above and were included in the final analysis (Table 31-1).


Table 31-1 Studies Included in the Analysis

Source, y | Study Population | Setting | Study Design | Age Range, y | Premenopause, No. | Perimenopause, No. (%) | Symptoms and Signs Studied | Quality Score a
--- | --- | --- | --- | --- | --- | --- | --- | ---
Chompootweep et al, 74 1993 | Thai women | Health centers | Cross-sectional | 45-59 | 735 | 292 (28) | Hot flash, mood, insomnia | A
Dennerstein et al, 75 1993 | Australian women | Population database | Cross-sectional | 45-55 | 316 | 549 (63) | Hot flash, mood, insomnia, nervous tension | A
Dennerstein et al, 22 1994 | Australian women | Population database | Cross-sectional | 45-55 | 290 | 504 (63) | Sexual interest | A
Punyahotra et al, 76 1997 | Companions of outpatients in Thailand | Outpatient clinic | Cross-sectional | 40-59 | 127 | 22 (15) | Hot flash, night sweat, mood, nervous tension | B
Burger et al, 71 1998 | Australian women | Population database | Prospective | 45-55 | 28 | 59 (68) | Inhibins | B
Garamszegi et al, 10 1998 | Australian women | Population database | Prospective | 45-55 | 91 | 182 (67) | Night sweat, self-rating | B
Stellato et al, 77 1998 | Massachusetts women | Population database | Cross-sectional | 50-60 | 99 | 179 (64) | FSH | B
Ho et al, 39 1999 | Chinese women | Population database | Cross-sectional | 44-55 | 1258 | 92 (7) | Hot flash, mood, insomnia | B
Kuh et al, 26 1999 | British women born in 1946 | Population database | Prospective | 48 | 480 | 319 (40) | Incontinence | A
Dennerstein et al, 21 2000 | Australian women | Population database | Prospective | 45-55 | 172 | 254 (60) | Vaginal dryness | B
Gold et al, 20 2000 | US ethnic communities | Community (SWAN) | Cross-sectional | 40-55 | 4497 | 4158 (48) | Vaginal dryness, insomnia, incontinence | B
Harlow et al, 11 2000 | US ethnic communities | Community (SWAN) | Cross-sectional | 40-55 | 4234 | 3928 (48) | Self-rating | B
Bromberger et al, 42 2001 | US ethnic communities | Community (SWAN) | Cross-sectional | 40-55 | 4483 | 4143 (48) | Psychological distress b | B
Gold et al, 54 2001 | US ethnic communities | Community (SWAN) | Cross-sectional | 40-55 | 4514 | 4173 (48) | Cigarette smoking | B
Maartens et al, 78 2001 | Dutch women | Population database | Cross-sectional | 47-54 | 526 | 1250 (70) | Hot flash, night sweat, mood, insomnia, nervous tension, vaginal dryness, incontinence | B
Sherburn et al, 28 2001 | Australian women | Population database | Prospective | 45-55 | 471 | 393 (45) | Urinary incontinence | A

Abbreviations: FSH, follicle-stimulating hormone; SWAN, Study of Women’s Health Across the Nation.
a See Table 1-7 for a summary of Evidence Grades and Levels.
b Psychological distress is defined as depression, irritability, or nervous tension in the past 2 weeks.
Table 31-2 History, Symptoms, and Hormone Levels in the Prediction of Perimenopause

Symptoms and Signs | No. of Participants | Sensitivity Range | Specificity Range | LR+ (95% CI) or Range a | LR- (95% CI) or Range a
--- | --- | --- | --- | --- | ---
Hot flashes 39,74-76,78 | 5167 | 0.22-0.55 | 0.83-0.91 | 2.2-4.1 | 0.54-0.87
Night sweats 10,76,78 | 2198 | 0.20-0.50 | 0.74-0.87 | 1.9 (1.6-2.2) b | 0.67-0.92
Vaginal dryness 20,21,78 | 10857 | 0.11-0.29 | 0.80-0.97 | 1.5-3.8 | 0.92 (0.91-0.94) b
Incontinence 20,26,28,78 | 12094 | 0.16-0.39 | 0.64-0.91 | 1.1-1.7 | 0.91 (0.89-0.93) b
Depressed mood 39,74-76,78 | 5167 | 0.09-0.47 | 0.64-0.97 | 1.3-3.1 | 0.82-0.94
Insomnia 20,39,74,75,78 | 13673 | 0.21-0.53 | 0.63-0.83 | 0.98-2.1 | 0.79-1.0
Nervous tension or irritability 75,76,78 | 2790 | 0.41-0.59 | 0.51-0.68 | 1.2 (1.1-1.3) b | 0.83 (0.77-0.90) b
Psychological distress 42 | 8626 | 0.28 | 0.79 | 1.3 (1.2-1.4) | 0.91 (0.89-0.93)
Sexual interest 22 | 799 | 0.25 | 0.84 | 1.6 (1.2-2.1) | 0.89 (0.83-0.96)
Self-rating 10,11 | 8435 | 0.77-0.94 | 0.39-0.64 | 1.5-2.1 | 0.18-0.36
Current smoking 54 | 8185 | 0.24 | 0.82 | 1.3 (1.2-1.4) | 0.93 (0.91-0.95)
FSH 77 (>24 mIU/L) | 278 | 0.65 | 0.79 | 3.1 (2.1-4.5) | 0.45 (0.36-0.56)
Inhibin A 71 (<1.28 U/L) | 87 | 0.61 | 0.54 | 1.3 (0.84-2.0) | 0.73 (0.46-1.2)
Inhibin B 71 (<30 ng/L) | 87 | 0.46 | 0.78 | 2.0 (0.96-4.4) | 0.70 (0.51-0.96)
IR-INH 71 (<30 ng/L) | 87 | 0.07 | 0.96 | 1.9 (0.22-16) | 0.97 (0.88-1.1)

Abbreviations: CI, confidence interval; FSH, follicle-stimulating hormone; IR-INH, immunoreactive inhibin; LR+, positive likelihood ratio; LR-, negative likelihood ratio.
a LR+ is a measure of how well a positive result rules in perimenopause, and LR- measures how well a negative test result rules out perimenopause. Where one of these operating characteristics was homogeneous (P > .05 for the χ² test), the summary value and a 95% CI are given. Where they are heterogeneous, only the range is given.
b For LRs, a summary measure is reported only when more than 2 studies were identified and found to be homogeneous; otherwise, a range was reported.


Statistical Methods 

We calculated values and CIs for sensitivity, specificity, positive
likelihood ratios (LRs+), and negative likelihood ratios (LRs-), using
statistical software (SAS version 8.0; SAS Institute Inc, Cary, North
Carolina). Perimenopause is the target condition, and the reference
standard is based on the definition by Brambilla et al. 7

The LR+ (sensitivity/[1 - specificity]) is a measure of how well a
positive test result rules in perimenopause, whereas the LR-
([1 - sensitivity]/specificity) is a measure of how well a negative
test result rules out perimenopause. An LR close to 1 does not
appreciably change the likelihood of perimenopause. An LR greater than
1 increases the likelihood of perimenopause, whereas an LR less than 1
decreases the likelihood of perimenopause. We assessed sensitivity,
specificity, LR+, and LR- for homogeneity. When the χ² statistic
suggested homogeneity (P > .05), we combined the data to produce a
random-effects estimate. 79 For heterogeneous data, variables are given
as ranges.
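
As a rough illustration of the two LR formulas above, the following
Python sketch computes LR+ and LR- from a sensitivity and specificity
pair. The function name is hypothetical and is not part of the SAS
analysis described here. The example inputs are the rounded values
reported for psychological distress in Table 31-2 (sensitivity 0.28,
specificity 0.79), so the results only approximate the published LR+
of 1.3 and LR- of 0.91.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity.

    Hypothetical helper for illustration only.
    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie between 0 and 1")
    if not 0.0 < specificity < 1.0:
        # specificity of 0 or 1 would make one of the ratios undefined
        raise ValueError("specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


if __name__ == "__main__":
    # Rounded inputs for psychological distress (Table 31-2).
    lr_pos, lr_neg = likelihood_ratios(0.28, 0.79)
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Because the inputs are rounded to 2 decimal places, small differences
from the tabulated LRs are to be expected for other rows.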


RESULTS 

The findings with the greatest LR+ across studies (Table 31-2), and
therefore the best at ruling in perimenopausal status, were hot flashes
(LR+, 2.2-4.1), night sweats (LR+, 1.9; 95% CI, 1.6-2.2), and vaginal
dryness (LR+, 1.5-3.8). The absence of findings was not efficient at
ruling out perimenopausal status; self-rating (LR-, 0.18-0.36) and hot
flashes (LR-, 0.54-0.87) had the smallest LR-. Only 1 study each
reported enough data to calculate sensitivity, specificity, and LRs for
FSH and the inhibins 71,77 ; no study reported enough data to calculate
these values for estradiol. High FSH levels (>24 mIU/L) and low inhibin
B levels (<30 ng/L) provided weak evidence to rule in perimenopause
(LR+, 3.1; 95% CI, 2.1-4.5; and LR+, 2.0; 95% CI, 0.96-4.4,
respectively). However, neither normal FSH levels nor normal inhibin B
levels could rule out perimenopause (LR-, 0.45; 95% CI, 0.36-0.56; and
LR-, 0.70; 95% CI, 0.51-0.96, respectively).


CLINICAL SCENARIOS—RESOLUTIONS 


Case 1 describes a 45-year-old woman with a moderately high pretest
probability of being perimenopausal or postmenopausal (40%) according to
her age (Figure 31-1) and probably even higher because she has had a
hysterectomy and is experiencing climacteric symptoms. Because she has
reported hot flashes (LR+, 2.2-4.1) and irritability (LR+, 1.2; 95% CI,
1.1-1.3), the calculated posttest probability of her being
perimenopausal ranges from 40% to 100%. Our recommendation would be not
to order FSH or other laboratory tests but to tell her that she is
perimenopausal, to counsel her on increasing her calcium intake, and to
advise her to increase exercise to prevent osteoporosis.

In case 2, a 41-year-old woman might have a pretest probability of being
perimenopausal or postmenopausal (estimate, 10%) according to her age.
Because she is currently smoking cigarettes (LR+, 1.3; 95% CI, 1.2-1.4),
this aspect of her medical history raises her probability of being
perimenopausal to 12% to 14%. Because she thinks she is starting
menopause (LR+, 1.5-2.1), her calculated probability of being
perimenopausal might be further increased to 18% to 30%. We would
inquire about her menstrual patterns and presence of climacteric
symptoms, tell her she may be close to perimenopause, and discuss
contraception and smoking cessation with her.

Finally, the 47-year-old woman in case 3 has a differ¬ 
ent type of question. Her pretest probability of being 
perimenopausal or postmenopausal is high (50%) 
according to her age (Figure 31-1). In this patient, we 
cannot assess for menstrual patterns or menopausal 
symptoms because she is using an oral contraceptive. 
Although she is likely perimenopausal, ovulation may 
still be possible. 80 If she desires to continue with her 
oral contraceptive, the American College of Obstetri¬ 
cians and Gynecologists recommends discontinuing 
this therapy between the ages of 50 and 55 years. 81 As 
suggested in case 1, we would counsel her on increasing 
calcium intake and increasing exercise for osteoporosis 
prevention. 


CONCLUSION 

No single element of the medical history or clinical exam¬ 
ination is powerful enough to confirm the probability of 
being perimenopausal. Besides menstrual history, the 
most powerful predictor of menopausal status is a 
woman’s age. The median age at perimenopause is 47.5 
years, 9 and 87% of women are perimenopausal or post¬ 
menopausal by the age of 51 years. The clinical question of 
perimenopausal status is more difficult in patients in their 
early to middle 40s. Many clinicians rely on the measure¬ 
ment of hormone levels, such as FSH, to confirm the diag¬ 
nosis. In the clinical scenarios we evaluated, FSH measurement 
did not help the clinician make a diagnosis. Further research 
needs to be conducted to document the additional benefit 
of these hormone level tests in making a diagnosis of peri¬ 
menopause. 
CLINICAL SCENARIO 


A 42-year-old female patient presents to your clinic with 
concern about whether she is starting menopause. She 
does observe that her periods are lasting longer (approxi¬ 
mately 8-9 days), but they continue to occur at regular 
intervals, every 28 days. Her mother started menopause at 
age 48 years. The patient has noticed no symptoms of 
menopause such as hot flashes or night sweats. She does 
not smoke. She ordered a home-testing menopause kit via 
the Internet, and the results suggested that she is “starting 
menopause.” She wants to know about the accuracy of 
these kits and what type of changes she should expect dur¬ 
ing the next year. 

UPDATED SUMMARY ON PERIMENOPAUSE 

Original Review 

Bastian LA, Smith C, Nanda K. Is this woman perimenopausal?
JAMA. 2003;289(7):895-902.

UPDATED LITERATURE SEARCH 

Details of the Update 

Our literature search used the parent search strategy for The Rational
Clinical Examination series, combined with the subject “perimenopause,”
published in English from 2002 to September 2004. The search yielded 499
titles, for which we reviewed the titles and abstracts; 36 articles were
selected for additional review. These articles were reviewed to identify
studies that assessed the sensitivity and specificity of the medical
history or physical examination features of perimenopause, defined as
greater than 3 (but <12) months of amenorrhea or menstrual irregularity.
Only 2 articles on the perimenopause were retained. 1,2 The remaining
articles did not measure perimenopause or presented mean values that
could not be used in 2 × 2 tables.

Many women use home-testing kits to assess their menopausal status,
making the results of home testing part of the clinical history. A
Google search revealed 13100 sites for “menopause diagnostic kits,” yet
there are no reports of the effectiveness of these kits. We summarized
the results of this search strategy after exploring the Web sites of
available kits.


NEW FINDINGS 

The effectiveness of menopause home diagnostic kits (based 
on urine tests of follicle-stimulating hormone [FSH]) has not 
been published. 

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

Age is an important factor for perimenopause, and the Wise 
et al 1 article measured incidence of perimenopause among 
women aged 36 to 45 years. 

CHANGES IN THE REFERENCE STANDARD 

• The reference standard, which is based on menstrual his¬ 
tory, remains the same: 3 to 11 months of amenorrhea or 
irregular periods. 

• As noted in the original review, a panel of experts (from 
the Stages of Reproductive Aging Workshop) proposed a 
new system to classify reproductive aging that uses age, 
menstrual history, and FSH and estradiol levels. 3 The sys¬ 
tem is of uncertain validity because large categories of 
women, such as cigarette smokers and obese women, are 
excluded. More recently, the Women’s Ischemia Syndrome 
Evaluation (WISE) study developed a new algorithm for 
classifying menopausal status. 4 The apparent advantage of 
the new staging system is the ability to diagnose perimeno¬ 
pause in women who have had a hysterectomy. Using an 
expert consensus panel as the reference standard, WISE’s 
hormonal algorithm had a sensitivity of 88% and specific¬ 
ity of 97% for diagnosing perimenopause. 

RESULTS OF LITERATURE REVIEW 

Among women aged 45 to 55 years, self-rating of any decline in personal
health has no predictive value for identifying perimenopause (likelihood
ratio [LR] approaches 1). 2 The incidence of perimenopause among women
35 to 40 years of age is approximately 20% (Table 31-3); a family
history of early menopause in the mother has an LR of 2.0 for
identifying younger women (age 36-45 years) who might become
perimenopausal during the ensuing 36 months (Table 31-4). 1

Table 31-3 The Incidence of Perimenopause Related to Age

Age, y | Incidence Rate of Perimenopause (%)
--- | ---
36-37 | 19/90 (21)
38-39 | 34/162 (21)
40-41 | 45/137 (33)
42-43 | 53/153 (35)
44-45 | 26/61 (43)

Table 31-4 Likelihood Ratios of Features for Perimenopause

Development of Perimenopause Within 36 Months

Finding | LR+ (95% CI) | LR- (95% CI)
--- | --- | ---
Nonwhite | 2.5 (1.3-4.8) | 0.9 (0.9-1.0)
Family history of early menopause | 2.0 (1.1-3.5) | 0.9 (0.9-1.0)
BMI >30 | 1.8 (1.1-2.8) | 0.9 (0.9-1.0)
Age, y | |
42-45 | 1.4 (1.1-1.7) | 0.8 (0.7-0.9)
40-45 | 1.3 (1.2-1.5) | 0.6 (0.5-0.8)
>5 y passive smoke exposure | 1.4 (1.1-1.8) | 0.8 (0.7-1.0)
Current smoker | 1.4 (0.9-2.3) | 0.9 (0.9-1.0)
Depression (DSM-III defined) 5 | 1.2 (1.0-1.6) | 0.9 (0.8-1.0)
Decline in self-rated health | 1.2 (0.6-2.7) | 0.9 (0.8-1.1)

Abbreviations: BMI, body mass index; CI, confidence interval; DSM-III, Diagnostic and
Statistical Manual of Mental Disorders (Third Edition); LR+, positive likelihood ratio;
LR-, negative likelihood ratio.


Menopause home diagnostic kits are popular in the lay health
literature. These tests measure FSH levels, and results are considered
“positive” when the FSH is elevated and in a menopausal range. The
accuracy of these tests is assessed by comparing the home test result
with laboratory-based FSH measures, and the kits are reviewed by the
Food and Drug Administration (FDA). The home testing kits approved by
the FDA have over 90% accuracy for the home test result compared to a
test result obtained in a laboratory (see
http://www.fda.gov/cdrh/orid/homeuse-menopause.html; accessed June 3,
2008).


EVIDENCE FROM GUIDELINES 

The US Preventive Services Task Force recommends counsel¬ 
ing women approaching the menopausal transition. The evi¬ 
dence report did not address the diagnosis of the menopausal 
transition. 


CLINICAL SCENARIO—RESOLUTION 


This scenario describes a 42-year-old woman with no symptoms but with
minor changes in her menstrual flow. According to her age, the pretest
probability of being either perimenopausal or postmenopausal is 35%.
Her menopause home diagnostic kit result was positive, which may
suggest an FSH level greater than or equal to 25 mIU/mL (corresponding
positive LR, 3.1). Although clinicians should rely on the medical
history and demographic features to assess menopause without routine
FSH testing, this patient has provided a home test result. With the
information from her home diagnostic kit, her calculated posttest
probability of perimenopause is 62%. She might experience more
irregularity of her periods during the next 1 to 2 years, with
resulting amenorrhea.
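
The posttest probability in this resolution can be approximated by
converting the pretest probability to odds, multiplying by the LR, and
converting back to a probability. The Python sketch below is a minimal
illustration under that assumption; the function name is hypothetical,
and the inputs (pretest probability 35%, LR+ 3.1) are the values stated
in the scenario.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability via odds.

    Hypothetical helper for illustration only.
    odds = p / (1 - p); posttest odds = pretest odds * LR;
    probability = odds / (1 + odds)
    """
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


if __name__ == "__main__":
    # 42-year-old woman: pretest probability 35%, positive home FSH kit (LR+ 3.1).
    probability = posttest_probability(0.35, 3.1)
    print(f"Posttest probability: {probability:.3f}")
```

With these inputs the result is roughly 0.625, in line with the 62%
quoted in the scenario. An LR of 1 returns the pretest probability
unchanged.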


CHAPTER 31 Menopause 


PERIMENOPAUSE —MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

The probability of menopause is estimated best from the 
patient’s age (see Figure 31-1). 

At age 36 to 39 years, the incidence is approximately 
20%; 40 to 43 years, approximately 34%; and 44 to 45 
years, 43%. 
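
These rounded incidence figures appear to follow from pooling the
age-band counts reported in Table 31-3. The Python sketch below shows
that arithmetic; the dictionary and function names are hypothetical,
and the pooling of adjacent age bands is our assumption about how the
summary percentages were obtained.

```python
from typing import Dict, Iterable, Tuple

# (women who developed perimenopause, women in the age band), from Table 31-3.
AGE_BAND_COUNTS: Dict[str, Tuple[int, int]] = {
    "36-37": (19, 90),
    "38-39": (34, 162),
    "40-41": (45, 137),
    "42-43": (53, 153),
    "44-45": (26, 61),
}


def pooled_incidence(counts: Dict[str, Tuple[int, int]], bands: Iterable[str]) -> float:
    """Pool cases and totals over the named age bands and return the proportion."""
    selected = [counts[band] for band in bands]
    cases = sum(case_count for case_count, _ in selected)
    total = sum(band_total for _, band_total in selected)
    return cases / total


if __name__ == "__main__":
    groups = {
        "36-39": ["36-37", "38-39"],
        "40-43": ["40-41", "42-43"],
        "44-45": ["44-45"],
    }
    for label, bands in groups.items():
        print(f"{label} y: {pooled_incidence(AGE_BAND_COUNTS, bands):.1%}")
```

Pooling all 5 bands gives 177 of 603 women, which matches the overall
count reported for the cohort.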

POPULATION FOR WHOM PERIMENOPAUSE 
SHOULD BE CONSIDERED 

• Women with irregular menses or amenorrhea for more 
than 3 months 

• Women with hot flashes or night sweats 

• Those who have had hysterectomy 

• Those undergoing chemotherapy 

FINDINGS FOR PERIMENOPAUSE 

Most findings other than age have low accuracy for identifying women
in perimenopause. The presence of a family history of early menopause
or hot flashes and the results of a home test for FSH may be the best
findings (Table 31-5).


Table 31-5 Likelihood Ratios of Findings for Perimenopause

Development of Perimenopause Within 36 Months

Finding | LR+ (95% CI) or Range | LR- (95% CI) or Range
--- | --- | ---
FSH (>24 mIU/mL) a | 3.1 (2.1-4.5) | 0.45 (0.36-0.56)
Family history of early menopause | 2.0 (1.1-3.5) | 0.9 (0.9-1.0)
Hot flashes | 2.1-4.1 | 0.54-0.87

Abbreviations: CI, confidence interval; FSH, follicle-stimulating hormone; LR+, positive
likelihood ratio; LR-, negative likelihood ratio.

a The role of routine hormonal testing for diagnosing perimenopausal status has not
been established. The FSH may prove most useful for women after a hysterectomy
because they cannot report menstrual symptoms. Home testing kits approved by the
Food and Drug Administration have over 90% accuracy for the home test result
compared to a test result obtained in a laboratory.

REFERENCE STANDARD TEST 

Menstrual history. 
EVIDENCE TO SUPPORT THE UPDATE:
Menopause



TITLE Predictors of Declining Self-rated Health During 
the Transition to Menopause. 

AUTHORS Dennerstein L, Dudley EC, Guthrie JR. 

CITATION J Psychosom Res. 2003;54(2):147-153. 

QUESTION What is the role of declining self-rated 
health in the diagnosis of perimenopause? 

DESIGN All data were collected prospectively in an 8- 
year cohort study called the Melbourne Women’s Midlife 
Health Project. Self-rated health was measured annually. 

SETTING Population-based cohort of middle-aged (aged 
45-55 years at baseline) Australian-born women. 

PATIENTS Two hundred sixty-two women completed the 
year 8 self-rated health assessment article; 136 women were 
perimenopausal (3-11 months of amenorrhea) and 44 were 
in the premenopausal control group. Exclusions were incom¬ 
plete data, surgical menopause, and hormone therapy use. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

On a mailed questionnaire, women were asked to rate their 
present health compared with that of other women about the 
same age as follows: worse than most, about the same as oth¬ 
ers, or better than most. 

MAIN OUTCOME MEASURES 

Sensitivity, specificity, and likelihood ratios. 

MAIN RESULTS 

Women’s perception of a decline in their health does not
indicate they are perimenopausal (Table 31-6).

Table 31-6 Likelihood Ratio for Self-Rated Decline in Health as a
Predictor for Perimenopausal State

Test | Sensitivity | Specificity | LR+ (95% CI) | LR- (95% CI)
--- | --- | --- | --- | ---
Decline in self-rated health | 0.20 | 0.84 | 1.2 (0.6-2.7) | 0.9 (0.8-1.1)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.


CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Well-respected cohort study with measures of 
self-rated health completed prospectively. Perimenopause 
was determined independently. 

LIMITATIONS None. 

There is a low sensitivity for decline in self-rated health as a 
predictor of change to perimenopause. A decline in self-rated 
health does not identify women who are perimenopausal. 

Reviewed by Lori A. Bastian, MD 


TITLE Lifetime Socioeconomic Position in Relation to 
Onset of Perimenopause. 

AUTHORS Wise LA, Krieger N, Zierler S, Harlow BL. 

CITATION J Epidemiol Community Health. 2002;56(11):851-860.

QUESTION What is the association between demo¬ 
graphic, behavioral, and reproductive factors and onset of 
perimenopause? 

DESIGN All data were collected prospectively in a 
cohort study designed to assess the association between 
major depression and ovarian function among women of 
late reproductive age. 

SETTING A mailed questionnaire to a random sample 
of 6228 women aged 36 to 45 years, residing in 7 Boston- 
area communities from 1996 to 1997 (94% white). 

PATIENTS Seven hundred thirty-three women (81% 
response rate) completed the follow-up survey. Women 
were excluded if they were pregnant, had a hysterectomy 
or surgical menopause, had menopausal irregularity at 
the baseline survey, had medical menopause, underwent 
fertility therapy, or started hormone therapy. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Baseline demographic and reproductive characteristics were 
measured, such as age, race/ethnicity, family history of early 
menopause (defined as mother or sister with natural meno¬ 
pause before age 46 years), smoking history, body mass index, 
passive smoke exposure, and depression defined by the Struc¬ 
tured Clinical Interview applied to Diagnostic and Statistical 
Manual of Mental Disorders (Fourth Edition) criteria. 1 

Perimenopausal status was measured during a 36-month 
follow-up by subjective report of menstrual irregularity or an 
absolute change of 7 days or greater in menstrual cycle length 
compared with baseline, a change in menstrual flow amount, 
or periods of amenorrhea lasting 3 to 6 months. 

MAIN OUTCOME MEASURES 

Sensitivity, specificity, likelihood ratios (LRs), and incidence 
of perimenopause by age categories. 

MAIN RESULTS 

Of 603 women, 177 (29%) developed perimenopause during the 36-month
follow-up period. Twenty percent of women aged 36 to 39 years were
perimenopausal (Table 31-7). Baseline demographic characteristics,
family history, smoking history, and the presence of depression were
not particularly useful for identifying perimenopause (Table 31-8).

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Large cohort study with all variables entered 
prospectively. 


Table 31-7 Incidence of Perimenopause Related to Age

Age, y | Incidence Rate of Perimenopause (%)
--- | ---
36-37 | 19/90 (21)
38-39 | 34/162 (21)
40-41 | 45/137 (33)
42-43 | 53/153 (35)
44-45 | 26/61 (43)


Table 31-8 Likelihood Ratio of Features for Perimenopause

Development of Perimenopause Within 36 Months

Finding | Sensitivity | Specificity | LR+ (95% CI) | LR- (95% CI)
--- | --- | --- | --- | ---
Nonwhite | 0.10 | 0.96 | 2.5 (1.3-4.8) | 0.9 (0.9-1.0)
Family history early menopause | 0.12 | 0.94 | 2.0 (1.1-3.5) | 0.9 (0.9-1.0)
BMI >30 | 0.16 | 0.91 | 1.8 (1.1-2.8) | 0.9 (0.9-1.0)
Age, y | | | |
42-45 | 0.45 | 0.68 | 1.4 (1.1-1.7) | 0.8 (0.7-0.9)
40-45 | 0.70 | 0.47 | 1.3 (1.2-1.5) | 0.6 (0.5-0.8)
>5 y passive smoke exposure | 0.37 | 0.74 | 1.4 (1.1-1.8) | 0.8 (0.7-1.0)
Current smoker | 0.13 | 0.91 | 1.4 (0.9-2.3) | 0.9 (0.9-1.0)
Depression (DSM-IV defined) 1 | 0.37 | 0.70 | 1.2 (1.0-1.6) | 0.9 (0.8-1.0)

Abbreviations: BMI, body mass index; CI, confidence interval; DSM-IV, Diagnostic
and Statistical Manual of Mental Disorders (Fourth Edition); LR+, positive likelihood
ratio; LR-, negative likelihood ratio.


LIMITATIONS Perimenopause definition required only 3 to 
6 months of amenorrhea. Estimates of perimenopause are 
more liberal than previous studies requiring 3 to 11 months 
of amenorrhea. 

The most important finding of this study is that the inci¬ 
dence rates for perimenopause among women 36 to 45 years 
of age may be higher than is appreciated by many generalist 
physicians and their patients. 

Women with menstrual irregularity at baseline were 
excluded from this cohort study. Therefore, the results should 
not be used to estimate the probability that a woman with 
ongoing menstrual irregularity is actually perimenopausal. 
Although it may seem awkward to use LRs to describe the util¬ 
ity of these demographic features and historical items, the val¬ 
ues can be applied to women aged 36 to 45 years and express 
the increased likelihood of developing perimenopause. 

Reviewed by Lori A. Bastian, MD 

REFERENCE FOR THE EVIDENCE 

1. Spitzer RL, Williams JB, Gibbon M, et al. The structured clinical
interview for DSM-III-R (SCID), I: history, rationale, and description.
Arch Gen Psychiatry. 1992;49(8):624-629.
CHAPTER 32

Does This Patient Have Aortic Regurgitation?

Niteesh K. Choudhry, MD
Edward E. Etchells, MD, MSc

CLINICAL SCENARIO

You are asked to see a 59-year-old woman with liver cirrhosis 
and esophageal varices. When she was checked into the clinic, 
she had a pulse pressure of 70 mm Hg. Because of the wide 
pulse pressure, you wonder if she has aortic regurgitation 
(AR). You conduct a complete physical examination and hear 
no early-diastolic murmur in the third or fourth intercostal 
spaces at the left sternal border. You suspect that the wide 
pulse pressure is a peripheral hemodynamic consequence of 
cirrhosis, not AR. Do you need an echocardiogram to con¬ 
firm your clinical impression that she does not have AR? 


WHY IS THE CLINICAL EXAMINATION 
IMPORTANT IN EVALUATING FOR 
AORTIC REGURGITATION? 


Aortic regurgitation is a potentially serious cardiac abnormality
that may be caused by important underlying disorders.
Patients with AR require careful clinical monitoring to identify
the optimal time for surgical intervention. Asymptomatic
patients with severe AR may benefit from vasodilator therapy. 1

The use of noninvasive cardiac testing, such as echocardiography,
has increased in recent years. It is estimated that 2%
of the general population undergo noninvasive cardiac diagnostic
evaluation annually. 2 If a careful clinical examination
can exclude the presence of AR, then there would be no need
to proceed with further cardiac evaluation.

Anatomic and Physiologic Origins of Diastolic Murmurs 

The cardinal manifestation of AR is a diastolic murmur. Diastolic
murmurs are important indicators of structural cardiac
abnormalities or pathologic states of increased flow (Table
32-1). As discussed in a previous article in this series, 3 heart
murmurs are produced when turbulent blood flow causes
prolonged auditory vibrations of cardiac structures. The
intensity of the murmur depends on many factors, including
blood viscosity, blood flow velocity and turbulence, the distance
between the vibrations and the stethoscope, the angle
at which the vibrations meet the stethoscope, the transmission
qualities of the tissue between the vibration and the
stethoscope, and the auditory skills of the examiner. 4

How to Examine for Aortic Regurgitation 

A complete clinical history and physical examination are
essential in the evaluation of a patient with a diastolic murmur.
A diastolic murmur in a patient with renal failure and
volume overload will have different significance than a diastolic
murmur in a patient with a history of rheumatic fever
and atrial fibrillation.

The examiner’s ability to detect a diastolic murmur can be
undermined by environmental factors such as noisy rooms,
examiner factors such as fatigue or haste, and patient factors
such as dyspnea or tachycardia. 5 If examining conditions are
not optimal, the examination should be repeated when conditions
improve.

Table 32-1 Selected Causes of Diastolic Murmurs

Abnormal cardiac structure
  Aortic regurgitation
  Mitral stenosis
  Pulmonic regurgitation
  Tricuspid stenosis
  Atrial myxoma
  Ventricular septal defect(a)
  Atrial septal defect(a)
  Mitral regurgitation(a)

Normal cardiac structure, increased flow
  Renal failure with volume overload
  Thyrotoxicosis
  Anemia
  Sepsis

(a) Diastolic murmurs are caused by abnormally increased diastolic
flow across the mitral or tricuspid valves.

Figure 32-1 Typical Location of Abnormal Diastolic Murmurs

There are 3 important areas to auscultate for diastolic murmurs. Area 1 is the
second and third intercostal spaces at the right sternal border. Area 2 is the
second and fourth intercostal spaces at the left sternal border. Aortic
regurgitation murmurs may be heard in both areas 1 and 2. If the murmur is
loudest in area 1, then the underlying cause of aortic regurgitation may be an
ascending aortic aneurysm or aortic dissection. Pulmonic regurgitation
murmurs are loudest in the superior part of area 2 and may radiate downward.
The murmur of mitral stenosis and the Flint murmur of aortic regurgitation
are best heard at the apex (area 3).

The precision and accuracy of many components of the 
examination for AR, including all of the cardiac history and 
most of the physical examination, have not been adequately 
evaluated. This article will focus on aspects of the cardiac 
physical examination that have been sufficiently assessed for 
precision or accuracy. 


Cardiac Auscultation 

During routine auscultation, the examiner attempts to detect
a diastolic murmur. Diastole is the period that begins with
the closure of the aortic and pulmonic valves (second heart
sound [S2]) and ends with the closure of the mitral and
tricuspid valves (first heart sound [S1]). A common maneuver
used to identify diastole is to palpate the carotid artery pulse
during auscultation; S1 is synchronous with the carotid
artery pulsation, whereas S2 follows the pulse. A diastolic
murmur is a diastolic sound longer than a heart sound.
Examiners should describe the grade, location of maximal
intensity (Figure 32-1), timing (Figure 32-2), duration, pitch,
and radiation of the murmur.

The Levine grading system, 6 with slight modifications, 7 was
developed for systolic murmurs but may also be used to
describe diastolic murmurs. A grade 1 murmur is not heard
immediately on auscultation but is heard after the examiner
focuses for a few seconds. Grade 2 murmurs are heard
immediately on auscultation but are softer than the loud grade 3.
Grade 4 murmurs are associated with a palpable precordial
vibration called a thrill. Grades 5 and 6 murmurs are also
associated with a thrill. A grade 5 murmur is audible when only
one edge of the stethoscope is on the chest, and a grade 6
murmur is audible with the entire stethoscope lifted off the chest.

The typical AR murmur is an early-diastolic, decrescendo
blowing sound (Figure 32-2) that may be accentuated with
the patient sitting upright and leaning forward. 8 In some
cases, S2 can be obscured by the murmur. Most AR murmurs
are high pitched and are best heard with the diaphragm of
the stethoscope placed firmly on the chest wall. Some AR
murmurs are low pitched and are better heard with the bell
of the stethoscope placed lightly on the chest wall. For example,
the AR murmur associated with endocarditis and a
fenestrated aortic valve can be low pitched.

The examiner should apply the stethoscope to the chest
wall in the third or fourth intercostal space at the left sternal
border and listen between normal breaths at the end of
expiration. The patient should not voluntarily breath-hold
because it may inadvertently create a Valsalva maneuver. If
the murmur is louder at the second to third right intercostal
space, the underlying cause of AR may be an ascending aortic
aneurysm or aortic dissection. 9

Aortic regurgitation also may be associated with a systolic
murmur, 10 created by the flow of an abnormally large volume
of blood through a nonstenotic aortic valve or a bicuspid
aortic valve. The murmur is an early-peaking,
crescendo-decrescendo systolic sound that is best heard with the
diaphragm of the stethoscope applied to the second right
intercostal space.

The Flint murmur is a low-pitched late-diastolic apical murmur,
which is associated with AR. The murmur is likely produced
when the regurgitant jet of blood collides with the left
ventricular endocardium. 11 The murmur may have a mid-diastolic
component, but the original description by Flint
referred only to “presystolic blubbering.” 12 It is best heard with
the patient in the left-lateral decubitus position, using the bell
of the stethoscope. Differentiating the Flint murmur from the
murmur of mitral stenosis can be difficult. The murmur of
mitral stenosis is primarily mid-diastolic (possibly with a
late-diastolic component) and may be associated with an opening
snap (OS) and a loud S1 (Figure 32-2). 13

The typical murmur of pulmonic regurgitation (PR) is an
early-diastolic decrescendo murmur heard best in the second
left intercostal space at the sternal border. The murmur may
radiate to the third and fourth left intercostal spaces and may
increase during quiet inspiration. If there is splitting of S2,
the astute examiner may note that the murmur begins after
the pulmonic valve component (P2) of S2 rather than the
aortic component. The murmur of PR may be lower pitched
than the murmur of AR, unless pulmonary hypertension is
present. A right-sided Flint murmur can be heard, particularly
in patients with pulmonary hypertension.

Mitral stenosis is associated with a mid-diastolic, decrescendo,
low-frequency rumble, which, if the patient is in
sinus rhythm, may be followed by a late-diastolic (presystolic)
crescendo that ends with the mitral component of S1 (Figure
32-2). It is best heard using the bell of the stethoscope placed
at the apex soon after moving the patient into the left lateral
decubitus position. Rolling the patient onto the left side
brings the left ventricle closer to the chest wall and serves as a
form of exercise, increasing blood flow across the mitral valve
and increasing the murmur’s intensity. 9 The murmur of
mitral stenosis may be inaudible in patients with low cardiac
output.

The S1 may be increased in intensity in mitral stenosis. 13 A
normal S1 is best appreciated near the apex, where it should be
louder than S2. The S1 is normally softer than S2 in the second
right and second left intercostal spaces adjacent to the sternum.
If S1 is as loud as or louder than the S2 in these areas,
then the S1 is increased in intensity.

An OS is a high-frequency, early-diastolic sound that is
associated with the opening of a stenotic mitral valve. It
occurs 50 to 100 ms after the aortic valve component (A2) of
S2 and is best heard in the area from the left sternal border to
the apex. Much like the murmur of mitral stenosis, it may be
accentuated by auscultating while the patient is in the left
lateral decubitus position shortly after the patient has
performed exercise. The A2-OS interval shortens with increasing
severity of mitral stenosis. The OS may be absent in the
case of a heavily calcified immobile mitral valve. It is often
difficult to differentiate an OS from the P2 of S2. The OS
usually decreases in intensity with inspiration, and the S2-OS
interval widens on standing. Conversely, P2 becomes louder
with inspiration, and the A2-P2 interval remains the same or
narrows with standing. 13 In addition, P2 is not expected to
be heard at the apex unless the patient has pulmonary
hypertension.

Maneuvers 

Selective use of maneuvers can enhance the detection and
interpretation of diastolic murmurs. There is no point in
doing maneuvers if a loud AR murmur has been detected
during routine auscultation. However, if the clinician is
unsure about the presence of a faint diastolic murmur, then a
maneuver that increases murmur intensity may clarify the
situation. If the clinician has a heightened suspicion for AR
(eg, after hearing an aortic ejection sound), or if examining
conditions are not optimal, then a maneuver to augment
murmur intensity might bring out an otherwise inaudible
murmur. Finally, the maneuvers may help distinguish PR
from AR. In this latter situation, the clinician should listen
where the murmur is just barely audible, so that it is easy to
detect a decrease or increase in murmur intensity during the
maneuver.

Figure 32-2 Selected Features of Diastolic Murmurs

Diastolic murmurs are classified according to the time of onset of the
murmur. 14 An early diastolic murmur begins with the second heart sound (S2).
Top, Early diastolic murmurs typically decrease in intensity (decrescendo)
and disappear before the first heart sound (S1). In some cases, an early
diastolic murmur can continue through diastole. Bottom, A mid-diastolic
murmur begins clearly after S2 (in mitral stenosis, classically after an
opening snap [OS]). A late-diastolic (or presystolic) murmur begins in the
interval immediately before S1. In mitral stenosis, the mid-diastolic
murmur may merge with the late-diastolic (presystolic) murmur.

Quiet inspiration increases venous return and augments 
right-sided heart murmurs such as PR. To determine the 
effect of inspiration on the intensity of the murmur, the 
examiner should listen during quiet inspiration, rather than 
asking the patient to breathe deeply, because the murmur 
may be obscured by breath sounds. 

Transient arterial occlusion primarily increases systemic
arterial resistance, which intensifies the murmurs of left-sided
regurgitant lesions such as AR and may help distinguish the
murmur of AR from that of PR. To perform this maneuver,
sphygmomanometer cuffs are placed around both of the patient's
arms and are inflated to 20 to 40 mm Hg above the previously
recorded systolic blood pressure. Any changes in murmur
intensity are noted 20 seconds after cuff inflation. 15

Peripheral Hemodynamic Signs 

There are a variety of peripheral hemodynamic signs
traditionally associated with AR. Some of these signs have been
adequately evaluated, including de Musset head-bobbing
sign, 16 a wide pulse pressure, 17 the brachial-popliteal pulse
gradient (Hill sign 18 ), Duroziez femoral murmur, 16 the
femoral pistol shot murmur, 13 and Corrigan water hammer
pulse. 19 The de Musset head-bobbing sign consists of a
forward shaking of the head with every heartbeat; it is best
observed in patients who are sitting. 16

Pulse pressure refers to the difference between systolic and
diastolic blood pressures. A widened pulse pressure may be
defined as greater than 50 mm Hg. 20 Other definitions
include a pulse pressure greater than 50% of the systolic
pressure. 17 Determination of the blood pressure has been
described in another article in this series. 21
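
The 2 definitions of a widened pulse pressure given above can be
sketched in a few lines of Python. This is only an illustration:
the function names are hypothetical, and the blood pressure of
150/80 mm Hg is an assumed reading chosen to give a pulse
pressure of 70 mm Hg, not a value reported in the text.

```python
def pulse_pressure(systolic_mm_hg: float, diastolic_mm_hg: float) -> float:
    """Return systolic minus diastolic blood pressure, in mm Hg."""
    if diastolic_mm_hg > systolic_mm_hg:
        raise ValueError("diastolic pressure cannot exceed systolic pressure")
    return systolic_mm_hg - diastolic_mm_hg


def is_wide_absolute(systolic_mm_hg: float, diastolic_mm_hg: float,
                     threshold_mm_hg: float = 50.0) -> bool:
    """Widened pulse pressure defined as greater than 50 mm Hg."""
    return pulse_pressure(systolic_mm_hg, diastolic_mm_hg) > threshold_mm_hg


def is_wide_relative(systolic_mm_hg: float, diastolic_mm_hg: float,
                     fraction: float = 0.5) -> bool:
    """Widened pulse pressure defined as greater than 50% of systolic pressure."""
    return pulse_pressure(systolic_mm_hg, diastolic_mm_hg) > fraction * systolic_mm_hg


# Hypothetical reading (assumption): 150/80 mm Hg.
systolic, diastolic = 150.0, 80.0
print(pulse_pressure(systolic, diastolic))
print(is_wide_absolute(systolic, diastolic))
print(is_wide_relative(systolic, diastolic))
```

With this assumed reading the 2 definitions disagree, which is a
reminder to state which definition is being used when a pulse
pressure is reported as wide.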

The brachial-popliteal pulse gradient (Hill sign) can be 
defined as a systolic blood pressure in the lower extremities that 
is at least 20 mm Hg higher than that in the arms. 20 To determine 
a popliteal blood pressure, an appropriately sized blood pressure 
cuff should be placed on the patient’s thigh 13 with the artery 
marker over the popliteal artery. The cuff should be inflated and 
the systolic pressure can then be determined in the popliteal 
fossa either by palpation, as judged by the point where the pulse 
reappears as the cuff is deflated, or by auscultation, listening for 
Korotkoff sounds to appear. Both the brachial and popliteal 
blood pressures should be measured while the patient is supine. 
The average of repeated readings should be used, especially in 
patients with irregular heart rates, such as atrial fibrillation. 
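
The Hill sign criterion described above (averaged popliteal
systolic pressure at least 20 mm Hg higher than averaged brachial
systolic pressure) can be sketched as follows. The function name
and the readings are hypothetical and are used only to illustrate
the arithmetic.

```python
from statistics import mean


def hill_sign(popliteal_systolic_readings, brachial_systolic_readings,
              threshold_mm_hg=20.0):
    """Return (gradient, present) from repeated supine systolic readings.

    The gradient is the mean popliteal systolic pressure minus the mean
    brachial systolic pressure, in mm Hg. The sign is counted as present
    when the gradient is at least `threshold_mm_hg`.
    """
    if not popliteal_systolic_readings or not brachial_systolic_readings:
        raise ValueError("at least one reading is needed at each site")
    gradient = mean(popliteal_systolic_readings) - mean(brachial_systolic_readings)
    return gradient, gradient >= threshold_mm_hg


# Hypothetical repeated readings (assumption), in mm Hg.
gradient, present = hill_sign([160, 164, 162], [130, 134, 132])
print(gradient, present)
```

Averaging inside the function reflects the advice to use the mean
of repeated readings, especially when the heart rate is irregular.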

Duroziez double intermittent femoral bruit is elicited by
first gently compressing the femoral artery with the diaphragm
of the stethoscope. This will yield a systolic bruit in all patients.
As increasing pressure is applied to the diaphragm, an
early-diastolic bruit will become audible in patients with AR. While
listening to the diastolic bruit, the clinician should tilt the
stethoscope so that the distal rim (closest to the patient’s feet)
is compressing the femoral artery. If the bruit becomes louder
with this maneuver, then the diastolic bruit is due to the
retrograde flow of blood toward the heart in AR. The stethoscope
should then be tilted such that the proximal rim (closest to the
patient's head) is compressing the femoral artery. If the
diastolic bruit becomes softer, this can be taken as supportive
evidence of the presence of retrograde blood flow. If, however, the
bruit becomes louder with proximal pressure (and softer with
distal pressure), then this sign should not be used as evidence
of AR but may indicate the presence of a high-flow state such
as renal failure with volume overload. 22

Femoral pistol shot sounds are elicited by auscultating with
the diaphragm of the stethoscope over the femoral arteries. A
high-pitched pistol shot sound may be heard in AR. Corrigan
water hammer pulse refers to an increased volume and rate of
increase of the radial pulse when the wrist is elevated
perpendicular to the body of a supine patient. The radial pulse should
first be assessed while the patient is lying supine with his or her
arms resting at the sides. Sufficient pressure should be applied
to obliterate the pulse. While this pressure is maintained, the
patient's arm should be elevated so that it is perpendicular to
the plane of the body. In AR, the pulse will become palpable
despite applying an equivalent amount of pressure as when the
arm was at the patient’s side.

Other peripheral hemodynamic signs, such as Mayne sign
(a decrease in diastolic blood pressure of 15 mm Hg when
the arm is held above the head compared with when the arm
is held at the level of the heart), 23 Quincke capillary pulsation,
Müller pulsatile uvula, and Rosenbach liver pulsation, have
not been adequately evaluated for precision or accuracy.

METHODS 

To identify articles pertaining to the precision and accuracy
of the physical examination for AR, we used standard methods
for conducting research overviews. 24 Our data collection
strategy involved 3 steps and was deliberately broad to reduce
the possibility of overlooking important articles. First, we
searched MEDLINE for English-language articles published
from 1966 through July 1997, using a structured search strategy
(available on request from the authors). Second, we manually
reviewed potentially relevant articles and their reference
lists. Third, we contacted the authors of relevant studies for
additional information. Studies were excluded if they were
review articles, involved patients younger than 18 years, were
small (ie, <20 participants), involved prosthetic heart valves,
had no clinical examination performed or reported, or had
no acceptable reference standard (Doppler echocardiography
or cardiac catheterization).

Studies were independently reviewed for methodologic
quality by the 2 authors, and disagreements were resolved by
consensus. Quality grades were assigned using published
guidelines (see Table 1-7 for a summary of Evidence Grades
and Levels). 25 Grade A studies involve the independent
comparison of a sign or symptom with a reference standard of
diagnosis among a large number of consecutive patients
suspected of having the target condition. Grade B studies meet
the criteria for grade A studies but have a small number of
patients. Grade C studies involve nonconsecutive patients,
patients who are known to have the target condition and
healthy individuals, nonindependent comparisons between
the sign or symptom and the reference standard, or
nonindependent comparisons with a reference standard of uncertain
validity. Grade C studies tend to overestimate the accuracy of
the sign or symptom.

We created contingency tables for all studies and determined
the likelihood ratios (LRs) for aortic regurgitation. 26,27
We also sought information on the examination for other
causes of diastolic murmurs, such as mitral stenosis or PR.
Unfortunately, we found few studies of sufficient methodologic
quality for these conditions. This relative lack of information
implies that methodologically sound studies are
needed but does not imply that the clinical examination for
these conditions is imprecise, inaccurate, or unimportant.
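
The step from a 2 x 2 contingency table to likelihood ratios can
be sketched as follows. This is a minimal illustration using the
standard definitions (LR+ = sensitivity / [1 - specificity];
LR- = [1 - sensitivity] / specificity). The function name and the
counts are hypothetical and are not taken from any study cited here.

```python
def likelihood_ratios(true_pos: int, false_pos: int,
                      false_neg: int, true_neg: int):
    """Return (LR+, LR-) for a finding compared with a reference standard.

    true_pos:  finding present, disease present
    false_pos: finding present, disease absent
    false_neg: finding absent, disease present
    true_neg:  finding absent, disease absent
    """
    diseased = true_pos + false_neg
    healthy = false_pos + true_neg
    if diseased == 0 or healthy == 0:
        raise ValueError("need at least one patient with and one without disease")

    sensitivity = true_pos / diseased
    false_positive_rate = false_pos / healthy   # 1 - specificity
    specificity = true_neg / healthy

    # A zero denominator means the ratio is unbounded for these counts.
    lr_pos = sensitivity / false_positive_rate if false_positive_rate else float("inf")
    lr_neg = (1 - sensitivity) / specificity if specificity else float("inf")
    return lr_pos, lr_neg


# Hypothetical counts (assumption): 100 patients with and 100 without disease.
lr_pos, lr_neg = likelihood_ratios(true_pos=80, false_pos=10,
                                   false_neg=20, true_neg=90)
print(round(lr_pos, 1), round(lr_neg, 2))
```

Confidence intervals, which the published tables report, are not
computed in this sketch.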

Precision of the Examination Related 
to Diastolic Murmurs 

Precision refers to agreement regarding a particular clinical
finding between different physicians (interobserver) or between
multiple assessments by the same physician (intraobserver).
The precision of the clinical examination for diastolic murmurs
has been evaluated in usual clinical situations by auscultating
patients 28,29 or in controlled nonclinical circumstances
by listening to recorded audiotapes (Table 32-2). 30

There have been 4 studies that address the interobserver precision
of cardiac auscultation to detect diastolic murmurs (Table
32-2). Although simple agreement is high in these studies, the
one study for which it was possible to calculate agreement
adjusted for chance (κ) showed only moderate agreement. The
experience of observers likely affects precision. The one study 28
that compared cardiologists with noncardiologists found a
higher simple agreement for cardiologists.
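
The difference between simple agreement and agreement adjusted
for chance can be illustrated with Cohen's κ for 2 examiners who
each record a murmur as present or absent. The sketch below is
only an illustration: the function name and the counts are
hypothetical, and studies with more than 2 examiners would need a
multirater statistic instead.

```python
def cohen_kappa(both_present: int, only_first: int,
                only_second: int, both_absent: int):
    """Return (simple_agreement, kappa) for 2 examiners and a binary finding.

    both_present: both examiners hear the murmur
    only_first:   only the first examiner hears it
    only_second:  only the second examiner hears it
    both_absent:  neither examiner hears it
    """
    n = both_present + only_first + only_second + both_absent
    if n == 0:
        raise ValueError("no patients examined")

    observed = (both_present + both_absent) / n

    # Agreement expected by chance from each examiner's marginal rates.
    first_rate = (both_present + only_first) / n
    second_rate = (both_present + only_second) / n
    expected = first_rate * second_rate + (1 - first_rate) * (1 - second_rate)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts (assumption) for 50 patients.
agreement, kappa = cohen_kappa(both_present=20, only_first=5,
                               only_second=10, both_absent=15)
print(round(agreement, 2), round(kappa, 2))
```

In this made-up example the examiners agree on 70% of patients,
yet κ is lower because half of that agreement would be expected
by chance alone.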

The interobserver agreement between examiners on the
intensity of heart sounds is excellent (92%). 29 In this study,
examiners progressively inserted 0.5-mm-thick paper disks
between the patient's chest and the stethoscope. The total
thickness of the disks was used as a measure of heart sound
intensity. Murmur intensity was also assessed with this
technique (Table 32-2).

The Bottom Line for Precision 

The interobserver precision of cardiologists examining for
any diastolic murmur is moderate with audiotapes (κ = 0.51)
and good in the clinical setting (simple agreement, 94%).
Noncardiologists may be less precise than cardiologists.
The precision of examining for the intensity of murmurs
and heart sounds with a standardized series of paper
disks to assess intensity is good (simple agreement, 92%-96%).

Accuracy of the Examination for Aortic Regurgitation 

We consider Doppler echocardiography and cardiac catheterization
to be acceptable reference standards for AR (Table 32-3).
In one study, 37 the reference standard was open-heart surgery.

Cardiologists conducted the clinical examinations in most
studies. There are too few studies, with too few patients, to
allow reasonable estimates of the accuracy of noncardiologists,
although noncardiologists are likely less adept at detecting the
diastolic murmur of AR. Approximately 20% of residents and
medical students correctly identified the murmur of AR on
high-fidelity digitized audiotapes, 31 whereas 46% of internal
medicine residents correctly identified an AR murmur on a
patient simulator. 32

The best-studied physical finding is the typical early-diastolic
murmur of AR. 33-46 If an examiner does not hear a
typical AR murmur, then the likelihood that the patient has
moderate or greater AR is significantly reduced (negative
likelihood ratio [LR-], 0.1 for grade A studies); the likelihood of
mild or greater AR is also significantly reduced (LR-, 0.2-0.3
for grade A studies). If an examiner hears the typical AR
murmur, the likelihood that the patient has moderate or greater
AR is increased (positive likelihood ratio [LR+], 4.0-8.3 for
grade A studies); the likelihood of mild or greater AR is also
significantly increased (LR+, 8.8-32 for grade A studies). 33,34
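
How an LR changes the probability of disease can be sketched with
the usual odds form of Bayes' theorem (posttest odds = pretest
odds x LR). The function name and the pretest probability of 30%
are assumptions for illustration only; the LR values of 0.1 and
8.3 are the grade A figures for moderate or greater AR quoted above.

```python
def posttest_probability(pretest_probability: float,
                         likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability using an LR."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must be between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")

    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical pretest probability (assumption): 30%.
pretest = 0.30
no_murmur = posttest_probability(pretest, 0.1)   # typical murmur absent
murmur = posttest_probability(pretest, 8.3)      # typical murmur present
print(round(no_murmur, 3), round(murmur, 3))
```

The same function applies to any LR in Table 32-3, provided the
pretest probability is estimated for the patient population at hand.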

The intensity of the murmur correlates with the severity of
echocardiographic AR. Desjardins et al 47 studied 40 patients
with echocardiographic AR, including 17 with severe AR. A
grade 3 diastolic murmur had an LR of 4.5 (95% CI, 1.6-14)
for distinguishing severe AR from less severe AR, whereas a
grade 2 murmur had an LR of 1.1 (95% CI, 0.5-2.4), a grade
1 murmur had an LR of 0.0 (95% CI, 0.0-0.9), and absence
of a diastolic murmur had an LR of 0.0 (95% CI, 0.0-1.1). 47

Two grade C studies of the Flint murmur and some peripheral
hemodynamic findings are reported in Table 32-3. Grade
C studies tend to overestimate diagnostic test accuracy. Despite
this tendency, one study found that absence of a Flint murmur
did not rule out AR (LR-, 0.5-0.8). 48 Another study of patients
with mild to severe AR found only that a wide pulse pressure
or peripheral hemodynamic sign (Duroziez bruit, femoral pistol
shots, and Corrigan pulses) was not helpful for distinguishing
mild AR from moderate or severe AR. 20 The de Musset
head-bobbing sign was seen in only 1 of 20 patients (sensitivity,
5%), while Duroziez femoral bruit was observed in 8 of
12 patients (sensitivity, 67%), 16 making them interesting but
not particularly useful findings.

Table 32-2 Interobserver Reliability (Precision) for Detecting Diastolic Murmurs

Finding | Type of Examiner | No. of Examiners | No. of Patients | κ(a) | Simple Agreement, %
Murmur absent vs present | Cardiologists (tapes) 30 | 5 | 100 | 0.51 | 79
Murmur absent vs present | Cardiologists 28 | 2 | 32 | ... | 94
Murmur absent vs present | Noncardiologists 28 | 3 | 32 | ... | 78
Intensity of murmur | Not stated 29(b) | 5 | 25 | ... | 92

(a) Ellipses indicate data not available.
(b) Examiners used paper disks, 0.5 mm in thickness, that were progressively inserted
between the chest wall and the stethoscope until the murmur became inaudible. The
total thickness of the disks used was used as the measure of intensity. For example, if
a murmur was inaudible after insertion of 3 disks, then this was a 1.5-mm murmur.

THE BOTTOM LINE FOR AORTIC REGURGITATION 

When a cardiologist hears the typical murmur of AR, the
likelihood of mild or greater AR is increased significantly (2 grade
A studies). The absence of a typical diastolic murmur
significantly reduces the likelihood of AR (2 grade A studies).
Noncardiologists may be less proficient than cardiologists at
detecting the murmur of AR.

Mitral Stenosis and Pulmonic Regurgitation 

In one grade A study of 529 unselected nursing home residents
(31 with mitral stenosis), a cardiologist detected a
mid-diastolic murmur in all cases of mitral stenosis, with
no false-positive or false-negative examinations. 49 Only 1 patient
had an audible OS.

Noncardiologists may be less proficient at detecting the
physical findings of mitral stenosis. Less than 10% of residents
and medical students correctly identified a mid-diastolic murmur
of mitral stenosis on a high-fidelity digitized audiotape, 31
whereas 43% of medical residents identified a mid-diastolic
murmur of mitral stenosis with a patient simulator. In the latter
study, only 21% identified the OS of mitral stenosis. 32

The only evaluated element of the clinical examination for 
PR is the presence of a typical diastolic decrescendo murmur 
best audible in the second intercostal space at the left-upper 
sternal border, which may increase in intensity with quiet 
inspiration. All studies used cardiologists as examiners and 
were of poor methodologic quality (grade C). 

Table 32-3 Accuracy of the Physical Examination for Detecting Aortic Regurgitation

Study, y | Patient Population | Reference Standard | No. of Patients With AR | LR+ (95% CI) | LR- (95% CI) | Quality Grade(a)

Typical Murmur With Severity of AR Specified
Aronow and Kronzon 33 (1989) | Elderly patients | Echocardiography (n = 450) | | | | A
 | | Mild or greater AR | 131 | 32 (16-63) | 0.2 (0.1-0.3) |
 | | Moderate or greater AR | 74 | 8.3 (6.2-11) | 0.1 (0.0-0.2) |
Grayburn et al 34 (1986) | Referred for catheterization | Catheterization (n = 106) | | | | A
 | | Mild or greater AR | 82 | 8.8 (2.8-32) | 0.3 (0.2-0.4) |
 | | Moderate or greater AR | 57 | 4.0 (2.5-6.9) | 0.1 (0.1-0.3) |
Roldan et al 35 (1996) | Asymptomatic connective tissue disease and controls | Echocardiography (n = 143) | | | | C
 | | Mild or greater AR | 10 | 80 (14-470) | 0.4 (0.2-0.7) |
 | | Moderate or greater AR | 5 | 69 (18-270) | 0.0 (0.0-0.6) |
Rahko 36 (1989) | Referred for echocardiogram | Echocardiography (n = 403) | | | | C
 | | Mild or greater AR | 134 | 27 (13-60) | 0.4 (0.3-0.5) |
 | | Moderate or greater AR | 82 | 12 (8.1-19) | 0.2 (0.1-0.3) |
Cohn et al 37 (1967) | Mitral valve repair | Open-heart surgery (n = 156) | | | | C
 | | Mild or greater AR | 50 | 5.2 (3.3-8.4) | 0.3 (0.2-0.4) |
 | | Moderate or greater AR | 37 | 3.9 (2.6-5.7) | 0.2 (0.1-0.4) |
Meyers et al 38 (1982) | Referred for aortography | Catheterization (n = 75) | | | | C
 | | Mild or greater AR | 66 | 3.3 (1.3-12) | 0.4 (0.2-0.7) |
 | | Moderate or greater AR | 39 | 1.6 (1.2-2.4) | 0.4 (0.2-0.7) |
Dittmann et al 39 (1987) | Valvular heart disease | Catheterization (n = 55) | | | | C(b)
 | | Mild or greater AR | 42 | 16 (2.1-155) | 0.4 (0.3-0.6) |
 | | Severe AR | 19 | 3.6 (2.1-6.6) | 0.1 (0.0-0.4) |
Meyers et al 40 (1985) | Valvular heart disease | Catheterization (n = 20) | | | | C
 | | Mild or greater AR | 11 | 9.8 (1.3-96) | 0.5 (0.2-0.9) |
 | | Moderate or greater AR | 3 | 5.7 (1.4-14) | 0.0 (0.0-0.9) |
Linhart 41 (1971) | Mitral stenosis | Catheterization (n = 28) | | | | C
 | | Mild or greater AR | 11 | 6.2 (1.9-23) | 0.3 (0.1-0.7) |
 | | Moderate or greater AR | 7 | 7.0 (2.5-20) | 0.0 (0.0-1.3) |

Typical Murmur Without AR Severity Specified (May Include Trivial AR)
Come et al 42 (1986) | Mitral valve prolapse, plus patients with systolic flow murmurs | Echocardiography (n = 165) | 7 | 90 (8-982) | 0.7 (0.4-0.9) | C
Nienaber et al 43 (1993) | Clinically suspected aortic dissection | Echocardiography (n = 110) | 32 | 33 (9.4-120) | 0.2 (0.1-0.3) | C
Ward et al 44 (1977) | Clinically suspected aortic dissection | Catheterization (n = 65) | 49 | 13 (2.9-75) | 0.2 (0.1-0.3) | C
Esper 45 (1982) | AR and other heart disease | Echocardiography (n = 43) | 24 | 12 (2.4-67) | 0.4 (0.3-0.7) | C
Saal et al 46 (1985) | Mitral stenosis | Catheterization (n = 45) | 35 | 8.0 (1.9-45) | 0.2 (0.1-0.4) | C

Maneuver
With transient arterial occlusion murmur increases in intensity 15 | Patients with AR, mitral stenosis, and pulmonic regurgitation | Catheterization or echocardiography (n = 16) | 10 | 8.4 (1.3-81) | 0.3 (0.1-0.8) | C

Associated Physical Findings
Flint murmur 48 | Isolated AR and controls | Echocardiography (n = 36) | | | | C
 | | Mild or greater AR | 28 | 4 (0.5-40) | 0.8 (0.6-1.3) |
 | | Moderate or greater AR | 13 | 25 (2.8-243) | 0.5 (0.2-0.7) |
Any systolic murmur 48 | Isolated AR and controls | Echocardiography (n = 36) | | | | C
 | | Mild or greater AR | 28 | 1.3 (0.9-2.7) | 0.5 (0.2-1.6) |
 | | Moderate or greater AR | 13 | 1.5 (1.0-2.1) | 0.0 (0.0-1.0) |
Popliteal-brachial gradient > 20 mm Hg 20 | Mild to severe AR | Catheterization (n = 33) | | | | C
 | | Moderate or greater AR | 28 | 8.2 (1.5-78) | 0.2 (0.1-0.5) |
Peripheral hemodynamic signs 20(d) | Mild to severe AR | Catheterization (n = 34) | | | | C
 | | Moderate or greater AR | 28 | 2.1 (0.3-22) | 0.8 (0.7-1.7) |
Pulse pressure > 50 mm Hg 20 | Mild to severe AR | Catheterization (n = 33) | | | | C
 | | Moderate or greater AR | 28 | 1.0 (0.7-2.2) | 0.9 (0.2-5.5) |

Abbreviations: AR, aortic regurgitation; CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.
(a) See Table 1-7 for a summary of Evidence Grades and Levels.
(b) Grade B study, except catheterization results were not interpreted independently of clinical findings.
(c) Grade B study, except echocardiograms were not interpreted independently of clinical findings.
(d) Included Duroziez bruit, femoral pistol shots, and Corrigan pulses.


When a cardiologist hears the murmur of PR, the likelihood
of PR increases (LR+, 17 in both studies), but the absence of a
PR murmur was not helpful for ruling out PR (LR-, 0.9 in both
studies). 36,42

The Bottom Line for Mitral Stenosis 
and Pulmonic Regurgitation 

The presence of a mid-diastolic murmur significantly
increases the likelihood of mitral stenosis, whereas the
absence of a mid-diastolic murmur significantly reduces the
likelihood of mitral stenosis (1 grade A study). When a
cardiologist hears a typical PR murmur, the likelihood of PR
increases significantly. The absence of a typical murmur does
not alter the likelihood of PR (2 grade C studies).
Noncardiologists may be less proficient at detecting the mid-diastolic
murmur of mitral stenosis.

Diastolic Murmurs in Patients With Renal Failure 

Diastolic murmurs caused by abnormal flow states, rather
than abnormal cardiac structure, may be associated with a
variety of conditions. Renal failure with volume overload is
the only abnormal flow state associated with diastolic murmurs
that has been evaluated.

Up to 9% of patients with end-stage renal disease have diastolic
murmurs, particularly when these patients also have volume
overload, anemia, and hypertension. 50 These murmurs
typically disappear after the treatment of volume overload, as
was demonstrated in 2 small studies (grade C). 50,51 These
murmurs are probably due to transient pulmonary hypertension
and dilatation of the pulmonary artery root, leading to PR. 51

THE BOTTOM LINE FOR DIASTOLIC MURMURS 
IN PATIENTS WITH RENAL FAILURE 

Although there are insufficient data on which to
make rigorous recommendations, if an early-diastolic murmur
is heard in a dialysis patient with volume overload, the
patient should be reexamined after treatment because the
murmur may disappear.

When to Examine for Aortic Regurgitation 

There are no evaluative data on which to base a
recommendation regarding when to examine for AR. Undetected AR
may be common in elderly persons: 13% (n = 552) of
asymptomatic elderly Finnish persons had moderate or severe
echocardiographic AR. 52 Unfortunately, that study does not
indicate how many of these patients had audible diastolic
murmurs. Audible diastolic murmurs may be relatively
uncommon findings in asymptomatic persons. In one study,
only 1% (n = 103) of elderly asymptomatic nursing home
residents had an audible diastolic murmur. 53

Despite the lack of evaluative data, we think that a prudent
clinician will examine for AR in most clinical settings. AR is a
serious cardiac abnormality, which may be caused by underlying
disorders and may be asymptomatic. The clinician’s suspicion
for AR may be heightened by evidence of systemic
disease, such as ankylosing spondylitis, a peripheral
hemodynamic finding (although these are by no means indicative
of underlying AR), or an abnormality detected during routine
auscultation (such as an aortic ejection sound). Other findings
may suggest different cardiac abnormalities associated with
diastolic murmurs, such as evidence of pulmonary hypertension
(for PR), a wide, fixed split S 2 (for atrial septal defect), or
a holosystolic apical murmur (for mitral regurgitation).

Recommendations for Further Research 

Most studies used cardiologists to conduct clinical
examinations. Some data suggest that noncardiologists
may be less accurate than cardiologists, so studies evaluating
techniques to improve the skills of noncardiologists are
needed. There are also no studies defining the optimal
examination technique for detecting the AR murmur.
CLINICAL SCENARIO—RESOLUTION


Your patient has a wide pulse pressure but no typical
early-diastolic murmur. The likelihood of mild or moderate AR is
significantly reduced by the absence of a typical early-diastolic
murmur (LR-, 0.1-0.3; 2 grade A studies). You perform
transient arterial occlusion, and no diastolic murmur appears,
which enhances your confidence (LR-, 0.3). You are confident
in your assessment because it was conducted in a quiet room
with a comfortable and cooperative patient. Therefore, AR is
unlikely and echocardiography is not necessary.
UPDATE: Murmur, Diastolic



Prepared by David Cescon, MD, and Edward Etchells, MD 
Reviewed by Eugene Oddone, MD 


CLINICAL SCENARIO 


A 58-year-old man presents for a routine physical
examination, not having visited a physician in many years. He
denies any cardiovascular symptoms. On auscultation, you
are surprised to hear a loud (grade 3) early diastolic
murmur. There is an audible S 3 and a collapsing radial pulse
(Corrigan sign). After you explain to the patient that you
heard a murmur, he asks, “How bad do you think it is?”

ORIGINAL REVIEW 

Choudhry NK, Etchells EE. Does this patient have aortic 
regurgitation? JAMA. 1999;281(23):2231-2238. 

UPDATED LITERATURE SEARCH 

Our literature search combined the parent search strategy for
The Rational Clinical Examination with the terms “diastolic
and murmur, aortic valve insufficiency, mitral valve stenosis,”
and “pulmonary valve insufficiency,” limited to
English-language publications in the Ovid MEDLINE database from
1997 to July 2004. The titles and abstracts of the search results
were screened, case reports were excluded, and 8 potentially
relevant articles were retrieved and reviewed. We manually
reviewed the reference list of each article for additional
studies. Articles were retained if they were studies of adult
participants, included sensitivity and specificity data of physical
findings, and had a quality score of level 3 or greater. This
yielded 1 new study, and we found 1 other study during the
updated literature search for systolic murmurs.

NEW FINDINGS 

The presence of an S 3 in patients with isolated aortic regurgi¬ 
tation (AR) predicts severity. 

Details of the Update 

Were There Changes in the Original Publication? 

In the original article, the need to identify patients at higher
risk for endocarditis because of valvular abnormalities was
suggested as a rationale for performing the clinical
examination. The recommendations for endocarditis prophylaxis
have changed. Patients with murmurs from structural
abnormalities of a native valve do not automatically require
antibiotic prophylaxis to prevent infective endocarditis. 1

CHANGES IN THE REFERENCE STANDARD 

The reference standard is the echocardiogram or the results 
from a cardiac catheterization that assess valvular competency. 

RESULTS OF THE LITERATURE REVIEW 

Diastolic murmurs are always important, requiring
ascertainment of the underlying abnormality. Most studies of the
detection of AR assess the performance of cardiologists or the
ability to distinguish patients with serious AR from those
with less significant impairment. The sensitivity and
specificity of a variety of peripheral hemodynamic findings,
popularized in many textbooks of physical diagnosis and cardiology,
have not been adequately assessed.

Since the original review, 1 study 2 assessed the ability of
cardiologists to identify the presence of AR (Table 32-4). In
this study, 100 consecutive patients referred for evaluation of
a systolic murmur of unknown cause underwent a standard
cardiac examination by a cardiologist, who described the
murmur and assigned a clinical diagnosis. Mild or greater AR
was identified with high specificity but low sensitivity.
Compared with studies cited in the original review, the lower
sensitivity might reflect a challenging sample with a high
prevalence of multiple valvular lesions and a predominance
of mild regurgitation among those patients with AR.


Table 32-4 Likelihood Ratios of the Physical Examination for Detecting
Aortic Regurgitation

Finding                    Patient Population               LR+ (95% CI)    LR- (95% CI)

Overall cardiac            Referred for evaluation of       5.1 (1.4-19)    0.82 (0.67-1.0)
examination 2              systolic murmur

Third heart sound (to      Patients with isolated aortic    5.9 (1.4-25)    0.83 (0.73-0.95)
identify severe AR) 3      insufficiency, referred for
                           echocardiography

Abbreviations: AR, aortic regurgitation; CI, confidence interval; LR+, positive likelihood
ratio; LR-, negative likelihood ratio.
One additional study 3 evaluated the ability of the clinical
examination to distinguish severe AR from less severe
disease. The presence of an S 3, recorded by a physician referring
a patient for cardiac ultrasonography, predicted severe AR
(defined as a regurgitant fraction of 40% or greater), with a
likelihood ratio of 5.9. The absence of an S 3 was not useful for
ruling out severe AR (negative likelihood ratio, 0.83) (Table 32-4). 3

EVIDENCE FROM GUIDELINES 

The American College of Cardiology and American Heart
Association guidelines 4 (2003) state that Doppler echocardiography
to exclude valvular regurgitation in asymptomatic patients with
normal physical examination results is not indicated.


CLINICAL SCENARIO—RESOLUTION 


Your patient has a typical diastolic murmur of AR. This
finding alone warrants echocardiography because it is
highly suggestive of an underlying abnormality. If the
patient has AR, the grade 3 intensity of the murmur and
the third heart sound increase the likelihood of severe AR.
The collapsing radial pulse (Corrigan pulse) is of uncertain
usefulness. Putting together all of your findings, you
advise your patient that you are concerned that he might
have important valvular disease. You provide information
regarding endocarditis prophylaxis and arrange for an
echocardiogram.


AORTIC REGURGITATION—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

One study of randomly selected elderly (75-86 years old) 
Finnish persons found a 29% prevalence of mild or 
greater AR. 5 Evaluation of more than 3000 men and 
women (aged 54 ± 10 years) in the Framingham heart 
study detected AR of trace or greater severity in 13.0% of 
men and 8.5% of women. 6 Increasing age was associated 
with higher prevalence of AR. 
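
As a rough illustration of how a prior probability such as these combines with a likelihood ratio, the sketch below converts probability to odds, multiplies by the LR, and converts back. The function name is hypothetical. Pairing the 29% Finnish prevalence with the typical-murmur LRs for mild or greater AR (LR+ 8.8, the lower end of the reported range; LR- 0.3, the upper end) is an assumption made only for the arithmetic, because those LRs were measured in other populations.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability using odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Illustrative inputs only (see the assumptions stated above).
PRIOR = 0.29          # prevalence of mild or greater AR in the elderly Finnish sample
LR_POSITIVE = 8.8     # typical murmur present, lower end of the reported range
LR_NEGATIVE = 0.3     # typical murmur absent, upper end of the reported range

probability_if_murmur = post_test_probability(PRIOR, LR_POSITIVE)
probability_if_no_murmur = post_test_probability(PRIOR, LR_NEGATIVE)

print(f"Typical murmur present: {probability_if_murmur:.2f}")
print(f"Typical murmur absent:  {probability_if_no_murmur:.2f}")
```

Using the conservative end of each range shows the smallest shift in probability that these LRs would produce from the same starting point.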

POPULATION FOR WHOM THE SIGNS 
SHOULD BE EVALUATED 

• Any patient undergoing cardiac auscultation 

A variety of medical and traumatic conditions are associ¬ 
ated with AR: 

• Rheumatic fever 

• Endocarditis 

• Conditions associated with aortic valve leaflet abnor¬ 
malities (eg, Marfan syndrome, rheumatoid arthritis, 
ankylosing spondylitis) 

• Diseases that affect the aortic root (eg, hypertension, 
syphilis, inherited connective tissue disorders, aortic 
aneurysm) 


Table 32-5 Likelihood Ratio for Typical Murmur to Predict Aortic
Regurgitation or an S 3 to Predict Severe Aortic Regurgitation

Finding (Type              Severity by Echocardiogram    LR+ (Range or Point      LR- (Range or Point
of Clinician)              or Cardiac Catheterization    Estimate With 95% CI)    Estimate With 95% CI)

Typical murmur 7,8         Mild or greater               8.8-32                   0.2-0.3
(cardiologist)             Moderate or greater           4.0-8.3                  0-0.1

Murmur intensity 9         Grade 3                       4.5 (1.6-14)
(generalist or             Grade 2                       1.1 (0.5-2.4)
cardiologist) a            Grade 1                       0 (0-0.9)
                           No murmur                     0 (0-1.1)

Third heart sound 4        Severe                        5.9 (1.4-25)             0.83 (0.73-0.95)
(cardiologist)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a All patients had aortic regurgitation, so the likelihood ratios here are for severe aortic
regurgitation associated with the murmur intensity.

REFERENCE STANDARD TESTS 

Echocardiography and angiography. 


PHYSICAL EXAMINATION SIGNS USEFUL IN 
THE DIAGNOSIS OF AORTIC REGURGITATION 

The presence of a typical murmur of AR (an early diastolic, 
decrescendo murmur) should prompt echocardiographic 
evaluation (Table 32-5). Many eponymic peripheral pulse 
findings are associated with AR, but they are not useful for 
screening or for distinguishing the severity of regurgitation. 


REFERENCES FOR THE UPDATE 
1736-1754.
echocardiographic study. Am J Med. 2001;111(2):96-102.

4. Cheitlin MD, Armstrong WF, Aurigemma GP, et al. ACC/AHA/ASE
2003 guideline update for the clinical application of
echocardiography: a report of the American College of
Cardiology/American Heart Association Task Force on Practice Guidelines.
http://www.acc.org/qualityandscience/clinical/guidelines/echo/index_clean.pdf.
Accessed June 4, 2008.

5. Lindroos M, Kupari M, Heikkila J, Tilvis R. Prevalence of aortic valve
abnormalities in the elderly: an echocardiographic study of a random
population sample. J Am Coll Cardiol. 1993;21(5):1220-1225.

6. Singh JP, Evans JC, Levy D, et al. Prevalence and clinical determinants of
mitral, tricuspid, and aortic regurgitation (the Framingham study). Am J
Cardiol. 1999;83(6):897-902.

7. Aronow WS, Kronzon I. Correlation of prevalence and severity of aortic
regurgitation detected by pulsed Doppler echocardiography with a
murmur of aortic regurgitation in elderly patients in a long-term health care
facility. Am J Cardiol. 1989;63(1):128-129.

8. Grayburn PA, Smith MD, Handshoe R, Friedman BJ, DeMaria AN.
Detection of aortic insufficiency by standard echocardiography, pulsed Doppler
TITLE Echocardiography in Evaluating Systolic Murmurs
of Unknown Cause.

AUTHORS Attenhofer Jost CH, Turina J, Mayer K, et al. 

CITATION Am J Med. 2000;108(8):614-620.

QUESTION How well can cardiologists identify patho¬ 
logic murmurs by auscultation and palpation alone? 

DESIGN Consecutive patients were prospectively identified 
at referral for evaluation of a systolic murmur of unknown 
cause. Each participant was independently examined by 2 
cardiologists from a pool of 8 and blinded to supplementary 
data and echocardiography results. Two-dimensional/ 
Doppler echocardiography was performed as the gold stan¬ 
dard in all participants. 

SETTING Cardiology division in Switzerland. 

PATIENTS One hundred patients referred for evaluation
of systolic murmur of unknown cause were enrolled. Patients
were excluded if they had a previously documented
echocardiographic examination. The mean age of the participants
was 55 years (SD, 22 years), and 57% were women.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Full cardiac examination, with or without dynamic auscultation,
was performed by 1 staff cardiologist and 1 cardiology associate.
Only the results of the staff cardiologist’s examinations were used
in the analysis, and no comparison with the associate’s findings is
presented in the article. Murmurs were classified by Levine grade
according to predefined characteristics, and the murmurs were
classified as functional or organic according to the examiner’s
clinical expertise. All patients underwent transthoracic
2-dimensional and Doppler echocardiography, and valvular stenosis and
regurgitation were classified according to standard criteria.

MAIN OUTCOME MEASURES 

Descriptive statistics, sensitivity, specificity. 


Table 32-6 Likelihood Ratio for Aortic Regurgitation According to the
Presence of a Diastolic Murmur in Patients Referred for Systolic Murmurs

AR by Echocardiography

                              Sensitivity    Specificity    LR+ (95% CI)    LR- (95% CI)

AR by clinical examination    0.21           0.96           5.1 (1.4-19)    0.82 (0.67-1.0)

Abbreviations: AR, aortic regurgitation; CI, confidence interval; LR+, positive
likelihood ratio; LR-, negative likelihood ratio.

Main Results 

Twenty-eight of the patients referred for systolic murmurs had
aortic regurgitation (AR). The degree of regurgitation was mild
in 22 cases (79%) and associated with another
echocardiographic lesion in 15 (54%) cases. The examiners made a clinical
diagnosis of aortic insufficiency in 9 patients (Table 32-6).

CONCLUSION 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Prospective, consecutive patients. 

LIMITATIONS Small, referral population referred for evalua¬ 
tion of a murmur. The echocardiographers were not blinded 
to the clinical findings. 

The clinical examination was useful for ruling in AR but 
not for ruling out regurgitation. The negative likelihood ratio 
obtained in this study is higher than in a number of previous 
studies performed in a variety of settings. The difficult popu¬ 
lation in this study might explain this finding: patients were 
referred for evaluation of systolic murmurs, and those with 
AR had predominantly mild disease and approximately half 
had additional lesions. The ability of the cardiologist to iden¬ 
tify those with AR (likelihood ratio, 5.1) despite the referral 
indication of a systolic murmur is impressive. These data 
support the clinical suggestion that finding a diastolic mur¬ 
mur requires an echocardiogram to assess AR. 
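
The likelihood ratios in Table 32-6 follow directly from the sensitivity and specificity. The sketch below is a minimal illustration of that arithmetic; the function name is hypothetical, and because it starts from the rounded values in the table (0.21 and 0.96) its output is close to, but not identical with, the published LR+ of 5.1.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) computed from sensitivity and specificity."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)   # true-positive rate / false-positive rate
    lr_negative = (1.0 - sensitivity) / specificity   # false-negative rate / true-negative rate
    return lr_positive, lr_negative


# Rounded values reported in Table 32-6 for AR by clinical examination.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.21, specificity=0.96)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

The low sensitivity is what keeps LR- near 1: most patients with AR had no clinical diagnosis of AR, so a negative examination changes the odds very little.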

Reviewed by David Cescon, MD, 
and Edward Etchells, MD 
TITLE Pathophysiologic Determinants of Third Heart
Sounds: A Prospective Clinical and Doppler
Echocardiographic Study.

AUTHORS Tribouilloy CM, Enriquez-Sarano M, Mohty 
D, et al. 

CITATION Am J Med. 2001;111(2):96-102.

QUESTION Does an audible S 3 predict severe hemody¬ 
namic alterations in cardiology patients? 

DESIGN Patients were identified at referral for echocar¬ 
diography. Clinical data were obtained by noting the 
results of a clinical examination performed by the refer¬ 
ring physician (a cardiologist or internist), who was 
unaware of the study. Transthoracic echocardiography 
was performed on all patients. 

SETTING Cardiology referral center in the United States. 

PATIENTS One hundred twenty-one patients with aor¬ 
tic regurgitation (AR) (mean age, 57 years; SD, 18 years; 
66% men; 15% New York Heart Association classes III- 
IV) were included in the study. These patients were pro¬ 
spectively enrolled from among patients referred by their 
personal physician for echocardiography for any indica¬ 
tion and found to have isolated mitral regurgitation or 
AR. Exclusion criteria included previous valve surgery, 
associated valvular stenosis, acute myocardial infarction, 
congenital or pericardial disease, or a change in cardiovas¬ 
cular status since clinical examination was performed. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Documentation of the presence or absence of an S 3 was
abstracted from each patient’s chart as documented by the
referring physician. Two-dimensional/Doppler
echocardiography was performed on all patients by an
echocardiographer. Severe regurgitation was defined as a regurgitant
fraction of 40% or greater.


MAIN OUTCOME MEASURES 

Sensitivity, specificity, and likelihood ratio for the ability of 
an S 3 to identify patients with severe regurgitation. 

MAIN RESULTS 

Fourteen patients with AR had an S 3. Of the 121 patients
with AR, 61 were classified as severe according to the
echocardiogram (Table 32-7).


Table 32-7 Likelihood Ratio for the Presence of a Third Heart Sound to
Predict Severe Aortic Regurgitation (AR) as Opposed to Less Severe AR

Test                         Sensitivity    Specificity    LR+ (95% CI)    LR- (95% CI)

S 3 to identify severe AR    0.20           0.97           5.9 (1.4-25)    0.83 (0.73-0.95)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

CONCLUSION 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Large sample size. 

LIMITATIONS The data were collected retrospectively, and it 
is not clear whether the echocardiographer was blinded to 
the clinical examination. The patients with an S 3 may have 
been selectively referred for echocardiograms, but this would 
have led to an inflated sensitivity and underestimated speci¬ 
ficity, which is the opposite of what the investigators found. 

The presence of an S 3 is highly specific for severe regurgita¬ 
tion in patients with isolated valvular regurgitation, and its 
presence reflects hemodynamically significant regurgitation 
reflected by left ventricular dysfunction. The detection of an 
S 3 should prompt further evaluation. However, the absence 
of an S 3 is not useful in ruling out significant AR. 
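
The wide confidence interval around the LR+ for an S 3 reflects how few patients had the finding. The sketch below shows one common way to obtain such intervals, the log method for a ratio of two proportions. The 2 x 2 cell counts are an assumption: they are not listed in this summary and were chosen here because they are consistent with the reported totals (121 patients, 61 severe, 14 with an S 3) and with the reported sensitivity and specificity. The function name is also hypothetical, and the original authors may have used a different interval method.

```python
import math


def ratio_with_ci(events_a: int, total_a: int, events_b: int, total_b: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Ratio of two proportions with a log-method confidence interval."""
    ratio = (events_a / total_a) / (events_b / total_b)
    # Standard error of ln(ratio) for two independent proportions.
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Assumed (reconstructed) cell counts, not taken from the article.
s3_severe, no_s3_severe = 12, 49        # 61 patients with severe AR
s3_not_severe, no_s3_not_severe = 2, 58  # 60 patients with less severe AR

n_severe = s3_severe + no_s3_severe
n_not_severe = s3_not_severe + no_s3_not_severe

# LR+ compares how often an S3 is present in severe vs less severe AR.
lr_pos = ratio_with_ci(s3_severe, n_severe, s3_not_severe, n_not_severe)
# LR- compares how often an S3 is absent in severe vs less severe AR.
lr_neg = ratio_with_ci(no_s3_severe, n_severe, no_s3_not_severe, n_not_severe)

print("LR+ %.1f (%.1f-%.0f)" % lr_pos)
print("LR- %.2f (%.2f-%.2f)" % lr_neg)
```

With only 2 patients in the assumed false-positive cell, that cell dominates the standard error of the LR+, which is why the upper bound is so far from the point estimate.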

Reviewed by David Cescon, MD, 
and Edward Etchells, MD 
CHAPTER 33

Does This Patient Have an Abnormal Systolic Murmur?

Edward Etchells, MD, MSc
Chaim Bell, MD
Kenneth Robb, MD

CLINICAL SCENARIOS

CASE 1 You are asked to see a 64-year-old man who has 
been admitted to the orthopedic service after a packing 
crate tipped over on his leg, producing an unstable frac¬ 
ture of his distal tibia and fibula. You see him as he is 
being prepared for surgery. The patient previously had a 
normal exercise tolerance and no cardiac symptoms. You 
conduct a complete cardiac examination, observing a 
grade 2 systolic murmur, loudest at the lower left sternal 
border, which does not radiate to the right carotid artery. 
The S 2 has normal intensity, and you do not hear a fourth 
heart sound (S 4 ). The carotid artery pulsation has a nor¬ 
mal rate of increase and normal volume. The orthopedic 
surgeon is concerned about the murmur because a recent 
patient had a postoperative myocardial infarction (MI) 
and was subsequently diagnosed with aortic stenosis. The 
surgeon wonders whether surgery should be delayed until 
an echocardiogram is obtained to rule out aortic stenosis. 

CASE 2 Your next patient is a 34-year-old woman with¬ 
out cardiovascular symptoms who has normal exercise 
tolerance. She has a grade 2 systolic murmur that begins 
late in systole and is loudest at the lower left sternal bor¬ 
der. When the patient is examined in a standing position, 
the murmur increases in intensity, and you detect a loud 
systolic click just before the onset of the murmur. The rest 
of the cardiovascular examination result is normal. You 
suspect mitral valve prolapse (MVP), but you wonder 
how confident you should feel about the diagnosis. 


WHY IS THE CLINICAL EXAMINATION IMPORTANT 
IN EVALUATING SYSTOLIC MURMURS? 


Systolic murmurs can be an important clue to a structural 
cardiac abnormality (Table 33-1). The use of noninvasive 
cardiac testing, such as echocardiography, has increased dra¬ 
matically. It is estimated that 2% of the general population 
undergoes noninvasive cardiac diagnostic evaluation. 1 In lieu 
of performing routine echocardiography on patients with 
systolic murmurs, a careful clinical examination may elimi¬ 
nate the need for additional tests in selected patients. 


THE ANATOMIC AND PHYSIOLOGIC ORIGINS 
OF SYSTOLIC MURMURS 

Heart murmurs are produced when turbulent blood flow
causes prolonged auditory vibrations of cardiac structures.
The intensity of the murmur depends on many factors,
including blood viscosity and blood flow velocity and
turbulence. In addition, the distance between the vibrations and
the stethoscope, the angle at which the vibrations meet the
stethoscope, and the transmission qualities of the tissue
between the vibration and the stethoscope affect murmur
intensity. 2

Table 33-1 Selected Causes of Systolic Murmurs

Abnormal cardiac structure
  Aortic stenosis
  Hypertrophic cardiomyopathy
  Mitral regurgitation
  Mitral valve prolapse
  Ventricular septal defect
  Pulmonic stenosis
  Tricuspid regurgitation
  Atrial septal defect

Normal cardiac structure, increased flow
  Anemia
  Thyrotoxicosis
  Sepsis
  Renal failure with volume overload

In this article, we will arbitrarily define an abnormal sys¬ 
tolic murmur as one associated with abnormal cardiac struc¬ 
ture. We will not consider the diagnosis of systolic murmurs 
caused by abnormally increased blood flow across normal 
cardiac structures, such as in anemia or thyrotoxicosis. How¬ 
ever, clinicians must consider the diagnosis of abnormally 
increased blood flow in patients with systolic murmurs. 

HOW TO EXAMINE FOR SYSTOLIC MURMURS 

Most clinicians agree that a complete clinical history and 
physical examination, including a detailed cardiac examina¬ 
tion, is an essential step in the assessment of systolic mur¬ 
murs. Clinicians will interpret a systolic murmur in an 
asymptomatic 24-year-old woman with iron deficiency ane¬ 
mia differently from a systolic murmur in a 76-year-old 
woman with fever, weight loss, and digital infarctions after 
recent dental surgery. 

Although a complete cardiac examination is important, 
the reliability and accuracy of many components of the car¬ 
diac examination for systolic murmurs have not been ade¬ 
quately evaluated. For example, the only adequately evaluated 
individual element of the cardiac history related to murmurs 
is effort syncope, which refers to a transient loss of con¬ 
sciousness during effort or exertion. This article focuses on 
features of the cardiac physical examination for systolic mur¬ 
murs that have been adequately evaluated for precision and 
accuracy. A complete description of the cardiac physical 
examination of systolic murmurs is beyond the scope of this 
article but can be found in many textbooks. 

The cardiac physical examination includes nonausculta¬ 
tory and auscultatory components. Adequately evaluated 
nonauscultatory components include carotid artery palpa¬ 
tion, apical-carotid delay, and brachioradial delay. To assess 
the carotid pulse, the clinician applies both light and firm
pressure over the artery and assesses both the rate of increase
and the pulse volume. Experts suggest that examiners pay 
special attention to the peak of pulsation. A normal rate of 
increase feels like a sharp tap, whereas an abnormal rate of 
increase feels like a nudge. An abnormal rate of increase can 
also feel like a weak tap, followed by a nudge or push. 3 Sur¬ 
prisingly, no clear guidelines exist for interpreting carotid 
volume. Suggested methods include palpating the artery with 
both hands and all fingers, or palpating with the thumb 
only. 4 We can only offer that a normal carotid volume is eas¬ 
ily felt with light palpation, whereas a reduced carotid vol¬ 
ume is difficult to feel even with firm palpation. 

Brachioradial delay and apical-carotid delay may be 
important findings for detecting aortic stenosis. For brachio¬ 
radial delay, the examiner palpates simultaneously the right 
brachial artery of the patient with the right thumb and the 
right radial artery of the patient with the left index and mid¬ 
dle finger. The examiner should use only light pressure on 
the brachial artery to avoid dampening the pulse waveform. 
The examiner attempts to detect a delay between the brachial 
artery and the radial artery pulsations; any palpable delay is 
considered abnormal. 5 For apical-carotid delay, the examiner 
simultaneously palpates the precordial apex pulsation and 
the right carotid artery. The examiner attempts to detect a 
delay between the apical and the carotid artery pulsation; any 
palpable delay is abnormal. 6 

In contrast to the cardiac history and nonauscultatory 
examination, many components of routine cardiac ausculta¬ 
tion have been adequately evaluated. During routine ausculta¬ 
tion, the examiner attempts to detect a systolic murmur, which 
can be defined as a systolic noise with a duration longer than a 
heart sound. 7 Examiners describe the grade, radiation (Table 
33-2), onset, duration, and timing of peak murmur intensity 
(Figure 33-1). The Levine grading system 8 facilitates descrip¬ 
tion of intensity: a grade 1 murmur is not heard immediately 
on auscultation but only after the examiner has focused on 
systole for a few seconds, a grade 2 murmur is heard immedi¬ 
ately on auscultation but is not loud, a grade 3 murmur is 
heard immediately on auscultation and is loud, and a palpable 
precordial vibration, called a thrill, signifies a grade 4 murmur. 
Other murmur characteristics, such as pitch and tonal quality, 
have not been adequately evaluated. 
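
The grading rules just described can be restated as a small decision function. The sketch below is illustrative only: the function name and its three yes/no inputs are assumptions introduced here, it covers only the four grades defined in this chapter, and it presumes that a murmur is audible at all.

```python
def levine_grade(heard_immediately: bool, loud: bool, palpable_thrill: bool) -> int:
    """Map bedside observations of an audible murmur to grades 1 through 4 as defined above."""
    if palpable_thrill:
        # A palpable precordial vibration (thrill) signifies grade 4.
        return 4
    if not heard_immediately:
        # Heard only after focusing on systole for a few seconds.
        return 1
    # Heard immediately: loud is grade 3, not loud is grade 2.
    return 3 if loud else 2


# Example: a murmur heard at once, not loud, and without a thrill.
print(levine_grade(heard_immediately=True, loud=False, palpable_thrill=False))
```

Checking for a thrill first mirrors the text, in which the thrill alone defines the highest grade described regardless of the other observations.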

Other evaluated relevant features on routine auscultation 
include the intensity of the S 2 , the S 4 , and systolic clicks. The 
intensity of S 2 can be graded as normal, decreased, or absent. 
A normal S 2 should be easily heard in the second right and 
left intercostal spaces next to the sternum and should be 
louder than the first heart sound (S 1) in these areas. 3
Abnormal splitting of the S 2 in relation to cardiac murmurs has not
been adequately evaluated.

An S 4 is a low-pitched sound occurring just before systole, 
sometimes described as a presystolic sound. The S 4 from the 
left ventricle is best heard with the bell of the stethoscope lightly 
applied to the patient in the left lateral decubitus position. 

Systolic clicks are high-pitched sounds with a duration
similar to that of heart sounds. Systolic clicks (previously
termed nonejection clicks) are associated with MVP. They
generally occur later than 40 to 60 ms after the S 1, and
patient position greatly affects their timing. When a patient
stands, a systolic click moves closer to the S 1. Ejection sounds
(previously termed ejection clicks) come from aortic or
pulmonary valves opening in early systole, approximately 40 to
60 ms after the S 1. The S 1 and an ejection sound together
have roughly the cadence of saying “pa-da” or “pa-ta”
quickly. 3 Patient position causes no appreciable change in the
timing of ejection sounds.

After routine auscultation, the clinician may wish to fur¬ 
ther assess a systolic murmur using special maneuvers. If the 
maneuver is intended to increase the intensity of the mur¬ 
mur, then the clinician should listen at the edge of the mur¬ 
mur’s radiation, where the murmur is barely audible. This 
will make it easier to detect an increase in murmur intensity. 
Similarly, if the maneuver is intended to decrease the inten¬ 
sity of the murmur, then the clinician should listen at the 
point of maximal intensity. 

Maneuvers that primarily increase the venous return
include quiet inspiration and sustained abdominal pressure.
These maneuvers are intended to increase the intensity of
right-sided heart murmurs, such as tricuspid regurgitation
(TR) or pulmonic stenosis. For the quiet inspiration
maneuver, the examiner determines the effect of quiet inspiration
on the intensity of the murmur. 9 The examiner should not
ask the patient to breathe deeply, because the murmur will be
obscured by the breath sounds. For the sustained abdominal
pressure maneuver, the examiner exerts firm, sustained
pressure inward and cephalad below the right costal margin. The
intensity of the murmur is observed during several cardiac
cycles. 10

Transient arterial occlusion primarily increases systemic
arterial resistance. This maneuver increases the intensity of
left-sided regurgitant murmurs, such as mitral regurgitation
(MR) or ventricular septal defect. The examiner
simultaneously inflates 2 sphygmomanometers, one around each
of the patient’s upper arms, to approximately 20 to 40 mm Hg
above the patient’s previously recorded systolic blood
pressure. Twenty seconds after cuff inflation, any changes in
murmur intensity are observed. 11

Maneuvers that increase both venous return and
systemic arterial resistance include standing to squatting and
passive leg elevation. These maneuvers are intended to
decrease the intensity of the murmur of hypertrophic
cardiomyopathy and MVP. For the standing to squatting
maneuver, the clinician sits to the side of the patient and
instructs him or her to rapidly squat from the standing
position. Changes in murmur intensity are noted
immediately after squatting. 12 For the passive leg elevation
maneuver, an assistant passively elevates both of the patient’s legs
to approximately 45 degrees while the patient is supine.
Changes in murmur intensity are observed 15 to 20
seconds after leg elevation. 13

The Valsalva maneuver decreases venous return and
increases systemic arterial resistance. 3 The Valsalva maneuver
decreases the intensity of aortic stenosis murmurs. The
patient strains against a closed glottis for 20 seconds, and
changes in murmur intensity are observed just before the end
of the 20-second period. 13 Patients may inadvertently do a
Valsalva during other maneuvers, such as sustained
abdominal pressure or standing to squatting, so clinicians should
ensure that patients breathe normally during these latter
maneuvers.

Table 33-2 Typical Location of Maximal Intensity and Radiation for
Various Types of Abnormal Systolic Murmurs

Location of Maximal Intensity    Radiation                        Typical for

Second right intercostal         Right carotid artery             Aortic stenosis
space                            Right clavicle

Fifth or sixth left              Left anterior axillary line      Mitral regurgitation
intercostal space, mid left      Left axilla                      (including mitral
thorax                                                            regurgitation caused by
                                                                  mitral valve prolapse)

Lower left sternal border        Lower right sternal border       Tricuspid regurgitation
                                 Epigastrium
                                 Fifth intercostal space,
                                 mid left thorax

Fifth left intercostal           Lower left sternal border        Hypertrophic
space, mid left thorax                                            cardiomyopathy

Figure 33-1 Select Features of Systolic Murmurs

[Figure panels: murmur onset and duration (holosystolic murmur and late
systolic murmur, each drawn between S 1 and S 2); timing of peak murmur
intensity.]

In the holosystolic murmur, the murmur begins just after the first heart sound
(S 1) and continues throughout the systole. In the late systolic murmur, the
murmur begins at the middle of the systole or later and ends at the second
heart sound (S 2). In an early peaking murmur, peak intensity is before the
middle of the systole. In a mid- or late-peaking murmur, peak intensity is at
the middle of the systole or later.


PRECISION OF THE EXAMINATION RELATED 
TO SYSTOLIC MURMURS 

Precision refers to agreement among clinicians regarding a
particular clinical finding. The precision of the clinical
examination for systolic murmurs has been evaluated in usual
clinical circumstances by auscultating patients 14-17 or in
controlled nonclinical circumstances by listening to prerecorded
audiotapes. 18 Studies using audiotapes will yield higher estimates
of precision, as will studies consisting of only normal patients or
very abnormal patients. Most of the available precision studies
include patients with various causes of abnormal systolic murmurs,
although one study included only patients with mild or moderate
aortic stenosis. 17 The experience of observers likely affects
precision; all but one study 16 used cardiologists as the examiners.

The only evaluated historical variable for diagnosing murmurs is
effort syncope, which had a κ of 1.0 (simple agreement, 100%) in one
small study (n = 22). 17 This study excluded patients with other
types of syncope that could be confused with effort syncope, so it
was relatively easy for the cardiologists to agree on the presence
or absence of effort syncope.

One study found that the agreement between cardiology trainees on
the carotid upstroke was poor, but data to calculate simple
agreement or κ values were not provided. 19 The precision of
physical findings is summarized in Table 33-3.

The Bottom Line for Precision 

• The precision of examining for any systolic murmur is moderate
using audiotapes (κ, 0.48) but only fair in the clinical setting
(κ, 0.30). The precision of examining for a loud (grade 2 or louder)
systolic murmur is good using audiotapes (κ, 0.74) but only fair in
the clinical setting (κ, 0.29).

• The precision of examining for a late-peaking systolic
murmur is excellent (κ, 0.74).

• The precision of examining for a systolic click is good
(simple agreement, 85%).
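
The difference between simple agreement and κ can be made concrete with a short calculation. The sketch below assumes two examiners who each rate a binary finding (present or absent) in the same patients; the function name and the example counts are hypothetical and are not taken from any of the cited studies.

```python
def cohen_kappa(both_yes, a_only, b_only, both_no):
    """Return (simple agreement, kappa) for two examiners rating a binary finding.

    both_yes: patients in whom both examiners recorded the finding
    a_only:   patients in whom only examiner A recorded it
    b_only:   patients in whom only examiner B recorded it
    both_no:  patients in whom neither examiner recorded it
    """
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n            # simple agreement
    a_yes = (both_yes + a_only) / n                # how often A says "present"
    b_yes = (both_yes + b_only) / n                # how often B says "present"
    # Agreement expected by chance if the two examiners rated independently
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    kappa = (observed - expected) / (1 - expected)  # undefined if expected == 1
    return observed, kappa


# Hypothetical counts for illustration only
agreement, kappa = cohen_kappa(both_yes=40, a_only=10, b_only=10, both_no=40)
print(f"simple agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Because κ subtracts the agreement expected by chance, it is lower than simple agreement whenever chance agreement is greater than zero and the examiners do not agree perfectly.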

ACCURACY OF THE EXAMINATION RELATED 
TO SYSTOLIC MURMURS 

To develop a structured search strategy, we used pertinent 
articles already in our files. Our strategy was deliberately 
broad to minimize the possibility of overlooking important 
articles. We then searched MEDLINE (English language) 
from 1966 through January 1996, using our structured 
search strategy (available on request). We manually reviewed 


Table 33-3 Precision of the Clinical Examination of Systolic Murmurs

Finding | Examiner | No. | κ(a) | Simple Agreement, %
No murmur vs grades 1-4 | Cardiologists (tapes) [18] | 100 | 0.48 | 70
  | Cardiologists [14] | 100 | 0.30 | 54
  | Cardiologists [16] | 80 | ... | 86
  | Cardiologists [16] | 32 | ... | 97
  | Noncardiologists [16] | 32 | ... | 78
No murmur/grade 1 vs grades 2-4 | Cardiologists (tapes) [18] | 100 | 0.74 | 87
  | Cardiologists [14] | 100 | 0.29 | 76
Acoustic shape (late peaking vs not late peaking) | Cardiologists [17] | 22 | 0.74 | 95
Midsystolic click | Cardiologists [16] | 80 | ... | 85

(a) Ellipses indicate data not available.


potentially relevant articles that we identified; we also 
reviewed the reference lists of these articles. We contacted 
authors of relevant studies for additional information. 

Studies were reviewed by 2 independent readers (E.E. and 
C.B.). Disagreements between reviewers were resolved by dis¬ 
cussion before a final quality grade was assigned. Quality grades 
were assigned using published guidelines (see Table 1-7 for a 
summary of Evidence Grades and Levels). 20 Grade A studies 
involve the independent, blind comparison of sign or symptom 
with a gold standard of diagnosis among a large number of con¬ 
secutive patients suspected of having the target condition. Grade 
B studies involve the independent, blind comparison of sign or 
symptom with a gold standard of diagnosis among a small 
number of consecutive patients suspected of having the target 
condition. Grade C studies involve the independent, blind com¬ 
parison of sign or symptom with a gold standard of diagnosis 
among nonconsecutive patients suspected of having the target 
condition; nonindependent comparison of sign or symptom 
with a gold standard of diagnosis among a sample of patients 
who obviously had the target condition plus, perhaps, normal 
individuals; or nonindependent comparison of a sign or symp¬ 
tom with a standard of uncertain validity. 

Many of the studies were conducted in cardiology clinics, so 
the prevalence of abnormalities in these studies will be higher 
than in usual practice. For example, a study of patients under¬ 
going cardiac catheterization for suspected aortic stenosis 
found a prevalence of aortic stenosis of 73%, so a positive clini¬ 
cal examination result virtually ruled in aortic stenosis. In usual 
practice, the prevalence of aortic stenosis would be much lower, 
so a positive clinical examination result would not rule in aortic 
stenosis, but rather indicate the need for further testing with 
echocardiography. 
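
The effect of prevalence on the meaning of a positive result can be sketched with the odds form of Bayes' theorem. In the example below, the 73% prevalence comes from the catheterization study described above; the 5% prevalence for usual practice and the positive likelihood ratio of 5 are hypothetical values chosen only for illustration.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability.

    pretest_probability must be strictly between 0 and 1.
    """
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


HYPOTHETICAL_LR_POSITIVE = 5.0    # assumed value, not from a cited study
REFERRAL_PREVALENCE = 0.73        # catheterization population described in the text
USUAL_PRACTICE_PREVALENCE = 0.05  # assumed value for illustration

referral = posttest_probability(REFERRAL_PREVALENCE, HYPOTHETICAL_LR_POSITIVE)
usual = posttest_probability(USUAL_PRACTICE_PREVALENCE, HYPOTHETICAL_LR_POSITIVE)
print(f"referral population: {referral:.2f}")
print(f"usual practice:      {usual:.2f}")
```

Under these assumptions the same positive finding gives a posttest probability of about 0.93 in the referral population but only about 0.21 in usual practice, which is why a positive result in usual practice calls for further testing and does not by itself rule in the diagnosis.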

IS THIS AN ABNORMAL MURMUR? 

Clinicians are primarily concerned whether a systolic mur¬ 
mur indicates a cardiac abnormality. In this context, the goal 
of the clinical examination is not an exact diagnosis, but 
rather identification of patients needing further testing to 
confirm or quantify an abnormality. 

Several studies evaluated the accuracy of the entire clinical
examination, including the medical history, physical examination,
electrocardiogram, and chest radiograph; none has evaluated the
history and physical examination alone. 21-24 In each study,
cardiologists used the clinical examination to classify a systolic
murmur as normal, possibly abnormal, or abnormal. Patients then
underwent an echocardiogram or cardiac catheterization as the
reference standard test. The most common abnormalities detected were
valvular stenosis or regurgitation, atrial or ventricular septal
defects, MVP, and cardiac hypertrophy. The study results, which are
summarized in Table 33-4, indicate that cardiologists are efficient
at identifying abnormal and normal murmurs.

The Bottom Line for Abnormal Murmur 

• A clinical assessment of “normal murmur” by a cardiologist
significantly reduces the likelihood of a cardiac abnormality.

• A clinical assessment of “abnormal murmur” by a cardiologist
significantly increases the likelihood of a cardiac abnormality.

Aortic Stenosis 

Effort syncope is the only adequately studied individual historical
variable. Presence of effort syncope in patients with a systolic
murmur effectively rules in aortic stenosis (positive likelihood
ratio [LR+], ∞; 95% confidence interval [CI], 1.3-∞), but absence of
effort syncope is not helpful (negative likelihood ratio [LR-], 0.76;
95% CI, 0.67-0.86) (grade C study). 17

Several studies have examined the accuracy of the physical
examination for detecting aortic stenosis. In these studies,
echocardiography or cardiac catheterization confirmed aortic
stenosis. Definitions of aortic stenosis varied, with peak
instantaneous gradients ranging from as low as 25 mm Hg to
as high as 50 mm Hg or aortic valve areas ranging from as
low as 0.7 cm² to as high as 1.1 cm².

Many physical findings may increase or decrease the likelihood of
aortic stenosis. 17,25,26 Table 33-5 lists the findings beginning
with the highest positive LRs from the largest studies with the best
methodologic quality. All of the studies used cardiologist examiners.

Two studies are notable for their high methodologic qual¬ 
ity and large sample sizes. The first study 25 involved 781 con¬ 
secutive, unreferred elderly patients who were nursing home 
residents. Each study participant received an examination by 
a single senior cardiologist, followed by an echocardiogram. 
Overall, 68 patients (9%) had aortic stenosis defined as a 
peak instantaneous Doppler gradient of 25 mm Hg or 
greater. This study provides a reasonable estimate of the 
accuracy of the clinical examination in an elderly population. 
Many of the patients had no symptoms and no audible murmur,
which may have elevated the estimates of specificity and
the positive LRs for some of the findings.

The second study 26 evaluated 231 consecutive patients referred for
cardiac catheterization for various reasons, including suspected
aortic stenosis. Cardiology fellows or cardiologists examined
patients before cardiac catheterization. Overall, 113 patients (49%)
had aortic stenosis, defined as a valve area of 0.8 cm² or less or a
peak gradient of 50 mm Hg or greater, at cardiac catheterization.
This study population was highly selected, so the prevalence of
aortic stenosis was much higher than would be expected in usual
clinical practice.

The accuracy of special maneuvers was evaluated by 2 trained 
cardiologists, who examined 50 nonconsecutive participants 
with a variety of heart diseases, including aortic stenosis, MR, 
ventricular septal defect, hypertrophic cardiomyopathy, pul¬ 
monic stenosis, and TR. 27 No maneuver was useful for ruling in 
aortic stenosis (data not shown), but certain findings from the 
Valsalva maneuver reduced the likelihood of aortic stenosis 
(Table 33-5). 

A potentially useful multivariate decisional aid for diag¬ 
nosing aortic stenosis was developed using split-sample vali¬ 
dation (Table 33-6). 26 The study showed an excellent positive 
LR for patients with point scores higher than 10. One of the 
variables in this model was aortic valve calcification on the 
lateral chest radiograph. 


Table 33-4 Accuracy of Clinical Examination for Detecting Abnormal
Systolic Murmur

Overall Clinical Assessment | LR (95% CI) | Quality Grade(a)
Abnormal Murmur
  Study 1 [21](b) | ∞ (14-∞) | A
  Study 2 [23](c) | ∞ (2.8-∞) | C
  Study 3 [22](d) | 3.8 (2.8-5.4) | C
Possibly Abnormal Murmur
  Study 1 [21](b) | 2.3 (0.7-5.9) | A
  Study 2 [24](e) | 1.3 (1.2-1.4) | C
Normal Murmur
  Study 1 [21](b) | 0 (0-0.4) | A
  Study 2 [22](c) | 0.01 (0-0.02) | C
  Study 3 [24](e) | 0.05 (0.01-0.20) | C
  Study 4 [23](c) | 0.3 (0.1-0.6) | C

Abbreviations: CI, confidence interval; LR, likelihood ratio.
(a) See Table 1-7 for a summary of Evidence Grades and Levels.
(b) Of 103 patients, 93 had normal murmurs. The study was conducted
among pregnant patients. Reference standard: echocardiogram.
(c) Of 30 patients, 16 had normal murmurs. Reference standard:
echocardiogram.
(d) Of 1059 patients, 100 had normal murmurs. Reference standard:
cardiac catheterization.
(e) Of 532 patients, 378 had normal murmurs. Reference standard:
echocardiogram.

Although the preceding results are encouraging, 2 small studies had
less impressive results. The first study included 75 patients with
severe multivalvular disease who were undergoing cardiac
catheterization and found that a cardiologist's clinical diagnosis
of aortic stenosis was only reasonably accurate (LR+, 3.7; 95% CI,
2.2-7.0; LR-, 0.23; 95% CI, 0.11-0.44). 28 The severity of the
multivalvular disease in these patients may have made an exact
diagnosis more difficult. A study on 35 elderly patients with
systolic murmurs who were examined by a geriatrician found that a
clinical diagnosis of neither “aortic stenosis present” (LR+, 2.4;
95% CI, 0.72-6.9) nor “aortic stenosis absent” (LR-, 0.7; 95% CI,
0.30-1.1) was accurate. 29 This study suggests that assessments by
cardiologists may be better than assessments by noncardiologists.

The Bottom Line for Aortic Stenosis 

• The presence of any of the following clinical findings
significantly increases the likelihood of aortic stenosis: effort
syncope, slow rate of increase of the carotid pulse, timing of peak
murmur intensity in late or midsystole, decreased intensity or
absent S2, apical-carotid delay, or brachioradial delay.

• The absence of any of the following clinical findings
significantly reduces the likelihood of aortic stenosis: any
systolic murmur or murmur radiation to the right carotid artery.

• Combinations of the following clinical variables can be
useful to rule in or rule out aortic stenosis: decreased
carotid volume, delayed carotid upstroke, decreased or
absent S2, murmur loudest at second right intercostal
space, and valve calcification on chest radiograph.



















CHAPTER 33 The Rational Clinical Examination 


Table 33-5 Accuracy of the Physical Examination for Detecting Aortic Stenosis

Finding | Reference Standard (No. of Patients) | LR+ (95% CI) | LR- (95% CI) | Quality Grade(a)
Slow rate of increase of carotid pulse
  Study 1 [25] | Cardiac catheterization (781) | 130 (33-560) | 0.62 (0.51-0.75) | A
  Study 2 [26] | Cardiac catheterization (231) | 2.8 (2.1-3.7) | 0.18 (0.11-0.30) | C(b)
  Study 3 [17] | Cardiac catheterization (106) | 6.4 (0.8-45) | 0.73 (0.59-0.90) | C
Timing of peak murmur intensity
  Late peaking [25] | Cardiac catheterization (781) | 101 (25-410) | 0.31 (0.22-0.44) | A
  Midpeaking [17] | Cardiac catheterization (106) | 8.0 (2.7-23) | 0.13 (0.07-0.24) | C
Decreased intensity or absent second heart sound
  Study 1 [25] | Cardiac catheterization (781) | 50 (24-100) | 0.45 (0.34-0.58) | A
  Study 2 [26] | Cardiac catheterization (231) | 3.1 (2.1-4.3) | 0.36 (0.26-0.49) | C(b)
Apical carotid delay [6] | Cardiac catheterization (44) | ∞ (2.4-∞) | 0.05 (0.01-0.31) | C
Brachioradial delay [5] | Echocardiogram (58) | 6.8 (3.2-14) | 0.0 (0.0-0.3) | C
Fourth heart sound [25] | Cardiac catheterization (781) | 2.5 (2.1-3.0) | 0.26 (0.14-0.49) | A
Presence of any murmur [25] | Cardiac catheterization (781) | 2.4 (2.2-2.7) | 0 (0-0.13) | A
Reduced carotid volume
  Study 1 [26] | Cardiac catheterization (231) | 2.3 (1.7-3.0) | 0.31 (0.21-0.46) | C(b)
  Study 2 [17] | Cardiac catheterization (106) | 2.2 (1.2-4.2) | 0.39 (0.22-0.69) | C
Radiation to right carotid
  Study 1 [25] | Cardiac catheterization (781) | 1.4 (1.3-1.5) | 0.10 (0.13-0.40) | A
  Study 2 [26] | Cardiac catheterization (231) | 1.5 (1.3-1.7) | 0.05 (0.01-0.20) | C(b)
With Valsalva maneuver, intensity is decreased [27] | Cardiac catheterization (50) | 1.2 (0.8-1.6) | 0 (0-1.6) | C

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.
(a) See Table 1-7 for a summary of Evidence Grades and Levels.
(b) Grade A study except cardiac catheterization interpreted with knowledge of clinical findings.


Table 33-6 Multivariable Decision Rule for Suspected Aortic Stenosis [26]

Point Score | Aortic Stenosis(a): Yes | Aortic Stenosis(a): No | LR (95% CI)(b)
14 | 7 | 0 | ∞ (0.6-∞)
10-13 | 22 | 1 | 8.0 (1.6-46)
7-9 | 22 | 3 | 2.7 (1.0-8.0)
2-6 | 11 | 15 | 0.27 (0.15-0.49)
0 | 1 | 4 | 0.10 (0.01-0.58)
Total | 63 | 23 | ...

Variable | Point Score
Reduced carotid volume | 2
Slow rate of increase of carotid pulse | 3
Murmur loudest at second right intercostal space | 2
Decreased or absent second heart sound | 3
Valve calcification on chest radiograph | 4
Maximum score | 14

Abbreviations: CI, confidence interval; LR, likelihood ratio.
(a) Defined as peak transvalvular gradient of > 50 mm Hg at cardiac catheterization.
(b) Ellipsis indicates not applicable.
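
As an illustration of how the decision rule in Table 33-6 could be applied, the sketch below sums the points for the findings that are present and recomputes the likelihood ratio for the matching score stratum from the patient counts in the table. The dictionary keys and function names are hypothetical labels introduced for this example; the point values and counts are those listed in Table 33-6. Confidence intervals are not recomputed.

```python
# Point values from Table 33-6; the key names are illustrative labels only
POINTS = {
    "reduced_carotid_volume": 2,
    "slow_rate_of_increase_of_carotid_pulse": 3,
    "murmur_loudest_at_second_right_intercostal_space": 2,
    "decreased_or_absent_second_heart_sound": 3,
    "valve_calcification_on_chest_radiograph": 4,
}

# (lowest score, highest score, patients with AS, patients without AS)
STRATA = [
    (14, 14, 7, 0),
    (10, 13, 22, 1),
    (7, 9, 22, 3),
    (2, 6, 11, 15),
    (0, 0, 1, 4),
]
TOTAL_WITH_AS = 63
TOTAL_WITHOUT_AS = 23


def point_score(findings_present):
    """Sum the points for the findings recorded as present."""
    return sum(POINTS[name] for name in findings_present)


def stratum_likelihood_ratio(score):
    """Likelihood ratio for the score stratum, recomputed from the table counts."""
    for low, high, with_as, without_as in STRATA:
        if low <= score <= high:
            proportion_with = with_as / TOTAL_WITH_AS
            proportion_without = without_as / TOTAL_WITHOUT_AS
            if proportion_without == 0:
                return float("inf")
            return proportion_with / proportion_without
    raise ValueError(f"score {score} does not fall in any stratum of the table")


score = point_score([
    "slow_rate_of_increase_of_carotid_pulse",
    "decreased_or_absent_second_heart_sound",
    "valve_calcification_on_chest_radiograph",
])
print(score, round(stratum_likelihood_ratio(score), 1))
```

Recomputing each ratio as (proportion of patients with aortic stenosis in the stratum) divided by (proportion of patients without aortic stenosis in the stratum) reproduces the point estimates in the table to within rounding.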


Mitral Regurgitation 

We report the accuracy of the clinical examination for detecting
moderate to severe regurgitation confirmed through
echocardiography or cardiac catheterization (Table 33-7).
Detection of moderate to severe MR, even in asymptomatic
patients, may influence recommendations for echocardiographic
monitoring 30 or medical treatment. 31

If a cardiologist hears a murmur in the mitral area (mid left
thorax, fifth intercostal space), then the likelihood of MR is
increased slightly, but absence of a murmur significantly reduces
the likelihood of MR. 15,32,33 Similarly, a late systolic or
holosystolic murmur slightly increases the likelihood of MR, but
absence of such a murmur significantly reduces the likelihood of MR.
In the setting of acute MI, absence of a murmur is less useful for
ruling out acute MR (LR-, 0.66; 95% CI, 0.25-1.0). 34 Transient
arterial occlusion was accurate for ruling in and ruling out
left-sided regurgitant murmurs, such as MR and ventricular septal
defect. 27

Internal medicine house staff are less accurate than cardi¬ 
ologists for detecting the murmur of MR, with positive LRs 
ranging from 1.1 (for interns) to 4.6 (for medical students) 
and negative LRs ranging from 0.7 (for junior residents) to 
1.0 (for interns and senior residents) 35 (grade A study). 

Table 33-7 Accuracy of the Clinical Examination for Detecting Mitral Regurgitation

Finding | Reference Standard (No. of Patients) | LR+ (95% CI) | LR- (95% CI) | Quality Grade(a)
Murmur in mitral area
  Study 1 [33] | Echocardiogram: moderate to severe MR (394) | 3.9 (3.0-5.1) | 0.34 (0.23-0.47) | C
  Study 2 [32] | Cardiac catheterization: moderate to severe MR (35) | 3.6 (1.9-7.7) | 0.12 (0.02-0.50) | C
Late or holosystolic murmur [16] | Echocardiogram: moderate to severe MR (80) | 1.8 (1.2-2.5) | 0 (0-0.8) | C
Any murmur during acute MI [34] | Cardiac catheterization: moderate to severe MR (206) | 4.7 (1.3-11) | 0.66 (0.25-1.0) | C
With transient arterial occlusion, murmur increases in intensity [27] | Cardiac catheterization: severity not stated(b) | 7.5 (2.5-23) | 0.28 (0.13-0.60) | C

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; MI, myocardial infarction; MR, mitral regurgitation.
(a) See Table 1-7 for a summary of Evidence Grades and Levels.
(b) Outcome of interest was left-sided regurgitant lesions, including MR or ventricular septal defect.


The Bottom Line for Mitral Regurgitation 

• For cardiologists, absence of a mitral area murmur or a late 
systolic/holosystolic murmur significantly reduces the like¬ 
lihood of MR, except in the setting of acute MI. 

• Cardiologists can accurately distinguish left-sided regur¬ 
gitant murmurs, such as MR and ventricular septal defect, 
using transient arterial occlusion. 

• Noncardiologists’ assessments for MR are considerably less 
accurate. 

Tricuspid Regurgitation 

Cardiologists are reasonably accurate for diagnosing the murmur of
moderately severe to severe TR in patients (n = 21, with TR; n = 295,
without TR) referred for echocardiography (LR+, 10.1; 95% CI, 5.8-18;
LR-, 0.41; 95% CI, 0.24-0.70) (grade C). 33 Special maneuvers may
also be helpful for diagnosing TR and other right-sided lesions such
as pulmonic stenosis. One study (n = 10, with TR or pulmonic
stenosis; n = 40, without TR or pulmonic stenosis) using cardiologist
examiners found that an increase in murmur intensity with inspiration
significantly increased the likelihood of a right-sided valvular
lesion, whereas the absence of increased intensity made these
conditions less likely (LR+, 8.0; 95% CI, 3.5-18; LR-, 0.0; 95% CI,
0-0.43) (grade C). 27 In another study, patients with severe MR
(n = 15) or TR (n = 15) were examined by experienced cardiologists
before cardiac catheterization. 10 To distinguish TR from MR,
increased murmur intensity on inspiration had a positive LR of ∞
(95% CI, 3.1-∞) and a negative LR of 0.20 (95% CI, 0.07-0.45). For
the finding of increased murmur intensity with sustained abdominal
pressure, the positive LR was ∞ (95% CI, 2.5-∞) and the negative LR
was 0.33 (95% CI, 0.15-0.58) (grade C).

The Bottom Line for Tricuspid Regurgitation 

• Cardiologists can accurately detect the murmur of TR. 

• Cardiologists can accurately rule in and rule out TR with 
the quiet inspiration and sustained abdominal pressure 
maneuvers. 

Hypertrophic Cardiomyopathy 

There are limited data on the accuracy of clinical examination for
hypertrophic cardiomyopathy (also termed idiopathic hypertrophic
subaortic stenosis). Many studies evaluate phonocardiography or
intracardiac tracings rather than auscultation, 36-40 whereas others
include fewer than 15 patients. 41-45 One study evaluated carotid
sinus pressure, which is not routinely recommended for the clinical
examination. 46

Special maneuvers may help distinguish the murmur of hypertrophic
cardiomyopathy. 27 Using cardiologist examiners, if a murmur
decreased in intensity with passive leg elevation, then hypertrophic
cardiomyopathy was significantly more likely (LR+, 8.0; 95% CI,
3.0-21), whereas if the murmur did not decrease in intensity, the
likelihood was significantly reduced (LR-, 0.22; 95% CI, 0.06-0.77).
If murmur intensity was decreased or unchanged with standing to
squatting, then hypertrophic cardiomyopathy was significantly more
likely (LR+, 4.5; 95% CI, 2.3-8.6), whereas if the murmur increased
in intensity, the likelihood of hypertrophic cardiomyopathy was
significantly reduced (LR-, 0.13; 95% CI, 0.02-0.81) (grade C).

The Bottom Line for Hypertrophic Cardiomyopathy 

Cardiologists can rule in or rule out hypertrophic cardiomy¬ 
opathy by evaluating for decreased murmur intensity with 
passive leg elevation or increased murmur intensity when the 
patient goes from a squatting to standing position. 

Mitral Valve Prolapse 

The accuracy of the clinical examination for diagnosing MVP
cannot be defined, because clinical findings alone are sufficient
for the diagnosis of MVP. A patient with a systolic click and a
systolic murmur meets the diagnostic criteria for MVP even if
the patient has a normal echocardiogram result. 47,48

However, we can examine the relationship between clinical findings
and echocardiographic findings (Table 33-8). 49-53 With cardiologist
examiners, a systolic click accompanied by a systolic murmur helped
to rule in echocardiographic MVP. The accuracy of an isolated
systolic click is variable, possibly because of unreliability of the
clinical examination and differences between studies regarding the
definition of echocardiographic MVP. An isolated systolic murmur has
little effect on the likelihood of echocardiographic MVP, whereas
absence of both a systolic click and a murmur appears to reduce the
likelihood of echocardiographic MVP. Noncardiologists are less
accurate than cardiologists for all of these findings.

Table 33-8 Accuracy of the Clinical Examination for Detecting
Echocardiographic Mitral Valve Prolapse

Finding | Clinician (No. of Patients) | LR (95% CI) | Quality Grade(a)
Systolic click and murmur
  Study 1 [49] | Cardiologists (401) | 19 (4.6-80) | C
  Study 2 [50] | Noncardiologists (104) | 2.4 (1.0-5.7) | C
Systolic click
  Study 1 [49] | Cardiologists (401) | 12 (5.4-25) | C
  Study 2 [50] | Noncardiologists (104) | 1.3 (0.7-2.2) | C
Nonejection click, with or without a murmur
  Study 1 [51] | Cardiologists (155) | 3.8 (2.3-6.8) | A
  Study 2 [52] | Cardiologists (140) | 1.7 (1.3-2.1) | C
Murmur, with or without a systolic click
  Study 1 [52] | Cardiologists (140) | 1.9 (1.3-3.0) | C
  Study 2 [53] | Noncardiologists (259) | 1.2 (0.9-1.5) | C
Murmur only
  Study 1 [50] | Cardiologists (401) | 2.4 (1.0-5.7) | C
  Study 2 [51] | Noncardiologists (104) | 0.7 (0.3-1.3) | C
No murmur, no systolic click
  Study 1 [51] | Cardiologists (155) | 0.04 (0.02-0.11) | A
  Study 2 [52] | Cardiologists (140) | 0.26 (0.12-0.54) | C
  Study 3 [49] | Cardiologists (401) | 0.21 (0.15-0.29) | C
  Study 4 [50] | Noncardiologists (104) | 0.53 (0.23-1.20) | C

Abbreviations: CI, confidence interval; LR, likelihood ratio.
(a) See Table 1-7 for a summary of Evidence Grades and Levels.


Table 33-9 Accuracy of the Clinical Examination for Predicting
Adverse Clinical Outcomes Related to Mitral Valve Prolapse(a)

Finding | Clinician (No. of Patients) | LR (95% CI) | Quality Grade(b)
Holosystolic murmur
  Study 1 [56] | Cardiologists (316) | 18 (6.6-51) | C
  Study 2 [57] | Cardiologists (321) | 5.1 (2.2-9.9) | C
Late systolic murmur or click and murmur
  Study 1 [58] | Cardiologists (316) | 1.2 (0.7-1.7) | C
  Study 2 [57] | Cardiologists (321) | 0.8 (0.3-1.5) | C
Click and holosystolic murmur [57] | Cardiologists (321) | 0.8 (0.2-2.4) | C
Any click or isolated click
  Study 1 [58] | Cardiologists (316) | 0.4 (0.2-0.8) | C
  Study 2 [57] | Cardiologists (321) | 0.26 (0.05-1.1) | C
No click/no murmur
  Study 1 [54] | Cardiologists (237) | 0 (0-4.1) | C
  Study 2 [58] | Cardiologists (316) | 0 (0-1.4) | C

Abbreviations: CI, confidence interval; LR, likelihood ratio.
(a) Outcomes include death (cardiac or all-cause, depending on study),
stroke, endocarditis, or progressive mitral regurgitation requiring
surgery. Most outcomes were progressive mitral regurgitation
requiring surgery.
(b) See Table 1-7 for a summary of Evidence Grades and Levels.


Mitral valve leaflet redundancy or thickening is the
echocardiographic variable most strongly associated with adverse
clinical outcomes. 54,55 In one study, neither a systolic click
(LR+, 2.8; 95% CI, 1.8-4.6; LR-, 0.76; 95% CI, 0.69-0.84) nor
a systolic murmur (LR+, 1.3; 95% CI, 1.1-1.5; LR-, 0.57;
95% CI, 0.43-0.76) affected the likelihood of echocardiographic
mitral valve leaflet thickening or redundancy (grade C study). 56

Several longitudinal studies of patients with echocardio¬ 
graphic MVP have related baseline clinical findings to the devel¬ 
opment of adverse clinical events, including cardiac death, 
progressive MR requiring surgery, endocarditis, and systemic 
embolism. 57,58 A holosystolic murmur without a systolic click 
significantly increased the likelihood of an adverse event, 
whereas absence of both a systolic click and murmur was associ¬ 
ated with no adverse events. Other clinical findings had little 
effect on the likelihood of adverse events (Table 33-9). 

The Bottom Line for Mitral Valve Prolapse 

• A systolic click, with or without systolic murmur, is suffi¬ 
cient for the diagnosis of MVP. 

• If a cardiologist hears a systolic click, with or without a mur¬ 
mur, then the likelihood of echocardiographic MVP is sig¬ 
nificantly increased. The absence of both a systolic click and 
murmur significantly reduces the likelihood of echocardio¬ 
graphic MVP. 

• In patients with echocardiographic MVP, a holosystolic 
murmur without a systolic click significantly increases the 
likelihood of long-term complications, whereas absence of 
both a systolic click and murmur significantly reduces the 
likelihood of long-term complications. 

WHEN TO EXAMINE FOR SYSTOLIC MURMURS 

We are unaware of data by which one might give an evidence- 
based recommendation regarding the examination for sys¬ 
tolic murmurs. Auscultation for systolic murmurs should 
probably be carried out in any patient for whom a complete 
cardiovascular database is necessary. 

ARE SYSTOLIC MURMURS EVER NORMAL? 

In unreferred young adults, the prevalence of systolic murmurs
ranges from 5% to 52% 8,59-61; the echocardiography result is
normal in 86% to 100%. 62-64 The echocardiography result is normal
in 90% to 94% of pregnant women with systolic murmurs who are
referred for testing. 21,24,65 In elderly medical outpatients or
residents of long-term care facilities, the prevalence of systolic
murmurs ranges from 29% to 60% 66-68; the echocardiography result is
normal in 44% to 100%. 24,25,29,69,70 This wide range of normal in
the elderly can be partially explained by various study definitions
of normal echocardiograms. Commonly detected abnormalities in the
elderly were left ventricular systolic dysfunction, aortic stenosis,
and MR. Other studies include aortic valve sclerosis as an
abnormality, although the clinical importance of aortic valve
sclerosis is uncertain.

A venous hum 71 and a mammary souffle are both normal
conditions that present either as systolic murmurs or, more
commonly, as continuous murmurs.

HOW TO IMPROVE SKILLS IN EXAMINING THIS AREA 

The characteristics of murmurs can be learned using cardio¬ 
vascular auscultatory tapes or cardiac patient simulators, 
although the effectiveness of these aids is uncertain. 72,73 Most 
audiotapes are accompanied by phonocardiographic and 
expert cardiologist analyses, so these tapes can help clinicians 
to calibrate their ears to those of experts. 

Most commercially available stethoscopes have similar 
acoustic properties, although some have poor performance 
at low frequencies. 74 Good stethoscope maintenance is essen¬ 
tial because dirt or cracked tubing 75 will significantly reduce 
accuracy. Large earpieces are better because small earpieces 
can be occluded by the sharp bony angle at the external audi¬ 
tory meatus. 3 

At the bedside, eliminate background noise whenever pos¬ 
sible. If background noise is unavoidable, try to repeat your 
examination in a quieter setting. 

Finally, relate your clinical findings to the results of assess¬ 
ments by a colleague, a cardiologist, or an echocardiogram 
whenever possible. Resolving disagreements between your 
assessments and those of others is an excellent way of 
upgrading your clinical skills. 

RECOMMENDATIONS FOR FURTHER RESEARCH 

Most studies used cardiologists or senior cardiology fellows to 
conduct the clinical examinations. There are few data on the 
precision and accuracy of the clinical examination conducted 
by noncardiologists. Some studies include inappropriately nar¬ 
row spectrums of patients, such as only patients with moderate 
and severe aortic stenosis. 5,6,17 Further studies should focus on a 
broad spectrum of patients from primary or secondary care 
settings, particularly patients older than 40 years when the 
prevalence of abnormal murmurs is significantly increased. 


CLINICAL SCENARIOS—RESOLUTIONS 


CASE 1 Your first patient, who is awaiting urgent surgery
for an open fracture, had a systolic murmur that did not
radiate to the right carotid artery. The likelihood of aortic
stenosis is significantly reduced by this finding. In addition,
the carotid artery pulsation had normal volume, the S2
intensity was normal, and there was no S4. These findings
also help to reduce the likelihood of aortic stenosis. You are
confident in your assessment because it was conducted in a
quiet room with a comfortable and cooperative patient.
You can advise the surgeon that aortic stenosis is unlikely.

CASE 2 Your second patient has a systolic click and a systolic
murmur, strongly suggesting MVP. If you are an experienced
auscultator, then these findings significantly increase the
likelihood that the echocardiogram will show evidence of MVP.
However, even if the echocardiogram result is normal, you already
have enough evidence to diagnose MVP. You may wish to obtain an
echocardiogram at a later date to determine the severity of the
valvular abnormality.

CLINICAL SCENARIO


A 62-year-old man scheduled for elective total knee 
replacement has been referred to you for preoperative 
assessment of a systolic murmur. The orthopedic sur¬ 
geon detected a systolic murmur and wants to rule out 
aortic stenosis (AS) before surgery. The patient has no 
cardiovascular symptoms. On auscultation, you hear 
normal first and second heart sounds (S1 and S2). There
is a grade 3 early systolic murmur, loudest at the lower 
left sternal border, which does not radiate to either the 
right clavicle or carotids. You detect a normal volume 
and normal rate of increase of the carotid pulse. The rest 
of the clinical examination results, including those for 
the electrocardiogram (ECG) and chest radiograph, are 
normal. 

Original Review 

Etchells EE, Bell C, Robb K. Does this patient have an abnormal
systolic murmur? JAMA. 1997;277(7):564-571.

UPDATED LITERATURE SEARCH 

Our literature search combined the parent search strategy
for The Rational Clinical Examination with the following
terms: “systolic and murmur,” “heart valve diseases,” “aortic
valve stenosis,” “pulmonary valve stenosis,” “mitral valve
prolapse,” “mitral valve insufficiency,” “tricuspid valve
insufficiency,” “hypertrophic cardiomyopathy,” and “heart
murmurs.” Results were limited to English-language publications
in the MEDLINE database from 1996 to July 2004.
The titles and abstracts of the search results were screened,
case reports were excluded, and 28 potentially relevant primary
studies and review articles were retrieved. We
scanned the reference list of each article for additional
studies. For accuracy studies, we retained those of adult
subjects that included sensitivity and specificity data of
physical findings and had a quality score of level 3 or
greater. We excluded level 3 studies with fewer than 100
patients. Five new studies were ultimately included in this
update.


Prepared by David Cescon, MD, and Edward Etchells, MD, MSC 

Reviewed by Eugene Oddone, MD 


NEW FINDINGS 

1. Cardiologists are able to distinguish normal (“innocent”)
murmurs from abnormal murmurs by the physical examination
alone.

2. Emergency department physicians are able to detect normal
murmurs by clinical evaluation (including physical examination;
medical history; ECG, chest radiograph, and laboratory
test results; and previously recorded chart data).

3. The presence of a holosystolic murmur, loud murmur,
decreased carotid upstroke, or systolic thrill makes it
much more likely that a systolic murmur represents an
underlying cardiac abnormality rather than a functional
murmur.

4. In patients for whom examiners did not know whether a
murmur was present before examination, emergency
department physicians and cardiologists identified valvular
heart disease with good accuracy.

5. Absence of murmur radiation to the right clavicle makes
moderate to severe AS much less likely.

6. The presence of any 3 of the following findings makes
moderate to severe AS much more likely: maximal murmur
intensity in second right intercostal space, reduced
carotid pulse volume, slow rate of increase of carotid
pulse, and reduced or absent second heart sound (S2).

7. When mitral regurgitation (MR) is identified, murmur
intensity equal to or more than grade 3 makes severe
regurgitation more likely.


IMPROVEMENTS IN DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

The newer studies do not alter the results reported in the 
original publication but do provide new information on the 
role of individual auscultatory findings. 

In the original article, the need to identify patients at
higher risk for endocarditis because of valvular abnormalities
was suggested as a rationale for performing the clinical
examination. The recommendations for endocarditis prophylaxis
have changed. Patients with murmurs from structural
abnormalities of a native valve do not automatically require
antibiotic prophylaxis to prevent infective endocarditis. 1

CHANGES IN THE REFERENCE STANDARD

The reference standard is an echocardiogram or a cardiac 
catheterization that assesses valvular competency. 

RESULTS OF THE LITERATURE REVIEW 

Precision 

Since the original review, 2 published studies involving
noncardiologist examiners have evaluated the precision of various
physical examination maneuvers in actual patients. 2,3 In a
large study of medical patients presenting to the emergency
department, there was substantial agreement on the presence
of systolic murmurs (κ = 0.8). The precision of examining
for a loud murmur (κ = 0.59) and for an S2 in the clinical
setting is moderate (κ = 0.54), whereas the precision of other
findings is only fair. In both of these studies, the various
findings were not evaluated independently, so the examiners’
opinions may have been influenced by the presence or
absence of related findings.

Accuracy 

Distinguishing Abnormal From 
Normal (Innocent) Murmurs 

Two new studies evaluated examiners’ ability to distinguish
murmurs caused by an underlying cardiac abnormality from
those generated by structurally normal hearts (innocent
murmurs). One of these studies evaluated the accuracy of the
entire clinical evaluation (including physical examination,
medical history, electrocardiogram, chest radiograph, laboratory
tests, and data from old charts) by noncardiologist
emergency department physicians, and one evaluated the
accuracy of the cardiologist’s physical examination alone. 4

In a study of high methodologic quality, Reichlin et al 2
evaluated the performance of emergency department physicians’
clinical assessments of patients with systolic murmurs.
Although these noncardiologists are somewhat less
accurate at distinguishing abnormal from innocent murmurs
than cardiologists, a normal clinical assessment significantly
reduces the likelihood of a cardiac abnormality (negative
likelihood ratio [LR-], 0.29; 95% confidence interval
[CI], 0.17-0.45).


Table 33-10 Ability of Findings to Identify Patients With Significant
Cardiac Lesions vs Functional Systolic Murmur

                                  LR for a Significant Systolic Murmur (a)
Clinical Sign                     LR+ (95% CI)      LR- (95% CI)
Holosystolic murmur (n = 26)      8.7 (2.3-33)      0.19 (0.08-0.43)
Loud murmur (n = 29)              6.5 (2.3-19)      0.08 (0.02-0.31)
Plateau-shaped murmur (n = 20)    4.1 (1.4-12)      0.48 (0.30-0.77)
Loudest at the apex (n = 30)      2.5 (0.58-11)     0.84 (0.65-1.1)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

(a) The LR+ is the likelihood ratio when the finding is present and indicates an increased
likelihood that the systolic murmur is associated with moderate to severe aortic stenosis
or mitral regurgitation, congenital shunt, or intraventricular pressure gradients. The
LR- is the likelihood ratio when the finding is absent and shows the likelihood that a
significant lesion will be present when the finding is absent.

The second study assessed the ability of cardiologists to
distinguish innocent from pathologic murmurs by physical
examination alone in patients referred for evaluation of a
systolic murmur. The cardiologists’ overall assessments of
significant heart disease (defined as moderate to severe valvular
heart disease, congenital shunt, or an intraventricular
gradient identified by echocardiography) performed with a
positive likelihood ratio (LR+) of 11 (95% CI, 5.0-26) and
LR- of 0.22 (95% CI, 0.10-0.41). In addition, several clinical
signs were assessed to appraise their performance in categorizing
significant systolic murmurs confirmed by echocardiography.
The most frequently detected findings, and those
that were most useful, are shown in Table 33-10.

Patients with mild AS or regurgitation are not included in
the calculation of these LRs. Patients with a loud, plateau-shaped,
or holosystolic murmur are more likely to have significant
lesions than functional murmurs or mild valvular
heart disease. Similarly, the absence of holosystolic or loud
murmur suggests that there are no significant lesions. However,
an echocardiogram must be obtained when the clinician
wants to determine whether the murmur represents
moderate to severe AS or regurgitation, a congenital shunt,
or an intraventricular pressure gradient.

Identifying Valvular Heart Disease by Physical Examination 

The ability to distinguish innocent from pathologic murmurs
is important in stratifying patients for referral for
echocardiography. However, the ability to make this distinction
does not reflect examiners’ true ability to determine the presence
of valvular heart disease: by excluding patients with no
audible murmur, the specificity of the physical examination
for valvular disease is underestimated.

In the study by Reichlin et al, 2 the inclusion criteria required
that at least 2 of 3 screening physicians agree that a subject had a
murmur: 203 patients were enrolled from 852 screened, whereas
582 were excluded because no systolic murmur was heard.
There was excellent agreement among examiners about the
presence of a murmur, with disagreement in only 18 patients
(2%). The exclusion of those patients with no murmur is an
example of verification bias. Verification bias occurs when the
gold standard test is not applied to all the potentially eligible
patients to confirm their disease status. In this case, patients
without systolic murmurs were excluded from the analysis and
had no echocardiogram to confirm the absence of structural
heart disease. Typically, selective inclusion creates an overestimate
of the sensitivity and an underestimate of the specificity of
the clinical assessment. However, because the study provides
complete information on all patients, we are able to correct for
verification bias, with the assumption that patients with no
murmur truly had no valvular disease. Recalculation yields an
LR+ of 14 (95% CI, 10-19) for a clinical assessment suggesting
an abnormal murmur and an LR- of 0.21 (95% CI, 0.13-0.34)
when either no murmur was heard or the murmur was deemed
normal. Because some patients without systolic murmurs can
still have AS or MR, these corrected LRs represent the best
possible clinical performance.
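
The correction described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the 2 x 2 counts for the verified patients are hypothetical placeholders (the text does not report them), the function names are our own, and the only figure taken from the study is the 582 patients who had no murmur and no echocardiogram.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Return (sensitivity, specificity, LR+, LR-) from 2x2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return sensitivity, specificity, lr_pos, lr_neg


def correct_for_unverified_negatives(tp, fp, fn, tn, unverified):
    """Count unverified, murmur-free patients as true negatives.

    This encodes the stated assumption that patients with no murmur
    truly had no valvular disease.
    """
    return tp, fp, fn, tn + unverified


# HYPOTHETICAL counts for the verified patients (not the study's data).
verified = dict(tp=60, fp=30, fn=10, tn=103)

# 582 screened patients had no murmur and no echocardiogram (from the text).
corrected = correct_for_unverified_negatives(**verified, unverified=582)

uncorrected_stats = likelihood_ratios(**verified)
corrected_stats = likelihood_ratios(*corrected)

print("uncorrected: sens=%.2f spec=%.2f LR+=%.1f LR-=%.2f" % uncorrected_stats)
print("corrected:   sens=%.2f spec=%.2f LR+=%.1f LR-=%.2f" % corrected_stats)
```

Under this assumption only the specificity changes, so the LR+ rises and the LR- falls slightly. If some murmur-free patients did have valvular disease, the sensitivity would also be lower than calculated, which is why the corrected values are best read as an upper bound on performance.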

Another study using cardiologist examiners addressed the
performance of a complete cardiovascular physical examination
without additional information in a population of
asymptomatic individuals. The patients were not selected
because of an auscultated abnormality. 5 A murmur was heard
in 63 patients, with 17 murmurs classified as abnormal;
transesophageal echocardiography identified valvular abnormalities
in 33 patients. In this population, the cardiovascular
physical examination alone performed with an LR+ of 38
(95% CI, 9.5-154) and LR- of 0.31 (95% CI, 0.18-0.52).

Thus, these 2 studies provide information on the clinician’s
ability to identify valvular heart disease irrespective of the presence
of a murmur, which better reflects an initial assessment in
clinical practice. Although the populations of patients studied
are different and the emergency department assessment
includes supplementary information, the examiners’ overall
performance in these studies is similar. When an abnormal
murmur is identified, the pooled LR+ for echocardiographic
valvular disease is 15 (95% CI, 11-20; results homogeneous with
P = .11; I² = 48%; 95% CI, 0%-86%); when no murmur is heard
or the murmur is determined to be “normal,” the pooled LR- is
0.25 (95% CI, 0.17-0.36; results homogeneous with P = .29; I² =
16%; 95% CI, 0%-55%). 6

Aortic Stenosis 

One new grade 2 study (n = 123), 3 performed by noncardiologists,
prospectively evaluated individual findings and combinations
of findings for the detection of moderate or severe AS
(defined as an aortic valve area less than 1.2 cm² or peak
transvalvular gradient of 25 mm Hg or more). A slow carotid upstroke
was the most important individual finding for ruling in AS (LR+,
9.2; 95% CI, 3.4-24) (Table 33-11). The 2-step process for using
combinations of findings begins with examination for the presence
of a murmur over the right clavicle. If this murmur is
absent, AS is considerably less likely (LR-, 0.1; 95% CI, 0.02-0.44).
When a murmur radiates to the right clavicle, 4 associated
findings are sought: highest intensity of murmur at second right
intercostal space, reduced intensity of S2, reduced carotid volume,
and slow carotid upstroke. When zero to 2 of these associated
findings are present, the result is indeterminate (LR, 1.8;
95% CI, 0.93-2.9), whereas if 3 to 4 of these findings are present,
the likelihood of AS is significantly increased (LR, 40; 95% CI,
6.6-239).
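
The 2-step process can be written out as a small function. The sketch below is illustrative only: the function and parameter names are our own, and it simply returns the point-estimate LR reported in the text for each outcome, without the confidence intervals.

```python
def aortic_stenosis_lr(murmur_over_right_clavicle,
                       loudest_at_second_right_ics=False,
                       reduced_s2=False,
                       reduced_carotid_volume=False,
                       slow_carotid_upstroke=False):
    """Return the LR for moderate to severe AS from the 2-step rule.

    Step 1: is a systolic murmur heard over the right clavicle?
    Step 2: if so, count the 4 associated findings.
    """
    if not murmur_over_right_clavicle:
        return 0.1  # AS considerably less likely

    associated = sum([
        loudest_at_second_right_ics,
        reduced_s2,
        reduced_carotid_volume,
        slow_carotid_upstroke,
    ])
    if associated >= 3:
        return 40.0  # likelihood of AS significantly increased
    return 1.8  # 0 to 2 findings: indeterminate


# Example: murmur over the right clavicle with a reduced S2 only.
print(aortic_stenosis_lr(True, reduced_s2=True))
```

Associated findings are ignored when the murmur does not reach the right clavicle, which mirrors the order of the examination described above.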

Mitral Regurgitation 

One study evaluated the accuracy of isolated findings in
predicting severe MR, 7 defined as a regurgitant fraction of 40%
or more detected by echocardiography (Table 33-12). The
clinical findings of interest were abstracted from the patients’
personal charts, as recorded by the patients’ own physicians
(cardiologists and internists), who were unaware of the
study. The study evaluated the relationship between the
intensity of the murmur and the severity of regurgitation,
and demonstrated a significant correlation.

Mitral Valve Prolapse 

No new high-quality studies added to the information in the
original review. The absence of a murmur and click rules out
mitral valve prolapse (MVP) (LR, 0.04). The presence of a
nonejection click (a high-pitched sound of short duration in
mid or late systole) with or without a murmur slightly
increases the likelihood of echocardiographic MVP (LR, 3.8). 8


Table 33-11 Accuracy of the Physical Examination for Detecting
Aortic Stenosis

Finding                             LR+ (95% CI)     LR- (95% CI)
Slow carotid upstroke               9.2 (3.4-24)     0.56 (0.32-0.8)
Murmur radiating to right carotid   8.1 (4-16)       0.29 (0.12-0.57)
Reduced or absent S2                7.5 (3.2-17)     0.50 (0.27-0.76)
Murmur over right clavicle          3.0 (2-4.1)      0.10 (0.02-0.44)
Any systolic murmur                 2.6 (1.8-3.5)    0 (0-0.45)
Reduced carotid volume              2.0 (1-3.2)      0.64 (0.34-0.99)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.


Table 33-12 Accuracy of the Physical Examination for Detecting
Severe Mitral Regurgitation 7

Finding                LR+ (95% CI)
Murmur grades 4-5      14 (3.3-56)
Murmur grade 3         3.5 (2.1-5.7)
Murmur grades 0-2      0.19 (0.11-0.33)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio.

EVIDENCE FROM GUIDELINES 

The American College of Cardiology/American Heart Association
guidelines (2003) 9 recommend echocardiography to evaluate
heart murmurs in patients with cardiovascular symptoms or
in asymptomatic patients with clinical features that suggest a
moderate or greater probability that the murmur is reflective of
underlying structural heart disease. Echocardiography is not
recommended in asymptomatic adults whose murmur has been
identified as functional or innocent by an experienced observer. 9


CLINICAL SCENARIO—RESOLUTION 


Your patient’s murmur did not radiate to the right clavicle.
This finding makes AS much less likely (LR, 0.1). There are
no other concerning features that raise the possibility of other
serious structural heart disease, including the ECG and chest
radiograph. If you are an experienced clinician, this reduces
the likelihood of important structural heart disease (LR,
0-0.1). If you are less experienced and not certain of your
overall assessment that the murmur is “functional,” concentrating
on whether the murmur is holosystolic or “loud” and
whether the patient has a decreased carotid upstroke or systolic
thrill may yield more useful information than your clinical
gestalt. Conditions that can cause increased blood flow
through a structurally normal heart should be excluded, such
as anemia, renal failure, and thyrotoxicosis.

SYSTOLIC MURMURS—MAKE THE DIAGNOSIS


Systolic murmurs are common, and echocardiography is
normal in the majority of asymptomatic individuals with
murmurs. Clinical evaluation offers the potential to identify
those patients with increased likelihood of underlying structural
disease and to avoid costly echocardiographic evaluation
in all patients with systolic murmurs.

PRIOR PROBABILITY 

One study of randomly selected elderly Finnish persons (aged
75-86 years) found a prevalence of moderate to severe AS of
8.8% in women and 3.6% in men. 10 The prevalence in younger
patients should be lower. The Framingham Heart Study showed
that echocardiographic evidence of MR is common and a function
of both age and sex. 11 A useful approximation for the prevalence
of mild to moderate MR is 15% from age 40 to 60 years
for both men and women. After age 60, women have a prevalence
of about 25% compared with men, who have an increasing
frequency of MR that approximates 40% by age 80 years.
The prevalence of MVP is about 2.5%. 12,13
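
A prior probability becomes useful once it is combined with a likelihood ratio. The sketch below converts a pretest probability to a post-test probability through odds, using the AS prevalences quoted above and the point-estimate LRs of the combination rule for AS. The function name and dictionary labels are our own, and pairing a community prevalence with LRs measured in referred inpatients is an assumption made purely for illustration.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Prevalence of moderate to severe AS at age 75-86 years (from the text).
PREVALENCE = {"women": 0.088, "men": 0.036}

# Point-estimate LRs for the combination rule (from the text).
RULE_LR = {
    "no murmur over right clavicle": 0.1,
    "murmur over clavicle + 0-2 findings": 1.8,
    "murmur over clavicle + 3-4 findings": 40,
}

for group, prior in PREVALENCE.items():
    for finding, lr in RULE_LR.items():
        posterior = post_test_probability(prior, lr)
        print(f"{group:6s} {finding:38s} {posterior:.1%}")
```

Because the calculation uses only point estimates, the wide confidence interval around the LR of 40 is not reflected in the output.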

POPULATION FOR WHOM A SYSTOLIC MURMUR 
SHOULD BE ASSESSED 

• It is sensible to listen for a systolic murmur in every patient
for whom a complete cardiac database is necessary.

• Once a patient with a systolic murmur is identified, the clinical
examination helps identify those more likely to have significant
underlying cardiac lesions. However, an
echocardiogram is required to determine whether the finding
represents a significant or less significant cardiac lesion.

• A murmur can be heard with a variety of
underlying lesions, such as myocardial ischemia, endocarditis,
and disturbances that cause a high flow rate.

IDENTIFYING NORMAL (INNOCENT) MURMURS 

Cardiologists and emergency physicians are accurate at
distinguishing abnormal from innocent murmurs (Tables 33-13
and 33-14).


Table 33-13 Likelihood Ratio for the Overall Examination for Detecting
Valvular Disease

                                    LR for Valvular Disease
                                    LR+ (95% CI)     LR- (95% CI)
Cardiologists 5                     38 (9.5-154)     0.31 (0.18-0.52)
Emergency department physicians 2   14 (10-19)       0.21 (0.13-0.34)
Summary                             15 (11-20)       0.25 (0.17-0.36)

Abbreviations: CI, confidence interval; LR, likelihood ratio; LR+, positive likelihood
ratio; LR-, negative likelihood ratio.


Because the overall performance of generalist physicians has 
not been described, attention to individual findings may be 
even more useful than the overall clinical impression when a 
murmur is auscultated. 


Table 33-14 Likelihood Ratios of Individual Findings for Identifying
Murmurs That Are Significant

                                   LR for a Significant Systolic Murmur (a)
Clinical Sign                      LR+ (95% CI)      LR- (95% CI)
Systolic thrill (n = 8)            12 (0.76-205)     0.73 (0.58-0.93)
Holosystolic murmur (n = 26)       8.7 (2.3-33)      0.19 (0.08-0.43)
Loud murmur (n = 29)               6.5 (2.3-19)      0.08 (0.02-0.31)
Plateau-shaped murmur (n = 20)     4.1 (1.4-12)      0.48 (0.30-0.77)
Loudest at the apex (n = 30)       2.5 (0.58-11)     0.84 (0.65-1.1)
Radiation to the carotid (n = 9)   0.91 (0.28-3.0)   1.0 (0.78-1.3)

Abbreviations: CI, confidence interval; LR, likelihood ratio; LR+, positive likelihood
ratio; LR-, negative likelihood ratio.

(a) Moderate to severe aortic stenosis or mitral regurgitation, congenital shunt, or
intraventricular pressure gradient.


AORTIC STENOSIS 

The presence of AS requires detection of a systolic murmur,
generally radiating to the right clavicle. For such patients,
evaluate the S2 to determine whether it is reduced in intensity,
feel the carotid artery to assess whether the volume is
reduced and the upstroke slower than normal, and assess
whether the murmur is loudest in the second right intercostal
space (Table 33-15).


Table 33-15 Likelihood Ratios of Combinations of Findings
for Aortic Stenosis

                                                   LR (95% CI) for Moderate
Clinical Findings (a)                              or Greater Aortic Stenosis
Systolic murmur over right clavicle
  + 3-4 associated findings                        40 (6.6-239)
Systolic murmur over right clavicle
  + 0-2 associated findings                        1.8 (0.93-2.9)
No systolic murmur over right clavicle             0.1 (0.02-0.44)

Abbreviations: CI, confidence interval; LR, likelihood ratio.

(a) Reduced or absent second heart sound, reduced carotid volume, slow rate of
increase of carotid pulse, and maximal murmur intensity in second right intercostal
space.

MITRAL REGURGITATION AND 
MITRAL VALVE PROLAPSE 

Although cardiologists are accurate at identifying echocardiographic
MR (Table 33-16), the performance of generalist physicians
has not been evaluated as well. Once MR is identified,
the intensity of the murmur helps to identify the severity of
the regurgitation. 6

Table 33-16 Likelihood Ratio for the Murmur Intensity to Identify
Severe Mitral Regurgitation

Finding                LR+ (95% CI)
Murmur grades 4-5      14 (3.3-56)
Murmur grade 3         3.5 (2.1-5.7)
Murmur grades 0-2      0.19 (0.11-0.33)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio.


The absence of a murmur and click rules out MVP (LR, 0.04),
whereas the presence of a systolic click, with or without a
murmur, slightly increases the likelihood of echocardiographic
MVP (LR, 3.8).

REFERENCE STANDARD TEST 

Echocardiography or cardiac angiography. 

TITLE A Bedside Clinical Prediction Rule for Detecting
Moderate or Severe Aortic Stenosis.

AUTHORS Etchells E, Glenns V, Shadowitz S, Bell C, Siu S.

CITATION J Gen Intern Med. 1998;13(10):699-704.

QUESTION Can a clinical prediction rule using simple
physical examination findings accurately detect aortic stenosis
(AS) in a broad spectrum of patients?

DESIGN Consecutive patients were prospectively enrolled
when they were referred for echocardiography. Two examiners
(a third-year medical resident and a staff general internist)
performed the maneuvers on all enrolled patients. An
echocardiographer, blinded to the findings, identified all
patients with moderate or greater AS.

SETTING General medical/cardiology wards in an
urban university hospital in Toronto.

PATIENTS One hundred twenty-three patients admitted
to the general medicine and cardiology wards. The
majority had some history of congestive heart failure,
angina, or myocardial infarction. The median age was 68
years, 58% were men, and 56% had Canadian Cardiovascular
Society class I symptoms at the study. Exclusion criteria
were age younger than 50 years, cardiac care unit/
intensive care unit admission, unstable angina within 48
hours, history of cardiovascular surgery or valve replacement,
severe dyspnea at rest, or inability to consent.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Two examiners, blinded to echocardiographic findings, independently
performed a structured physical examination and
focused medical history on all enrolled patients. Transthoracic
echocardiography was performed on all patients by an
echocardiographer blinded to the clinical findings, who identified
moderate to severe AS, defined as aortic valve area of
1.2 cm² or smaller or peak transvalvular gradient of 25 mm
Hg or higher.


MAIN OUTCOME MEASURES 

Sensitivity, specificity, and likelihood ratios; κ for interobserver
variability.

MAIN RESULTS 

Seventeen patients (14%) were found to have AS, with com¬ 
plete physical examination data available for 15. 

CONCLUSION 

LEVEL OF EVIDENCE Level 2. 

STRENGTHS Prospective data collection with valid reference
standard and confirmed independence of clinical examination.

LIMITATIONS This study included only 17 patients with the 
condition of interest. 


Table 33-17 Likelihood Ratios for Findings to Predict Aortic Stenosis

Test                                         Sensitivity   Specificity   LR+ (95% CI)    LR- (95% CI)
Slow carotid upstroke (n = 12)               0.47          0.95          9.2 (3.4-24)    0.56 (0.32-0.8)
Murmur radiating to right carotid (n = 20)   0.73          0.91          8.1 (4-16)      0.29 (0.12-0.57)
Reduced S2 (n = 15)                          0.53          0.93          7.5 (3.2-17)    0.50 (0.27-0.76)
Murmur over right clavicle (n = 45)          0.93          0.69          3.0 (2-4.1)     0.10 (0.02-0.44)
Any systolic murmur (n = 52)                 1.0           0.64          2.6 (1.8-3.5)   0 (0-0.45)
Reduced carotid volume (n = 35)              0.53          0.73          2.0 (1.0-3.2)   0.64 (0.34-0.99)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.
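
The LRs in Table 33-17 follow from the sensitivity and specificity columns: LR+ is sensitivity divided by (1 - specificity), and LR- is (1 - sensitivity) divided by specificity. The minimal sketch below applies these definitions to 2 rows of the table; the function names are our own. Because the published sensitivities and specificities are rounded to 2 decimals, recomputed values can differ slightly from the published LRs, which were presumably derived from the raw counts.

```python
def positive_lr(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    if specificity == 1:
        return float("inf")  # no false positives observed
    return sensitivity / (1 - specificity)


def negative_lr(sensitivity, specificity):
    """LR- = (1 - sensitivity) / specificity."""
    return (1 - sensitivity) / specificity


# (finding, sensitivity, specificity) taken from Table 33-17.
rows = [
    ("Murmur over right clavicle", 0.93, 0.69),
    ("Slow carotid upstroke", 0.47, 0.95),
]

for finding, sens, spec in rows:
    print(f"{finding}: LR+ {positive_lr(sens, spec):.1f}, "
          f"LR- {negative_lr(sens, spec):.2f}")
```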

Table 33-18 Combination of Findings for Predicting Aortic Stenosis

                                                          LR (95% CI)
Murmur over clavicle + 3-4 associated findings (a) (n = 7)   40 (6.6-239)
Murmur over clavicle + 0-2 associated findings (n = 38)      1.8 (0.93-2.9)
No murmur over right clavicle (n = 69)                       0.1 (0.02-0.44)

Abbreviations: CI, confidence interval; LR, likelihood ratio.

(a) Associated findings include reduced second heart sound (S2), reduced carotid volume,
slow carotid upstroke, and murmur loudest at second right intercostal space.


This study validates several physical examination maneuvers
as performed by generalist physicians in a broad spectrum of
older general medical inpatients (Table 33-17). These patients
are typical of those admitted into hospitals or referred for
cardiovascular evaluation. The use of moderate to severe AS as the
finding of interest is a clinically significant endpoint. The study
confirms that the absence of any murmur or the absence of a
murmur over the right clavicle is the best finding for ruling out
AS. A reduced carotid upstroke by palpation, a murmur radiating
to the right carotid, or S2 that is reduced in intensity
increases the likelihood the most. In contrast to previous studies,
a murmur radiating to the right carotid is useful for identifying
patients with AS if detected, but AS can still exist without
the presence of a murmur radiating to the carotid.

The examiners participating in the study underwent a brief
training period (30 minutes) and performed a standardized
physical examination. As a result, the performance of the
examination might be lower among examiners without the
training, although the brief training period could be easily
replicated. In addition, because the findings are assessed as part of
a standardized physical examination, it is impossible to evaluate
their independence. In other words, an examiner who
observes that one of the findings is present might be more
influenced and likely to describe other abnormal findings.

The authors also created and prospectively evaluated
combinations of findings (Table 33-18), which performed with
excellent accuracy: a lack of a murmur radiating to the right
clavicle effectively rules out AS of moderate or greater severity,
whereas the presence of such a murmur in association
with 3 or more other findings rules in the diagnosis.


Table 33-19 Reliability of Findings for Aortic Stenosis

Finding                               Generalized κ (Lower 95% CI)
S2 (normal vs decreased)              0.54 (0.46)
Loud murmur (>II/VI) second RICS      0.45 (0.37)
Radiation to right clavicle           0.36 (0.28)
Radiation to right carotid            0.33 (0.25)
Delayed carotid upstroke              0.26 (0.18)
Reduced carotid volume                0.24 (0.16)
Presence of any systolic murmur       0.19 (0.11)

Abbreviations: CI, confidence interval; RICS, right intercostal space.


The reliability assessment of individual maneuvers is useful
and demonstrates that individual findings have reliabilities
that vary from slight to moderate (Table 33-19).
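
The descriptive labels used for κ can be made explicit. The sketch below assumes the conventional Landis and Koch cut points (the text does not state which scale it uses, although its labels are consistent with them) and applies them to the point estimates in Table 33-19. The function name is our own.

```python
def describe_kappa(kappa):
    """Map a kappa statistic to a descriptive label.

    Cut points follow the Landis and Koch convention, which is an
    assumption here rather than something stated in the text.
    """
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Point estimates of generalized kappa from Table 33-19.
table_33_19 = {
    "S2 (normal vs decreased)": 0.54,
    "Loud murmur (>II/VI) second RICS": 0.45,
    "Radiation to right clavicle": 0.36,
    "Radiation to right carotid": 0.33,
    "Delayed carotid upstroke": 0.26,
    "Reduced carotid volume": 0.24,
    "Presence of any systolic murmur": 0.19,
}

for finding, kappa in table_33_19.items():
    print(f"{finding}: {kappa:.2f} ({describe_kappa(kappa)})")
```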

Reviewed by David Cescon, MD 


TITLE Echocardiography in Evaluating Systolic Murmurs
of Unknown Cause.

AUTHORS Attenhofer Jost CH, Turina J, Mayer K, et al.

CITATION Am J Med. 2000;108(8):614-620.

QUESTION How well can cardiologists identify pathologic
murmurs by auscultation or palpation alone?

DESIGN Consecutive patients were prospectively identified
at referral for evaluation of a systolic murmur of
unknown cause. Each subject was independently examined
by 2 cardiologists from a pool of 8, blinded to supplementary
data and echocardiography results. Two-dimensional
(2D)/Doppler echocardiography was performed
as the gold standard in all participants. It is not
clear whether the ultrasonographers were blinded to the
clinical examination.

SETTING Cardiology division in Switzerland.

PATIENTS One hundred patients referred for evaluation
of systolic murmur of unknown cause were enrolled.
Patients were excluded if they had a previously documented
echocardiographic examination. The mean age of
subjects was 55 ± 22 years, and 57% were women.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD

Full cardiac examination with or without dynamic auscultation
as deemed appropriate by 2 blinded cardiologist examiners.
Murmurs were classified by Levine grade and described
and characterized as functional or organic according to the
examiners’ clinical expertise. All patients underwent transthoracic
2D/Doppler echocardiography; valvular stenosis and
regurgitation were classified according to standard criteria.

MAIN OUTCOME MEASURES

Raw data, sensitivity, specificity.

MAIN RESULTS

Twenty-one patients had a “functional” murmur and were
considered normal. Of the 79 patients with “organic” murmurs,
29 patients had aortic stenosis (AS) of various severity
and 30 patients had mitral regurgitation (MR). Although the
patients were referred for evaluation of systolic murmurs,
echocardiography revealed aortic regurgitation in 28. The data
in Table 33-20 indicate the likelihood of the finding when the
cardiologists’ overall assessment results were positive.


Table 33-20 Likelihood Ratios for Overall Assessment of a Valvular
Lesion of Any Severity

                                LR+ (95% CI)     LR- (95% CI)
Aortic stenosis (n = 33)        2.1 (1.1-3.9)    0.78 (0.61-0.95)
Mitral regurgitation (n = 33)   2.3 (1.5-3.6)    0.43 (0.23-0.71)
Aortic regurgitation (n = 9)    5.1 (1.5-3.9)    0.82 (0.63-0.95)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

The cardiologists’ overall clinical assessments of significant
heart disease (defined as moderate to severe valvular heart disease,
congenital shunt, or an intraventricular gradient) performed
with a positive likelihood ratio (LR+) of 11 (95%
confidence interval [CI], 5.0-26) and negative likelihood ratio
(LR-) of 0.22 (95% CI, 0.10-0.41). The characteristics of the
murmur and response to a few maneuvers were assessed to
identify their performance in categorizing significant systolic
murmurs confirmed by echocardiography (Table 33-21).

A loud (diagnostic odds ratio, 81) or holosystolic murmur 
(diagnostic odds ratio, 46) was the most accurate finding for 
identifying those patients with a significant murmur vs those 
with a functional murmur. 
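
The diagnostic odds ratio combines the 2 likelihood ratios into a single number: it is the LR+ divided by the LR-. The short sketch below, with a function name of our own choosing, recomputes the odds ratios quoted above from the point-estimate LRs reported for these findings in Table 33-21.

```python
def diagnostic_odds_ratio(lr_pos, lr_neg):
    """Diagnostic odds ratio = LR+ / LR-."""
    if lr_neg == 0:
        return float("inf")  # finding was present in every diseased patient
    return lr_pos / lr_neg


# (LR+, LR-) point estimates from Table 33-21.
findings = {
    "Loud murmur": (6.5, 0.08),
    "Holosystolic murmur": (8.7, 0.19),
}

for finding, (lr_pos, lr_neg) in findings.items():
    print(f"{finding}: diagnostic odds ratio "
          f"{diagnostic_odds_ratio(lr_pos, lr_neg):.0f}")
```

A single summary number is convenient for ranking findings, but it hides whether a finding is better at ruling in or ruling out, so the separate LRs remain more useful at the bedside.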

No patient had a diminished carotid upstroke, so this finding
cannot be assessed from the data. A diminished second
heart sound (S2) was assessed, but the finding was heard in 5
patients only. One maneuver, the response to Valsalva, was
assessed. Typically, patients with AS or MR would have a
decreased intensity with the initiation of the maneuver,
whereas patients with hypertrophic cardiomyopathy would
have an increase. The maneuver in this study did not help
identify patients with significant lesions (LR+, 1.2; 95% CI,
0.66-2.2; and LR-, 0.84; 95% CI, 0.50-1.4), but no patients
with hypertrophic cardiomyopathy were found.


Table 33-21 Likelihood Ratio of Signs for a Significant Systolic
Murmur

                                   LR for a Significant Systolic Murmur
                                   vs a Functional Murmur
Clinical Sign                      LR+ (95% CI)      LR- (95% CI)
Systolic thrill (n = 8)            12 (0.76-205)     0.73 (0.58-0.93)
Holosystolic murmur (n = 26)       8.7 (2.3-33)      0.19 (0.08-0.43)
Loud murmur (n = 29)               6.5 (2.3-19)      0.08 (0.02-0.31)
Plateau-shaped murmur (n = 20)     4.1 (1.4-12)      0.48 (0.30-0.77)
Loudest at the apex (n = 30)       2.5 (0.58-11)     0.84 (0.65-1.1)
Radiation to the carotid (n = 9)   0.91 (0.28-3.0)   1.0 (0.78-1.3)

Abbreviations: CI, confidence interval; LR, likelihood ratio; LR+, positive likelihood
ratio; LR-, negative likelihood ratio.


CONCLUSION 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Prospective, consecutive patients. 

LIMITATIONS Small population referred for evaluation
of a murmur. The echocardiographers were not blinded
to the clinical findings. The CIs around some of these
findings are wide.

For the individual clinical signs, we could calculate the LR
comparing patients with a significant murmur vs those with
a functional murmur. This analysis ignores the patients who
had less significant cardiac lesions as the explanation for their
systolic murmur (eg, mild AS or MR). Thus, clinicians must
understand that although these findings might identify
patients more likely to have a significant vs a functional
murmur, an echocardiogram must be done to determine whether
the findings are associated with a significant or less significant
cardiac lesion.

The results suggest that a cardiologist’s examination is useful
even when the referring clinician is uncertain that a murmur
is innocent. Because these patients are likely the most
difficult to examine, the results for the cardiologist might be
a “worst-case” scenario for the LRs. We can anticipate that
for all patients with systolic murmurs, the LRs would suggest
greater accuracy.

The presence of a variety of findings increases the likelihood
that a systolic murmur will be significant. Loud,
plateau-shaped, holosystolic murmurs with a thrill will have a
high likelihood of emanating from significant cardiac
abnormalities. These individual findings might work better than
the clinician’s overall clinical assessment for assessing systolic
murmurs for patients in whom the diagnosis might not be
readily apparent from the physical examination findings. An
important caveat is that this analysis suggests only the presence
of a significant lesion as defined by the authors as
opposed to a functional murmur. Thus, the presence of findings
with a high LR+ means that the clinician must request
an echocardiogram to determine whether the underlying
cardiac lesions are significant or less significant. Similarly, the
absence of a loud or holosystolic murmur makes a significant
lesion less likely, but an echocardiogram would be required
to identify patients with less significant lesions.

The results of this study should be interpreted in light of 
the clinical population—patients referred for evaluation of 
systolic murmurs that likely included those for whom the 
referring clinician was uncertain of the diagnosis. The data in 
the table do not represent the LRs for a specific diagnosis (eg, 
AS), but for any significant lesion associated with a systolic 
murmur. 

The response to Valsalva does not help identify significant 
AS or mitral regurgitant murmurs, but this maneuver could 
still be important for identifying significant hypertrophic 
cardiomyopathy. 

Reviewed by David Cescon, MD, and Edward Etchells, 
MD, MSC 


CHAPTER 33 Evidence To Support The Update 


TITLE Intensity of Murmurs Correlates With Severity of 
Valvular Regurgitation. 

AUTHORS Desjardins VA, Enriquez-Sarano M, Tajik AJ, 
Bailey KR, Seward JB. 

CITATION Am J Med. 1996;100(2):149-156.

QUESTION Does the intensity of regurgitant murmurs 
on clinical examination correlate with the degree of 
echocardiographic regurgitation? 

DESIGN Investigators prospectively enrolled 210 consecutive
patients undergoing Doppler echocardiography
who were found to have chronic isolated mitral or aortic
regurgitation. Results of a physical examination performed
within 2 weeks of echocardiography by the
patient’s own physician (179 cardiologists, 31 general
internists), who was unaware of the study, were abstracted
from chart data.

SETTING Echocardiography laboratory in a major US 
center. 

PATIENTS Two hundred ten consecutive patients prospectively
identified with chronic, isolated mitral regurgitation
(MR) or aortic insufficiency (AI) of mild or greater
severity. Exclusion criteria included previous valve repair
or replacement, associated valvular stenosis or acute
regurgitation, and lack of physical examination performed
by the referring physician within 2 weeks of echocardiography.
For the 40 patients with isolated AI, the
mean age was 58 ± 16 years, 65% were men, 8% were in
atrial fibrillation, and the mean regurgitant fraction was
36% ± 16%. For the 170 patients with MR, the mean age
was 64 ± 13 years, 54% were men, 21% were in atrial
fibrillation, and the mean regurgitant fraction was 36% ±
18% by Doppler echocardiography.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Quantitative Doppler and 2-dimensional echocardiography
were performed on all patients before enrollment. It is not
clear whether the echocardiographers were blinded to clinical
data. Severe regurgitation was defined as a regurgitant
fraction of 40% or higher. The clinical examination documenting
murmur severity was performed independently by
each patient’s personal physician, who was not aware of the
study and did not receive any special training or instruction
regarding standardization of murmur grading.


MAIN RESULTS 

The intensity of the murmur predicts the severity of MR
(Table 33-22).

Table 33-22 Likelihood Ratios for the Presence of Severe Mitral
Regurgitation as a Function of the Murmur Intensity

Murmur Grade    LR (95% CI)
4 or 5          14 (3.3-56)
3               3.5 (2.1-5.7)
0-2             0.19 (0.11-0.33)

Abbreviations: CI, confidence interval; LR, likelihood ratio.
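
To show how the grade-specific LRs in Table 33-22 change the probability of severe MR, the sketch below converts a pretest probability to odds, multiplies by the LR, and converts back. The pretest probability of 30% is a hypothetical value chosen only for illustration; it is not reported by the study, and the function name is likewise an assumption.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability using the odds form of Bayes' rule."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    if likelihood_ratio <= 0:
        raise ValueError("likelihood ratio must be positive")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical pretest probability of severe MR (an assumption, not study data).
ASSUMED_PRETEST_PROBABILITY = 0.30

# Point estimates from Table 33-22, keyed by murmur grade.
murmur_grade_lrs = {"4 or 5": 14, "3": 3.5, "0-2": 0.19}

for grade, lr in murmur_grade_lrs.items():
    probability = post_test_probability(ASSUMED_PRETEST_PROBABILITY, lr)
    print(f"Grade {grade}: post-test probability of severe MR = {probability:.2f}")
```

The same function can be reused with any other pretest estimate; the point estimates ignore the width of the confidence intervals, which are broad for the loudest murmurs.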


CONCLUSION 

LEVEL OF EVIDENCE Level 2. 

STRENGTHS The population included in this study represents
a difficult sample because all had some degree of regurgitation.
The study examines a relevant clinical question
because the ability to correlate the intensity of a regurgitant
murmur with the degree of regurgitation is a useful clinical
tool.

LIMITATIONS Only patients with isolated lesions were
included. The results demonstrate that the evaluation of
murmur intensity of isolated MR by internists and cardiologists
is a useful diagnostic test: a loud murmur (grade 4 or
greater) is a good predictor of severe MR, whereas a murmur
of grade 2 or less effectively rules out the presence of severe
MR.

This study simulated normal clinical conditions without
special training or standardized instructions to the examiner.
These results are valid only in chronic, isolated MR and cannot
be applied to the acute setting or to patients with complex
murmurs.

MAIN OUTCOME MEASURES

Raw data, correlation coefficients (r). Likelihood ratios were
calculated from the data provided.

TITLE Initial Clinical Evaluation of Cardiac Systolic
Murmurs in the Emergency Department by Noncardiologists.

AUTHORS Reichlin S, Dieterle T, Camli C, Leimenstoll B, 
Schoenenberger RA, Martina B. 

CITATION Am J Emerg Med. 2004;22(2):71-75.

QUESTION How well do noncardiologists distinguish
innocent systolic murmurs from those produced by valvular
heart disease in a typical emergency department
evaluation?

DESIGN Medical patients presenting to the emergency 
department were prospectively identified and evaluated 
for the presence of a systolic murmur. If 2 of 3 physicians, 
including 1 study physician, agreed on the presence of a 
murmur, the patient was enrolled in the study. 

SETTING Emergency department of a university teaching
hospital in Switzerland.

PATIENTS Two hundred three patients were enrolled
from 852 medical patients screened in the emergency
department. The patients were typical medical patients,
with mean age of 64.7 (± 22.3) years, and 58% were
women. A significant percentage of the enrolled patients
had chest pain at presentation, and the majority had a
pathologic electrocardiogram (ECG) (61%) or chest
radiograph (53%) in the emergency department.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The emergency department attending physician’s clinical
evaluation (including medical history, physical examination,
ECG, chest radiograph, and laboratory tests) sought to distinguish
normal from abnormal murmurs in all enrolled
patients. Transthoracic echocardiography was performed to
identify valvular heart disease in all enrolled subjects within
24 hours by 2 cardiologists blinded to the results of the clinical
evaluation.

MAIN OUTCOME MEASURES 

Sensitivity, specificity, and likelihood ratios (LRs). 


MAIN RESULTS 

Seventy-one of 203 patients had structural heart disease evident
on echocardiography. Twenty-one patients were
excluded because there was no informed consent (17) or the
echocardiography was not performed (4), leaving 582
patients with no systolic murmur. Of the entire sample size,
there was disagreement for only 18 patients, for whom a
third examiner settled the discordance.

The κ statistic for the presence of a murmur was 0.8; the κ
statistic for murmur grades 0 to 2 vs those greater than grade
2 was 0.59.
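
For readers unfamiliar with the κ statistic, the sketch below shows how chance-corrected agreement between 2 examiners is computed from a 2 × 2 agreement table. The counts are hypothetical and were chosen only so that κ comes out near the reported value of 0.8; the study's actual agreement table is not given here.

```python
def cohen_kappa(both_yes: int, first_only: int, second_only: int, both_no: int) -> float:
    """Cohen's kappa for 2 examiners rating the same patients as murmur present or absent."""
    total = both_yes + first_only + second_only + both_no
    if total == 0:
        raise ValueError("at least one patient is required")
    observed_agreement = (both_yes + both_no) / total
    first_yes_rate = (both_yes + first_only) / total
    second_yes_rate = (both_yes + second_only) / total
    # Agreement expected by chance if the 2 examiners rated independently.
    expected_agreement = (first_yes_rate * second_yes_rate
                          + (1 - first_yes_rate) * (1 - second_yes_rate))
    if expected_agreement == 1:
        raise ValueError("kappa is undefined when chance agreement is perfect")
    return (observed_agreement - expected_agreement) / (1 - expected_agreement)


# Hypothetical counts for 100 patients (illustration only, not study data).
kappa = cohen_kappa(both_yes=40, first_only=5, second_only=5, both_no=50)
print(f"kappa = {kappa:.2f}")
```

With these invented counts the examiners agree on 90% of patients, but about half of that agreement would be expected by chance, which is why κ is lower than the raw agreement.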

CONCLUSION 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS Prospective, consecutive patients with independent
application of the reference standard in a population
typical for those in whom distinguishing a normal from
an abnormal systolic murmur by clinical examination is an
important clinical question. Because the investigators provided
information on all potentially eligible patients, we can correct
for verification bias.

LIMITATIONS Entrance criteria required that 2 of 3 examiners
agree that a murmur was present. Although this may
decrease generalizability, it improves our confidence that a
murmur was present.

This large, high-quality study evaluated the utility of the 
clinical evaluation by noncardiologists. The examiners in this 
study had access to all available clinical information, including 
patient charts that documented previously identified valvular 
heart disease in 10% of patients; however, this represents a 
realistic clinical scenario. 

The level of agreement among examiners in identifying the 
presence of a systolic murmur of intensity greater than grade 
II/VI documented in this study compares favorably to that of 
previous studies involving cardiologists examining patients. 

This study provides complete information on all patients,
allowing us to correct for verification bias by making certain
assumptions about the patients for whom both clinicians did
not hear a murmur or for whom there was a disagreement
about the presence of a murmur. The majority of patients who
did not undergo echocardiography did not have a systolic
murmur, as judged by 2 examiners. If we assume that none of
these patients truly had valvular heart disease, the LRs are as
shown in Table 33-23. These LRs estimate the efficiency of the
clinicians in identifying aortic or mitral valvular disease
among all patients. Because most patients do not have valvular
heart disease, the specificity of the examination is excellent.

Table 33-23 Likelihood Ratio of the Overall Examination for an Abnormal Murmur

Clinical Evaluation              Patients             Sensitivity  Specificity  LR+ (95% CI)   LR- (95% CI)
Overall examination suggests     All patients                                   14 (10-19)     0.21 (0.13-0.34)
abnormal murmur, corrected
for verification bias
Overall examination,             Only patients with   0.80         0.69         2.6 (2.0-3.4)  0.29 (0.17-0.45)
uncorrected for                  systolic murmurs
verification bias

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

The LRs reported by the investigators, uncorrected for
verification bias, show the performance of the clinical
examination among patients known to have a systolic murmur. In
clinical practice, these patients would be more reflective of
those referred for echocardiography to determine the presence
of a systolic murmur.
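
The effect of the verification bias correction can be reproduced approximately. In the sketch below, the 2 × 2 counts for the 203 enrolled patients are reconstructed from the reported totals (71 with disease, 132 without) and the rounded sensitivity (0.80) and specificity (0.69); they are assumptions, not counts taken from the study. The 582 patients without a systolic murmur are then added as true negatives, following the assumption stated in the text that none of them had valvular heart disease.

```python
def likelihood_ratios(true_pos: int, false_neg: int, false_pos: int, true_neg: int):
    """Return (LR+, LR-) from the 4 cells of a 2 x 2 diagnostic table."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


# Reconstructed counts (assumed): 57/71 gives sensitivity 0.80, 91/132 gives specificity 0.69.
TRUE_POS, FALSE_NEG, FALSE_POS, TRUE_NEG = 57, 14, 41, 91

# Patients with no systolic murmur, assumed free of valvular heart disease.
UNVERIFIED_NO_MURMUR = 582

uncorrected_pos, uncorrected_neg = likelihood_ratios(
    TRUE_POS, FALSE_NEG, FALSE_POS, TRUE_NEG)
corrected_pos, corrected_neg = likelihood_ratios(
    TRUE_POS, FALSE_NEG, FALSE_POS, TRUE_NEG + UNVERIFIED_NO_MURMUR)

print(f"Uncorrected: LR+ {uncorrected_pos:.1f}, LR- {uncorrected_neg:.2f}")
print(f"Corrected:   LR+ {corrected_pos:.0f}, LR- {corrected_neg:.2f}")
```

Adding the unverified patients leaves sensitivity unchanged but raises specificity, which is why the LR+ increases sharply while the LR- changes little.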

Reviewed by David Cescon, MD, and Edward Etchells, 
MD, MSC 


TITLE Value of the Cardiovascular Physical Examination 
for Detecting Valvular Heart Disease in Asymptomatic 
Subjects. 

AUTHORS Roldan CA, Shively BK, Crawford MH. 

CITATION Am J Cardiol. 1996;77(15):1327-1331.

QUESTION How useful is the physical examination in 
detecting the presence or absence of valvular heart disease 
in asymptomatic individuals? 

DESIGN Nonconsecutive patients were prospectively
identified for inclusion and were examined by a cardiologist
blinded to other data. An echocardiographer, blinded
to clinical findings, identified valvular heart disease.

SETTING Outpatient clinic in the United States. 

PATIENTS The population consisted of 75 patients
with connective tissue diseases and 68 healthy volunteers.
The patients with connective tissue diseases had
systemic disease without cardiac symptoms and constituted
a group of patients for whom most physicians
would auscultate the heart to detect asymptomatic cardiac
disease associated with their underlying disorder
(systemic lupus erythematosus, ankylosing spondylitis,
rheumatoid arthritis, antiphospholipid antibody syndrome).

The mean age of participants was 38 ± 11 years, 56 were
men, and none had cardiovascular symptoms. Only 5% of
subjects were known to have murmur or valvular heart
disease.


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Subjects were randomly sequenced for a complete physical
examination, including dynamic auscultation by a cardiologist
blinded to other data. The cardiologist recorded the findings for
jugular venous pulse; the palpated carotid pulse; the palpated
precordial maximal impulse; the presence of a right ventricular
lift; abnormalities of the second, third, and fourth heart sounds;
clicks; and ejection sounds. The dynamic auscultation included
evaluation of murmur change with respiration, Valsalva maneuver,
handgrip, and changes in body position.

Transesophageal echocardiography was performed on all 
subjects by an echocardiographer blinded to the clinical 
examination and other data. Diagnosis of valvular disease 
was based on established criteria. 

MAIN OUTCOME MEASURES 

Sensitivity, specificity. 

MAIN RESULTS 

Thirty-three patients had echocardiographic evidence of valvular
abnormalities, the majority (24 of 33) of which were
mitral valve regurgitant lesions or prolapse. The predictive
value of the individual findings is reported, but none
occurred in more than 8% of patients, providing broad confidence
intervals. It is difficult to disentangle the individual
findings from the overall assessment because the individual
components and categorization of individual murmurs were
based on the total evaluation (Table 33-24).

CONCLUSION 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Prospective, blinding of examination, and 
gold standard test. 

LIMITATIONS Cardiologist examiner may limit generalizability
to generalist physicians. Nonconsecutive patients.

The study population is unique in that these patients were 
not selected according to an auscultated abnormality. They 
represent a combination of healthy patients and patients with 
noncardiac disease, all of whom might undergo auscultation 
in the course of “routine” medical care. By including healthy 
patients, a high specificity for the examination could be 
expected because most patients would not have abnormal 
findings and would not have cardiac abnormalities shown by 
echocardiogram. 

This study evaluated physical examination by a cardiologist
alone, without supplementary information or investigations
in a healthy population at risk for valvular heart
disease. It is useful that the report includes the actual individual
components used by the cardiologists to determine their
overall clinical assessment. The cardiologists heard a surprising
number of murmurs, but when they described a murmur
as abnormal, the likelihood of an echocardiographic
abnormality increased greatly.

Table 33-24 Likelihood Ratio for the Overall Clinical Examination to
Identify Patients With Abnormal Cardiac Valves

Valvular Heart Disease by Echocardiography

Test                          Sensitivity  Specificity  LR+ (95% CI)   LR- (95% CI)
Overall clinical assessment   0.70         0.98         38 (9.5-154)   0.31 (0.18-0.52)
for a valvular abnormality

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.
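
The wide confidence interval around the LR+ in Table 33-24 follows from the small number of false-positive examinations. The sketch below uses the standard log method for a ratio of 2 proportions. The counts are reconstructed assumptions: 23 of 33 patients with valvular disease and 2 of 110 without disease having an abnormal overall assessment, chosen because they match the reported sensitivity (0.70) and specificity (0.98); they are not taken directly from the study report.

```python
import math


def likelihood_ratio_with_ci(events_diseased: int, n_diseased: int,
                             events_healthy: int, n_healthy: int,
                             z: float = 1.96):
    """Return (LR, lower, upper) for a ratio of 2 proportions using the log method."""
    ratio = (events_diseased / n_diseased) / (events_healthy / n_healthy)
    standard_error = math.sqrt(1 / events_diseased - 1 / n_diseased
                               + 1 / events_healthy - 1 / n_healthy)
    lower = math.exp(math.log(ratio) - z * standard_error)
    upper = math.exp(math.log(ratio) + z * standard_error)
    return ratio, lower, upper


# Reconstructed counts (assumed, see the paragraph above).
DISEASED, HEALTHY = 33, 110
ABNORMAL_EXAM_DISEASED, ABNORMAL_EXAM_HEALTHY = 23, 2

# LR+ compares abnormal examinations; LR- compares normal examinations.
lr_pos, lr_pos_low, lr_pos_high = likelihood_ratio_with_ci(
    ABNORMAL_EXAM_DISEASED, DISEASED, ABNORMAL_EXAM_HEALTHY, HEALTHY)
lr_neg, lr_neg_low, lr_neg_high = likelihood_ratio_with_ci(
    DISEASED - ABNORMAL_EXAM_DISEASED, DISEASED,
    HEALTHY - ABNORMAL_EXAM_HEALTHY, HEALTHY)

print(f"LR+ {lr_pos:.0f} (95% CI, {lr_pos_low:.1f}-{lr_pos_high:.0f})")
print(f"LR- {lr_neg:.2f} (95% CI, {lr_neg_low:.2f}-{lr_neg_high:.2f})")
```

The term 1/2 contributed by the 2 false-positive examinations dominates the standard error of the LR+, which explains why the upper limit is so far from the point estimate.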

CHAPTER

Does This Patient Have Myasthenia Gravis?

Katalin Scherer, MD 
Richard S. Bedlack, MD, PhD 
David L. Simel, MD, MHS 


CLINICAL SCENARIOS 


CASE 1 A 45-year-old man has a 2-month history of
fluctuating double vision, a droopy right eye that
improves with rest, and a complaint that food gets stuck
halfway down. Your examination confirms severe right
eyelid ptosis that dramatically improves with rest. His
right eye adduction and up gaze are markedly impaired.
The left eye demonstrates complete horizontal ophthalmoplegia.
The limb muscle strength and reflexes are normal.
You wonder whether there is an accurate and
clinically useful bedside test to help confirm the diagnosis
of myasthenia gravis.

CASE 2 A 69-year-old man has a 2-month history of
intermittent spells of double vision, generalized weakness
that worsens toward the evening, and unspecified dizziness.
Although he has normal strength and reflexes and no
ophthalmoplegia, he does report fluctuating diplopia during
the examination. As in case 1, you must decide
whether to perform additional bedside tests, obtain
electrodiagnostic or acetylcholine antibody testing, or pursue
a broader diagnostic evaluation of the various causes of
dizzy spells and fatigue.


WHY IS THIS AN IMPORTANT QUESTION TO 
ANSWER WITH A CLINICAL EXAMINATION? 


Myasthenia gravis is an autoimmune disease associated with
circulating acetylcholine receptor antibodies, modification of the
synaptic cleft, and destruction of the postsynaptic neuromuscular
membrane. The clinical hallmark of the disease is fatigable
weakness. The clinical severity ranges from mild, purely ocular
forms to severe generalized weakness and respiratory failure.
Myasthenia gravis is a rare disease; its prevalence in the United
States is reported at 14.2 per 100 000. Prevalence rates have been
increasing steadily during the past decades, likely because of
decreased mortality, longer survival, and higher rates of
diagnosis. 1-3 Men older than 50 years have the highest incidence in the
population, with the peak at approximately 70 years of age.
Women have 2 incidence peaks: one at approximately 20 to
40 years of age and one at approximately 70 years of age. 4,5

Clinicians must be alert to the symptoms and signs of myasthenia
gravis because it is an eminently treatable disease, and the
earlier treatment is started, the better the clinical response. 6-8
Only 54% to 69% of patients with myasthenia gravis are diagnosed
within 1 year of onset, and the mean time to diagnosis is
more than 1 year. 3,9-12 Untreated patients are at risk for deterioration
and “crisis,” which occurs when weakness becomes severe
enough to require mechanical ventilation. 13,14 Left untreated,
reversible and fatigable weakness may become fixed. An erroneous
diagnosis of myasthenia gravis may expose patients to
unnecessary diagnostic procedures and treatments.

The acetylcholine receptor antibody test is the most specific
diagnostic test for myasthenia gravis. This test has
reasonable sensitivity in generalized myasthenia gravis
(80%-96%), but up to 50% of patients with purely ocular
myasthenia have seronegative test results. 15-19 Single-fiber
electromyography, performed by highly trained experts at
specialized centers, is highly sensitive for disorders of the
neuromuscular junction but is not specific for myasthenia
gravis.

The purpose of this review was to determine the value of 
clinical symptoms and signs, as well as the results of simple 
provocative clinical tests, in deciding whether myasthenia 
gravis should be considered as a diagnosis and in enabling 
the physician to determine whether further confirmatory 
testing (including the highly specific and sensitive antibody 
test) is warranted. 


Anatomic and Physiologic Origins of the Symptoms 
and Signs Used to Answer This Question 

In the normal neuromuscular junction, acetylcholine is
released into the synaptic cleft, diffuses to the postsynaptic
membrane, binds to ligand-sensitive ion channels (nicotinic
acetylcholine receptors), and causes an excitatory
postsynaptic end-plate potential. If the threshold depolarization
is achieved, an action potential will spread along the
muscle fiber membrane, causing muscle contraction.
Acetylcholine is cleared from the synaptic cleft by presynaptic
reuptake and by the metabolic action of acetylcholinesterase
(Figure 34-1).

The failure of transmission at many neuromuscular junctions
in myasthenia results in diminished end-plate potentials
that are insufficient to generate action potentials in a number
of muscle fibers. 20 This results in fatigable weakness of striated
muscles, which is the basis for the clinical diagnosis. Sustained
or repetitive muscle contraction causes fatigue and weakness
of myasthenic muscles. Cooling a weak muscle improves
neuromuscular transmission. 21 Rest and acetylcholinesterase
inhibitors transiently increase acetylcholine levels in the
synaptic cleft. The change in strength after these manipulations
can be assessed during the clinical examination.

Figure 34-1 Neuromuscular Junction

In patients with acetylcholine receptor (AChR) antibody-positive myasthenia gravis, circulating antibodies bind to the AChRs, which may block acetylcholine
binding, lead to cross-linking of receptors promoting internalization and degradation, and induce postsynaptic membrane damage via complement activation.
The number and availability of receptors are reduced such that end-plate potentials are insufficient to generate action potentials in a number of muscle fibers,
causing weakness.

Symptoms and Signs and How to Elicit Them 

Patients with myasthenia gravis complain of weakness in specific
muscles. Up to 65% of patients initially have ocular
symptoms of double vision and drooping of the eyelids. Less
than one-fourth of patients present with bulbar weakness (ie,
in lower cranial nerve-innervated oropharyngeal muscles)
and report slurred or nasal speech, alterations of the voice
(eg, softness, breathiness, hoarseness), and difficulty chewing
or swallowing. Limb weakness is an unusual initial complaint
(14%-27%) and should be differentiated from nonspecific
generalized fatigue. Patients may report shortness of breath.
The symptoms of myasthenia are typically better on awakening
or after rest and become progressively worse with prolonged
use of the affected muscles or later in the day. 3,22-24

Reduced muscle power by manual testing in specific muscles
that worsens with repetition and improves with rest is
the characteristic examination finding in myasthenia. Most
muscles with voluntary activation have a large variability of
strength even under normal conditions because of effort.
Evaluating extremity strength greatly depends on the experience
of the examiner. Ptosis and extraocular muscle deficits
are relatively free of a voluntary component and provide a
more objective measure.

Fatigable and rapidly fluctuating asymmetric ptosis is a
hallmark of myasthenia gravis. The rapid fluctuation results
from improvement during even very short periods of rest,
such as blinking. Besides varying rapidly in degree, the ptosis
may shift quickly from one eye to the other, a finding
known as “shifting ptosis.” 22 Ptosis should be evaluated with
the patient sitting comfortably, the head held in primary
position without tilting. The patient fixates on a distant
object (eg, a spot on the wall) and is asked to refrain from
blinking and to relax the forehead muscles. Frontalis contraction,
a mostly involuntary compensatory mechanism, is a
common and characteristic sign in myasthenic patients with
ptosis. Relaxing the forehead muscles may be difficult for
some patients. The examiner measures palpebral fissure
width at eye level during forward gaze and again during prolonged
upward or lateral gaze for 30 seconds. 22,25 The more
ptotic eyelid should be used for additional provocative tests,
such as the ice pack, rest, and sleep tests.

The ice pack test is performed by placing a latex glove finger
filled with crushed ice over the more ptotic eyelid for 2
minutes. During the rest test, the patient places a glove filled
with cotton (a placebo) over the more ptotic eyelid while
holding the eyes closed for 2 minutes. During the sleep test,
the patient is left in a quiet dark room with the eyes closed
for 30 minutes. Complete or almost complete resolution of
ptosis or at least a 2-mm increase in palpebral fissure width
constitutes a positive response to these maneuvers. It is
important to evaluate the improvement immediately after
the tests because the lids may quickly begin to droop again.

The curtain sign (also known as “enhanced ptosis” or “paradoxic
ptosis”) is usually observed in patients with some initial
ptosis. The patient looks straight ahead and refrains from
blinking. The examiner holds one eye open, which results in
the other lid starting to droop more (like a curtain falling).
The lid twitch sign occurs when the patient opens the eyes
after gentle closure or follows the examiner’s finger down
and then back up to eye level. The lids overshoot or twitch
for a fraction of a second before settling into position and
starting to droop. 26

Asymmetric weakness of extraocular muscles is commonly
observed in myasthenia when sustained lateral gaze or up
gaze worsens or induces double vision. The cover-uncover
test may be performed to bring out subtle extraocular weakness.
As the patient fixates on an object in the distance, the
examiner covers one eye while observing for deviation of the
uncovered eye during lateral and then upward gazing. With
extraocular weakness, the uncovered eye will drift. The
examination is completed by repeating the procedure for the
opposite eye. Quiver eye movements are fast, small-twitch,
“lightning-like” or “jerk-like” movements of the eyes on
changing direction of gaze. They are said to occur even in the
setting of profound ophthalmoplegia. 27

Although patients rarely complain of facial weakness, it is
often found on examination. Severe facial weakness results in
a characteristic transverse smile. Orbicularis oculi weakness
is demonstrated as the examiner tries to separate the eyelids
against forced eye closure. Orbicularis oculi fatigue may be
observed on gentle eye closure. After complete initial apposition
of the lid margins, they separate within seconds and the
white of the sclera starts to show (positive peek sign) (Figure
34-2). 28 The iris should not be visible because the eyeballs
roll upward on eye closure (Bell phenomenon). The iris may be
visible if the patient is not trying to close the eyes voluntarily (in the
case of a conversion reaction and functional weakness) or in
case of severe ophthalmoplegia.

Tongue and pharyngeal weakness will result in the patient’s
speech becoming slurred or nasal, especially with prolonged
speaking. Other commonly weak muscles include neck flexors,
deltoids, hip flexors, finger/wrist extensors, and foot
dorsiflexors. The muscles should be repeatedly tested against
manual resistance, with a brief rest between repetitions. Having
the patient hold the head above the pillow in the supine
position and having the patient hold the arms outstretched
in abduction at the shoulder for 1 minute are ways to test for
fatigability of neck flexors and deltoids, respectively. Involvement
is often asymmetric. The remainder of the neurologic
examination results, including those for deep tendon reflexes
and sensory examination, must be normal.

Anticholinesterase Tests 

Edrophonium chloride is a fast- and short-acting acetylcholinesterase
inhibitor that may be administered in the office
setting to diagnose myasthenia gravis (Box 34-1). Its effect
usually occurs within 30 seconds and lasts less than 5 minutes.
Most myasthenic muscles respond to the test dose of
2 mg, but many will require more. Adverse effects are rare
and usually mild (excess salivation, sweating, abdominal
cramps, or fecal incontinence). Serious adverse effects, such
as bradycardia, asystole, and bronchoconstriction, occur
infrequently (<0.2%) but warrant that the patient receive
cardiac monitoring during the test and that a bag-mask be
available should the patient need ventilatory assistance. 30,31
Reactive airway disease or cardiac bradyarrhythmias are relative
contraindications. Using a 3-way stopcock setup may be
feasible in a patient already equipped with a peripheral intravenous
line (eg, in an intensive care unit). One concern with
such a setup is the possibility of an accidental mix-up of the
syringes, with resultant injection errors; the syringes should
always be labeled clearly. Because of the short action of the
drug, the examiner must be able to quickly assess for
improvement. Evaluating extraocular muscle abnormalities
or changes in manual muscle testing requires skill and time;
therefore, most experts recommend performing the edrophonium
test only when the patient has easily observed baseline
weakness in specific muscles. 32 Some authors suggest that
a clearly ptotic eyelid or visibly abnormal extraocular muscles
are the only acceptable findings to observe for objective
endpoints. 27 Unequivocal improvement in ptosis or extraocular
muscles constitutes a positive response. The administering
physician (especially one with less experience) should
consider blinding the edrophonium administration to avoid
expectation bias.

Figure 34-2 Peek Sign

Orbicularis oculi weakness may be indicated by a positive peek sign after gentle eyelid closure. After complete initial apposition of the lid margins, they quickly
(within 30 seconds) start to separate, and the sclera starts to show (ie, a positive peek sign). The presence of a peek sign increases the likelihood of myasthenia
gravis (likelihood ratio, 30; 95% confidence interval, 3.2-278), but absence of the peek sign does not rule it out.

Box 34-1 Edrophonium Test

Establish reliable peripheral intravenous access.

Prepare a syringe with 2 mg of atropine (available in
ampoules of 0.4 or 1 mg/mL) as a precaution.

Prepare 1 mL (10 mg) of edrophonium in a tuberculin
syringe (edrophonium is available in a 10 mg/mL solution
in a 1-mL ampoule [10 mg] or in a 10-mL vial [total of
100 mg]).

Inject 2 mg (0.2 mL) slowly for 15 seconds while observing
for an objective improvement in target muscles.

Improvement should occur within 30 seconds and disappear
in 5 minutes; if there is no response or no significant
adverse effects, administer the remaining edrophonium (8
mg [0.8 mL]), for a total dose of 10 mg.

Atropine should be injected (0.5 or 1 mg) in case of clinically
significant bradycardia, respiratory distress, or
syncope. [a]

[a] Routine administration of atropine simultaneously with edrophonium for the purpose
of diagnostic testing for myasthenia gravis is not recommended. Bartley and
Bullock 29 recommend using a 3-way stopcock, with the edrophonium-containing
syringe attached to the direct port and the atropine-containing syringe attached to
the side port so that atropine may be quickly injected in case of severe adverse
effects.

Neostigmine bromide is an anticholinesterase agent used to treat myasthenia
gravis. Parenteral preparations are available in vials containing 0.25-,
0.5-, and 1-mg/mL doses. The recommended dose for the diagnosis of myasthenia
is 0.02 mg/kg given intramuscularly. A standard dose of 1 or 1.5 mg may be
used. The response should be evaluated 30 minutes after injection, at peak
effect. The half-life after intramuscular administration is 50 to 90 minutes.
Adverse effects, precautions, and need for good intravenous access (to
administer atropine in case of an adverse event) are the same as for
edrophonium.

Pyridostigmine bromide is an analog of neostigmine, with a slightly longer
duration of action and fewer adverse effects. It is the most commonly used
anticholinesterase agent for the symptomatic treatment of myasthenia gravis.
It has been used for diagnosis in patients in whom edrophonium or neostigmine
is relatively contraindicated, although it is not generally used for
diagnostic purposes. 33 It is available for injection in 2-mL vials
containing 5 mg/mL. A 2-mg intramuscular or intravenous dose is equivalent to
60 mg orally. Precautions should be exercised just as with edrophonium and
neostigmine.

METHODS 

Search Strategy and Quality Review 

English-language articles in the MEDLINE database from January 1966 through
January 2005 were searched using the terms “myasthenia gravis,” “diagnosis,”
and “test.” One of the authors (K.S.) identified potential articles by
screening the retrieved titles and abstracts (when available) and searching
through the bibliographies of the retrieved articles. Two authors (K.S. and
R.S.B.) independently reviewed the retrieved articles. An article was
included when agreement existed that the study had met our inclusion
criteria.

Eligible studies evaluated a particular symptom or sign in patients with
myasthenia gravis and in controls. Studies requiring sophisticated equipment
or subspecialty trained physicians (otolaryngology, ophthalmology, etc) were
excluded. Studies based on small numbers of patients were not excluded,
because most series are comparatively small in the literature. Of 640 total
articles, the search identified 33 potential articles. Of these, 15 met
inclusion criteria and form the basis of this review. 28,33-46 Quality of
evidence in each study was classified according to a published classification
scheme for levels of evidence developed for The Rational Clinical Examination
series (Table 34- ). 47 Only 2 studies included an independent blinded
comparison of signs and symptoms to a criterion standard. 34,36


Table 34-1 Characteristics of Studies That Include Patients With Myasthenia Gravis, as Well as Controls

| Source, y | Enrollment | Patient Selection | Patients With Myasthenia Gravis, No./Overall (%) | Diagnostic Criteria for Myasthenia Gravis | Symptom or Sign Studied (Inclusion Criteria) | Enrollment Site |
|---|---|---|---|---|---|---|
| Evidence Level 2 (a) | | | | | | |
| Kubis et al, 34 2000 | Prospective | Consecutive | 10/25 (40) | AChRAb or SFEMG | Ice test, rest test (ptosis) | Neuro-ophthalmology clinic |
| Evidence Level 3 (a) | | | | | | |
| Ertas et al, 36 1994 | Prospective | Unclear | 12/27 (44) | Overall clinical impression | Ice test, edrophonium, or neostigmine test (ptosis) | Neurology clinic |
| Czaplinski et al, 35 2003 | Prospective | Unclear | 5/10 (50) | AChRAb and RNS | Ice test, edrophonium test (ptosis) | Neurology clinic |
| Evidence Level 4 (a) | | | | | | |
| Sethi et al, 40 1987 | Unclear | Unclear | 10/17 (59) | Overall clinical impression | Ice test, edrophonium test (ptosis) | Neurology clinic |
| Odel et al, 41 1991 | Unclear | Unclear | 42/68 (62) | Edrophonium test | Sleep test (ptosis or ophthalmoplegia) | Ophthalmology clinic |
| Golnik et al, 39 1999 | Prospective | Unclear | 20/40 (50) | AChRAb or edrophonium test | Ice test (ptosis) | Neuro-ophthalmology clinic |
| Ellis et al, 37 2000 | Prospective | Consecutive | 15/30 (50) | Overall clinical impression | Ice test (ptosis or abnormal extraocular movements) | Ophthalmology clinic |
| Lertchavanakul et al, 38 2001 | Prospective | Unclear | 20/40 (50) | EMG or neostigmine test | Ice test (ptosis) | Ophthalmology clinic |
| Evidence Level 5 (a) | | | | | | |
| Osserman and Kaplan, 44 1952 | Prospective | Unclear | 15/50 (30) | Overall clinical impression | Edrophonium test | Neurology clinic, hospital |
| Yee et al, 45 1976 | Prospective | Unclear | 10/18 (56) | Edrophonium or neostigmine test | Quiver eye movements (ophthalmoplegia) | Ophthalmology clinic |
| Osher and Griggs, 28 1979 | Prospective | Consecutive | 25/275 (9) | Unclear | Peek sign (orbicularis oculi fatigue) | Ophthalmology clinic |
| Nicholson et al, 43 1983 | Prospective | Consecutive | 46/75 (61) | Overall clinical impression with 1 positive test result | Edrophonium test | AChRAb laboratory |
| Batocchi et al, 42 1997 | Prospective | Consecutive | 39/72 (54) | Overall clinical impression with 2 positive test results | Edrophonium test (ptosis, ophthalmoplegia) | Ophthalmology clinic |
| Padua et al, 33 2000 | Prospective | Consecutive | 29/69 (42) | AChRAb + SFEMG or RNS + AChEI | Edrophonium or pyridostigmine test | Neurology clinic |
| Weijnen et al, 46 2000 | Unclear | Unclear | 60/80 (75) | Overall clinical impression | Food in mouth after swallowing, unintelligible speech after prolonged speaking | Oromaxillofacial surgery clinic |

Abbreviations: AChEI, acetylcholine esterase inhibitor; AChRAb, acetylcholine receptor antibody; EMG, electromyography; RNS, repetitive nerve stimulation; SFEMG, single-fiber electromyography.

(a) See Table 1-7 for a description of Evidence Levels.
Table 34-2 Clinical Signs and Symptoms and Results of Clinical Tests in the Prediction of Myasthenia Gravis

| Finding | Source, y | Positive LR (95% CI) | Negative LR (95% CI) |
|---|---|---|---|
| Symptoms | | | |
| Food in mouth after swallowing | Weijnen et al, 46 2000 | 13 (0.85-212) | 0.70 (0.58-0.84) |
| Unintelligible speech after prolonged speaking | Weijnen et al, 46 2000 | 4.5 (1.2-17) | 0.61 (0.46-0.80) |
| Signs | | | |
| Peek sign | Osher and Griggs, 28 1979 | 30 (3.2-278) | 0.88 (0.76-1.0) |
| Quiver eye movements | Yee et al, 45 1976 | 4.1 (0.22-75) | 0.82 (0.57-1.2) |
| Simple Office Tests | | | |
| Ice test | Kubis et al, 34 2000 | 28 (1.8-427) | 0.14 (0.03-0.62) |
| | Ertas et al, 36 1994 | 31 (2.0-472) | 0.04 (0-0.61) |
| | Czaplinski et al, 35 2003 | 11 (0.77-158) | 0.09 (0.01-1.3) |
| | Sethi et al, 40 1987 | 12 (0.83-185) | 0.24 (0.08-0.72) |
| | Golnik et al, 39 1999 | 33 (2.1-515) | 0.22 (0.10-0.50) |
| | Ellis et al, 37 2000 | 31 (2.0-475) | 0.03 (0-0.46) |
| | Lertchavanakul et al, 38 2001 | 39 (2.5-605) | 0.07 (0.01-0.33) |
| | Summary | 24 (8.5-67) | 0.16 (0.09-0.27) |
| Anticholinesterase test | Ertas et al, 36 1994 | 28.0 (1.8-436) | 0.12 (0.03-0.54) |
| | Czaplinski et al, 35 2003 | 9.0 (0.61-133) | 0.27 (0.07-1.1) |
| | Sethi et al, 40 1987 | 12 (0.83-185) | 0.24 (0.08-0.72) |
| | Osserman and Kaplan, 44 1952 | 70 (4.4-1096) | 0.03 (0-0.46) |
| | Nicholson et al, 43 1983 | 54 (3.5-850) | 0.10 (0.04-0.24) |
| | Batocchi et al, 42 1997 | 67 (4.3-1053) | 0.01 (0-0.16) |
| | Padua et al, 33 2000 | 9.7 (3.8-25) | 0.04 (0.01-0.28) |
| | Summary | 15 (7.5-31) | 0.11 (0.06-0.21) |
| Rest test | Kubis et al, 34 2000 | 16 (0.98-261) | 0.52 (0.29-0.95) |
| Sleep test | Odel et al, 41 1991 | 53 (3.4-832) | 0.01 (0-0.16) |

Abbreviations: CI, confidence interval; LR, likelihood ratio.

Statistical Methods

Sensitivity was defined as the proportion of patients with myasthenia gravis
who had the particular symptom or sign; specificity, as the proportion of
nonmyasthenic patients without the sign or symptom. The positive likelihood
ratio (LR+) was defined as the likelihood of a positive test result (or
presence of a sign or symptom) in a myasthenic patient compared with the
likelihood of a positive test result in a nonmyasthenic patient, that is, the
increase in odds that the patient has myasthenia gravis when the test result
is positive (or when the sign or symptom is present). LR+ is expressed as
sensitivity/(1 - specificity). The negative likelihood ratio (LR-) is the
likelihood of a negative test result (or absence of a sign or symptom) in a
myasthenic patient compared with the likelihood of a negative test result (or
absence of a sign or symptom) in a nonmyasthenic patient, that is, the
decrease in odds that the patient has myasthenia gravis when the test result
is negative (or when the sign or symptom is absent). LR- is expressed as
(1 - sensitivity)/specificity. Summary LRs were derived using random-effects
measures that provide conservative confidence intervals (CIs) around the
estimates. 48-50
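
The likelihood ratio definitions given under Statistical Methods can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' analysis code: the function names and the 2 x 2 counts in the example (8 true positives, 2 false negatives, 1 false positive, 19 true negatives) are hypothetical and do not come from any study in this review.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from the cells of a 2 x 2 table."""
    sensitivity = tp / (tp + fn)   # diseased patients with a positive finding
    specificity = tn / (tn + fp)   # nondiseased patients with a negative finding
    return sensitivity, specificity


def positive_lr(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    if specificity >= 1.0:
        raise ValueError("LR+ is undefined when specificity equals 1")
    return sensitivity / (1.0 - specificity)


def negative_lr(sensitivity, specificity):
    """LR- = (1 - sensitivity) / specificity."""
    if specificity <= 0.0:
        raise ValueError("LR- is undefined when specificity equals 0")
    return (1.0 - sensitivity) / specificity


# Hypothetical counts, chosen only to show the calculation.
sens, spec = sensitivity_specificity(tp=8, fn=2, fp=1, tn=19)
lr_pos = positive_lr(sens, spec)
lr_neg = negative_lr(sens, spec)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
      f"LR+={lr_pos:.1f} LR-={lr_neg:.2f}")
```

Note that LR+ cannot be computed directly when no control has a positive finding (specificity of 1), which is why the sketch raises an error in that case rather than guessing at a correction.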

RESULTS 

Fifteen studies reported findings on patients both with and without
myasthenia gravis 28,33-46 (Table 34-1). Seven studies evaluated the ice
test, including 3 that also evaluated the response to anticholinesterase
agents and 1 that also evaluated the rest test. Four additional studies
reported on the response to anticholinesterase agents and 1 additional study
on the sleep test. The remaining 3 articles included 1 study reporting on 2
symptoms and 2 studies evaluating 1 sign each. The results across studies for
the ice test and anticholinesterase tests were homogeneous; we report
random-effects summary LRs for these signs (Table 34-2).
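
To make the idea of a random-effects summary LR concrete, the sketch below pools the positive LRs for the ice test from Table 34-2. It assumes a DerSimonian-Laird estimator on the log scale, with each study's standard error back-calculated from its published 95% CI. That choice is an assumption made for illustration; the review states only that random-effects measures were used, so the output need not reproduce the published summary exactly.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval

# (LR+, lower 95% limit, upper 95% limit) for the ice test, from Table 34-2
ICE_TEST_POSITIVE_LRS = [
    (28, 1.8, 427), (31, 2.0, 472), (11, 0.77, 158), (12, 0.83, 185),
    (33, 2.1, 515), (31, 2.0, 475), (39, 2.5, 605),
]


def pool_likelihood_ratios(estimates):
    """DerSimonian-Laird random-effects pooling of likelihood ratios.

    Returns (pooled LR, lower limit, upper limit, tau squared).
    """
    logs = [math.log(lr) for lr, _, _ in estimates]
    # Standard error on the log scale, recovered from the CI width.
    variances = [((math.log(hi) - math.log(lo)) / (2 * Z_95)) ** 2
                 for _, lo, hi in estimates]
    weights = [1 / v for v in variances]

    fixed = sum(w * y for w, y in zip(weights, logs)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, logs))
    df = len(logs) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, floored at 0

    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, logs)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return (math.exp(pooled), math.exp(pooled - Z_95 * se),
            math.exp(pooled + Z_95 * se), tau2)


pooled, low, high, tau2 = pool_likelihood_ratios(ICE_TEST_POSITIVE_LRS)
print(f"summary LR+ = {pooled:.1f} (95% CI, {low:.1f}-{high:.1f}), "
      f"tau^2 = {tau2:.3f}")
```

When the study results are homogeneous, as reported here, the between-study variance estimate is at or near zero and the random-effects result coincides with a fixed-effect (inverse-variance) average.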

Accuracy of Symptoms for the 
Diagnosis of Myasthenia Gravis 

Only 1 eligible study was identified, and it evaluated 2 symptoms. 46 The
history was taken from patients via a questionnaire. The presence of food
remaining in the mouth after swallowing increases the likelihood of
myasthenia gravis, but the wide CI indicates that the finding is not
reliable. Speech becoming unintelligible during prolonged speaking has an LR
of 4.5 (95% CI, 1.2-17). Neither normal swallowing nor normal speech rules
out myasthenia gravis (LR, 0.70; 95% CI, 0.58-0.84; and LR, 0.61; 95% CI,
0.46-0.80, respectively).

Accuracy of Signs for the 
Diagnosis of Myasthenia Gravis 

Two eligible studies were identified and reported on 1 sign each. 28,45 The
presence or absence of quiver eye movements increased the likelihood of
myasthenia gravis, but the broad CIs around the LR indicate that the examiner
may not rely on the finding. The presence of the peek sign might be more
useful (LR, 30; 95% CI, 3.2-278) but also has broad CIs.

Accuracy of Simple Office Tests for the 
Diagnosis of Myasthenia Gravis 

Seven studies investigated the ice test, and all had similar findings. 34-40
The overall prevalence (prior probability) of myasthenia gravis in these
studies was 49% (92 of 189 patients total). All but 1 of these studies were
carried out prospectively. The LR for a positive ice test result suggests
that the finding is useful (summary LR, 24; 95% CI, 8.5-67). A negative ice
test result lessens the likelihood of myasthenia gravis (summary LR, 0.16;
95% CI, 0.09-0.27).
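
As a worked example of how these summary LRs change the probability of disease, the sketch below applies the standard odds form of Bayes' theorem to the 49% prior probability seen in the ice test studies. The conversion itself is not spelled out in the review, and the function name is illustrative; the resulting figures apply only to a population with a prior probability this high.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


pretest = 92 / 189                       # prevalence in the ice test studies
after_positive = posttest_probability(pretest, 24)    # summary LR+, ice test
after_negative = posttest_probability(pretest, 0.16)  # summary LR-, ice test

print(f"pretest probability:     {pretest:.0%}")
print(f"after positive ice test: {after_positive:.0%}")
print(f"after negative ice test: {after_negative:.0%}")
```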

Two studies evaluated the precision (ie, interobserver variation) of the ice
test. Kubis et al 34 used the signed rank test to evaluate interobserver
variability and found no significant difference between observers (P = .79).
Ertas et al 36 reported complete agreement among their observers. Neither of
the studies evaluated the intraobserver variation.

Seven studies reported the results of anticholinesterase tests, and all had
similar findings. 33,35,36,40,42-44 Five of these studies evaluated the
edrophonium test; one study included response to pyridostigmine, and another
included response to neostigmine as an alternative. All but 1 of these
studies were prospective, and 3 were carried out on consecutive patients. One
hundred fifty-six (49%) of 320 patients had myasthenia gravis. The likelihood
of myasthenia gravis increases for a positive test result (summary LR, 15;
95% CI, 7.5-31), whereas the lack of improvement makes myasthenia gravis much
less likely (summary LR, 0.11; 95% CI, 0.06-0.21).

Two studies evaluated the sleep or rest test on 93 patients, including 52
(56%) with myasthenia gravis. 34,41 An abnormal rest test result increases
the likelihood of myasthenia, but the wide CI indicates uncertainty about the
true significance. A positive sleep test result may be more useful (LR, 53;
95% CI, 3.4-832). Both the rest and sleep test make the probability of
myasthenia unlikely when the result is normal (LR, 0.52; 95% CI, 0.29-0.95;
and LR, 0.01; 95% CI, 0-0.16, respectively).
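
The pooled prevalences quoted for each group of studies can be checked against the per-study counts in Table 34-1. The sketch below does that bookkeeping; the grouping of studies by test follows the citations in the text, and the dictionary keys and function name are illustrative.

```python
# (patients with myasthenia gravis, all patients) per study, from Table 34-1
STUDIES = {
    "Kubis 2000": (10, 25), "Ertas 1994": (12, 27),
    "Czaplinski 2003": (5, 10), "Sethi 1987": (10, 17),
    "Odel 1991": (42, 68), "Golnik 1999": (20, 40),
    "Ellis 2000": (15, 30), "Lertchavanakul 2001": (20, 40),
    "Osserman and Kaplan 1952": (15, 50), "Nicholson 1983": (46, 75),
    "Batocchi 1997": (39, 72), "Padua 2000": (29, 69),
}

GROUPS = {
    "ice test": ["Kubis 2000", "Ertas 1994", "Czaplinski 2003", "Sethi 1987",
                 "Golnik 1999", "Ellis 2000", "Lertchavanakul 2001"],
    "anticholinesterase test": ["Ertas 1994", "Czaplinski 2003", "Sethi 1987",
                                "Osserman and Kaplan 1952", "Nicholson 1983",
                                "Batocchi 1997", "Padua 2000"],
    "rest or sleep test": ["Kubis 2000", "Odel 1991"],
}


def pooled_prevalence(study_names):
    """Sum cases and totals across studies; return (cases, total, prevalence)."""
    cases = sum(STUDIES[name][0] for name in study_names)
    total = sum(STUDIES[name][1] for name in study_names)
    return cases, total, cases / total


for test_name, study_names in GROUPS.items():
    cases, total, prevalence = pooled_prevalence(study_names)
    print(f"{test_name}: {cases}/{total} ({prevalence:.0%})")
```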

Are These Symptoms or Signs Ever Normal? 

Fluctuating weakness (ie, reduced muscle power) that worsens with exertion
and improves with rest or with application of ice or cold is never normal. It
is important to differentiate fluctuating weakness from patients’ reports of
weakness, which most often refers to fatigue or exertion. True fluctuating
weakness, as demonstrated by manual muscle testing, is the cardinal feature
of myasthenia gravis. Other neuromuscular conditions (including amyotrophic
lateral sclerosis and periodic paralyses) may be associated with fluctuating
weakness; however, the fluctuation in myasthenia is more dramatic and occurs
much more rapidly. Ptosis or diplopia may be present in a number of
conditions (congenital exotropia or esotropia, strabismus, congenital ptosis,
cranial nerve palsies, myopathies, progressive external ophthalmoplegia,
brainstem lesions, and neurodegenerative disorders such as progressive
supranuclear palsy), but the constant degree of involvement and associated
neurologic findings (pupillary abnormalities, nystagmus, vertigo, sensory
involvement) commonly exclude myasthenia gravis as a diagnosis. One must bear
in mind that even the initially fluctuating weakness of myasthenia gravis
will become fixed over time if severe enough. The hypomimia (masked facies)
of parkinsonism may be mistaken for facial weakness, but on examination, no
true weakness is found and associated features of parkinsonism are evident.
It may be a challenge to differentiate true fatigable weakness caused by
myasthenia gravis from conversion reactions. In the latter conditions, one
may often find that various elements of the examination are inconsistent with
pathophysiologic conditions. Conversion reactions commonly produce giveaway
weakness, in which an initial full resistance suddenly gives way under the
hand of the examiner, as opposed to true weakness that gradually worsens or
is present from the start. Ptosis produced by conversion reactions is
commonly symmetrical and bilateral. Because it occurs with contraction of the
orbicularis oculi, one can observe that the lower lid elevates. It may
completely disappear with diverting the patient’s attention. Eye closure
weakness caused by poor effort results in the iris showing between the
eyelids.


CLINICAL SCENARIOS—RESOLUTIONS 


CASE 1 Fluctuating diplopia and ptosis are highly characteristic of
myasthenia gravis. The presence of a positive rest test result may increase
the likelihood of myasthenia. The physician must carefully question the
patient regarding his complaint of food getting stuck halfway down. If it is
food remaining in the mouth after swallowing, it may also increase the
probability of myasthenia. The available evidence-based data, however, do not
allow the examiner to rely on these findings to confirm the diagnosis. These
positive test results should prompt the clinician to confirm the diagnosis
with the acetylcholine receptor antibody test and to refer this patient to a
specialist (neurologist or neuro-ophthalmologist).

CASE 2 The presentation of an elderly patient complaining of fluctuating
double vision and weakness worsening toward the end of the day raises the
possibility of myasthenia gravis. The lack of quiver eye movements, peek
sign, or history of unintelligible speech after prolonged speaking or food in
the mouth after swallowing does not significantly reduce the likelihood of
myasthenia according to the studies we reviewed. This patient does not have
any objective ptosis or visible diplopia, so provocative tests cannot be
performed. A search should be undertaken for causes of nonspecific dizziness
and generalized fatigue. If, however, he continues to complain of fluctuating
double vision, he should be referred for specialist evaluation to rule out
myasthenia despite normal physical examination findings.


THE BOTTOM LINE 

The presence of certain historical features (speech becoming unintelligible
after prolonged periods) or signs (peek sign) may be useful in confirming the
diagnosis of myasthenia gravis, although their absence does not rule it out.
The ice test, the sleep test, and response to anticholinesterase agents
(especially the edrophonium test) are useful in confirming the diagnosis and
reduce the likelihood when results are negative. A positive test result
should prompt proceeding with acetylcholine receptor antibody testing and
specialist referral for electrophysiologic tests and should help confirm the
diagnosis in patients who have negative results for the acetylcholine
receptor antibody panel.

This review has several limitations, and the results should be interpreted
with caution. The results may not be generalizable for a number of reasons.
Myasthenia gravis is a rare disorder, and few studies have evaluated its
symptoms and signs. The studies included in this review examined only a few
symptoms and signs in a selected group of patients with a confirmed diagnosis
of myasthenia gravis. Because of possible verification bias in this selected
population of patients with myasthenia (in whom confirmation of the diagnosis
is more likely with clear-cut cases), it is expected that in the general
population these tests have a lower sensitivity but even higher specificity.
Because of the uncertainty regarding sensitivity, patients with persistent
symptoms but normal physical examination findings should be referred to
specialists for diagnosis. The specificity and sensitivity of the described
tests may also greatly depend on the skill and experience of the examiner.
Future studies are needed that evaluate not only intraobserver variability
but also agreement between experts and nonexperts. There are other signs that
may be more useful than those tested historically and that await scientific
study. This review underscores the need for more studies to evaluate symptoms
and signs predictive of myasthenia to improve physicians’ ability to
recognize and evaluate patients at presentation.
Scherer K, Bedlack RS, Simel DL. Does this patient have
myasthenia gravis? JAMA. 2005;293(15):1906-1914.

The Update was prepared within 12 months of The Rational Clinical
Examination article publication, so the “Make the Diagnosis” section
summarizes findings published in the original review.


CLINICAL SCENARIOS 


Case 1 

A 45-year-old man has a 2-month history of fluctuating double vision, a
droopy right eye that improves with rest, and a complaint that food gets
stuck halfway down. Your examination confirms severe right eyelid ptosis that
dramatically improves with rest. His right eye adduction and up gaze are
markedly impaired. The left eye demonstrates complete horizontal
ophthalmoplegia. The limb muscle strength and reflexes are normal. You wonder
whether there is an accurate and clinically useful bedside test to help
confirm the diagnosis of myasthenia gravis.

Case 2

A 69-year-old man has a 2-month history of intermittent spells of double
vision, generalized weakness that worsens toward the evening, and unspecified
dizziness. Although he has normal strength and reflexes and no
ophthalmoplegia, he does report fluctuating diplopia during the examination.
As in case 1, you must decide whether to perform additional bedside tests,
obtain electrodiagnostic or acetylcholine antibody testing, or pursue a
broader diagnostic evaluation of the various causes of dizzy spells and
fatigue.


MYASTHENIA GRAVIS—MAKE THE DIAGNOSIS


PRIOR PROBABILITY

Myasthenia gravis is a rare disease. The prevalence in the United States is
reported at approximately 14.1 in 100 000. 1-3 Men older than 50 years have
the highest incidence, with the peak at approximately 70 years of age. Women
have 2 incidence peaks: one at approximately 20 to 40 years of age and one at
approximately 70 years of age. 4,5 The prior probability of myasthenia gravis
in the general population among patients presenting with symptoms is unknown.
Because of the high prevalence of the disease in the included studies (close
to 50%), the results may not be generalizable to the general population.
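
The dependence on prior probability can be illustrated by holding the ice test summary LRs fixed and varying the pretest probability. In the sketch below, the 1% and 10% values are hypothetical, chosen only to contrast with the roughly 50% prevalence of the included studies; the conversion assumes the standard odds form of Bayes' theorem, and the names are illustrative.

```python
ICE_TEST_LR_POSITIVE = 24    # summary LR+ from the review
ICE_TEST_LR_NEGATIVE = 0.16  # summary LR- from the review


def posttest_probability(pretest, likelihood_ratio):
    """Pretest probability -> posttest probability via odds."""
    odds = pretest / (1.0 - pretest) * likelihood_ratio
    return odds / (1.0 + odds)


def sweep(pretest_values):
    """Return (pretest, posttest if positive, posttest if negative) rows."""
    return [
        (p,
         posttest_probability(p, ICE_TEST_LR_POSITIVE),
         posttest_probability(p, ICE_TEST_LR_NEGATIVE))
        for p in pretest_values
    ]


# 0.01 and 0.10 are hypothetical; 0.50 approximates the included studies.
rows = sweep([0.01, 0.10, 0.50])
for pretest, if_positive, if_negative in rows:
    print(f"pretest {pretest:>4.0%}: positive -> {if_positive:.1%}, "
          f"negative -> {if_negative:.1%}")
```

The same positive result that makes the diagnosis very likely at a 50% prior leaves considerable doubt at a 1% prior, which is one reason the findings should be applied only to patients resembling those in the studies.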

POPULATION FOR WHOM MYASTHENIA 
GRAVIS COULD BE CONSIDERED 

• Patients with asymmetric fluctuating eyelid ptosis 

• Patients with extraocular dysmotility not referable to a 
single nerve 

• Patients with weakness of other specific muscles 

• Young women of childbearing age and men and women 
aged approximately 70 years 

DETECTING THE LIKELIHOOD 
OF MYASTHENIA GRAVIS 

The clinical findings, when applied to the correct patient 
population, are important (Table 34-3). 


Table 34-3 Detecting the Likelihood of Myasthenia Gravis

| Finding | LR (95% CI) |
|---|---|
| Makes the Diagnosis More Likely | |
| The presence of an abnormal sleep test result in a patient with symptoms | 53 (3.4-832) |
| The presence of the peek sign in a patient with symptoms | 30 (3.2-278) |
| The presence of an abnormal ice test result in a patient with symptoms | 24 (8.5-67) |
| The presence of a positive response to an anticholinesterase test in a patient with symptoms | 15 (7.5-31) |
| The presence of the history “speech becoming unintelligible during prolonged speaking” in a patient with symptoms | 4.5 (1.2-17) |
| Reduces the Likelihood of Myasthenia | |
| The presence of a normal sleep test result in a patient with symptoms | 0.01 (0-0.16) |
| The absence of a positive response to an anticholinesterase test in a patient with symptoms | 0.11 (0.06-0.21) |
| The presence of a normal ice test result in a patient with symptoms | 0.16 (0.09-0.27) |
| The presence of a normal rest test result in a patient with symptoms | 0.52 (0.29-0.95) |

Abbreviations: CI, confidence interval; LR, likelihood ratio.


REFERENCE STANDARD TESTS 

The reference standard for definite myasthenia gravis is typical clinical
presentation plus one of the following: elevated acetylcholine receptor
antibody level or abnormal electrodiagnostic study results (repetitive nerve
stimulation or single-fiber electromyography). These criteria should also be
fulfilled in clinical practice for definite diagnosis.
CLINICAL SCENARIOS


Are These Patients Having Myocardial Infarctions?

CASE 1 A 57-year-old man presents to the emergency department with squeezing
retrosternal pain that started 1 hour ago. He is diaphoretic. His blood
pressure is 110/70 mm Hg, his heart rate is 74/min, and he has an audible
fourth heart sound. The electrocardiogram (ECG) reveals a 2-mm ST-segment
elevation in leads V1 to V4.

CASE 2 A 70-year-old man, with a myocardial infarction (MI) 5 years
previously, presents to the emergency department with severe tightness in the
neck. The discomfort started 30 minutes ago and was associated with
diaphoresis. His blood pressure is 90/60 mm Hg, his heart rate is 50/min, and
the ECG reveals Q waves in V4 to V4 (present in the old ECG).

CASE 3 A 50-year-old woman presents to the emergency department with
retrosternal burning of 1 hour’s duration and nausea. Antacids provided no
relief. The findings of the clinical examination were unremarkable. The ECG
reveals 3-mm ST-segment elevation in leads II, III, and aVF and 1-mm
ST-segment depression in leads I and aVL.

CASE 4 A 40-year-old woman presents to the emergency department with a
24-hour history of left-sided chest pain. The pain is worsened by exertion
and movement. Her medical history is unremarkable. The examination reveals
normal vital signs and tenderness with palpation of the left lower costal
cartilages. An ECG result is normal.


WHY IS THIS AN IMPORTANT QUESTION TO 
ANSWER WITH A CLINICAL EXAMINATION? 


There have been numerous technologic advancements made in the assessment of
patients with symptoms suggestive of acute MI. These include evaluation of
time-dependent changes in enzyme levels and biomarkers, as well as an
assessment of wall-motion abnormality using echocardiography, radionuclide
angiography, or nuclear imaging.

Despite this progress, a carefully conducted history-taking and physical
examination remain the first components—and the cornerstones—in the initial
assessment of patients presenting with suspected MI. The medical history and
physical examination are critical in guiding the selection of further
diagnostic and therapeutic interventions. Clinicians complement their
clinical examination with a 12-lead ECG and biomarkers, which are additional
data that provide the most definitive diagnosis of MI. We will focus on
features of medical history, physical examination, and ECG that aid in
increasing or decreasing the likelihood of acute MI. We include the ECG in
our review because the clinician often interprets the results at the
patient’s bedside as part of a prompt initial clinical evaluation.

For the purpose of clarification, we begin by describing the 3 diagnostic
groupings of patients with acute chest pain currently used by clinicians and
then contrast these with the categorization of chest pain as presence or
absence of MI, as is evident in the literature. We then briefly describe
signs and symptoms of MI, mechanisms of chest pain, and conditions that may
present with symptoms suggestive of MI. After these introductory topics, a
detailed account of the precision and accuracy of the medical history,
physical examination, and ECG in the diagnosis of MI is provided.

DEFINITIONS 

Cardiac ischemic chest pain presents in a spectrum of conditions, including
angina, unstable angina, and MI. Angina is defined as a discomfort in the
chest or adjacent areas, caused by myocardial ischemia, usually brought on by
exertion, and associated with a disturbance of myocardial function, but
without myocardial necrosis. 1 Various grading systems of the severity of
angina pectoris have been developed. The classification proposed by the
Canadian Cardiovascular Society, 2 outlined in Table 35-1, is a practical one
adopted in a variety of settings.

Unstable angina encompasses a spectrum of symptomatic manifestations of
ischemic heart disease intermediate between stable angina and acute MI.
According to historical features, ECG findings (with and without pain), and
hemodynamic changes (low blood pressure, third heart sound, mitral
regurgitation, and pulmonary crackles), guidelines have been developed to
stratify patients with suspected unstable angina into high, intermediate, or
low risk of complications after initial evaluation. 3 These guidelines also
recommend disposition based on initial assessment of risk.

The diagnosis of MI used in most studies is based on criteria proposed by the
World Health Organization (WHO). In an attempt to standardize the diagnosis
of acute MI, the WHO requires evolutionary changes on serially obtained ECG
tracings or an increase and decrease in biomarker levels, either with typical
ischemic-type chest discomfort and an ECG result that was not normal or with
an ECG progression labeled probable and associated with lesser symptoms. 4

Table 35-1 Grading of Angina of Effort by the Canadian Cardiovascular Society

| Grade | Description |
|---|---|
| I | “Ordinary physical activity does not cause angina,” such as walking or climbing stairs. Angina with strenuous or rapid or prolonged exertion at work or recreation. |
| II | “Slight limitation of ordinary activity.” Walking or climbing stairs rapidly, walking uphill, or walking or stair climbing after meals, in cold, in wind, or under emotional stress, or only during the few hours after awakening. Walking more than 2 blocks on the level and climbing more than 1 flight of ordinary stairs at a normal pace and in normal conditions. |
| III | “Marked limitation of ordinary physical activity.” Walking 1 or 2 blocks on a level surface and climbing 1 flight of stairs in normal conditions and at a normal pace. |
| IV | “Inability to carry on any physical activity without discomfort—angina syndrome may be present at rest.” |

Diagnosis in Acute Chest Pain 

Determining the correct diagnosis is imperative to adminis¬ 
tering the appropriate therapy. The available therapeutic 
options create the categories for patients presenting to the 
emergency department with chest pain or other symptoms 
suggesting cardiac ischemia. Three distinct management 
strategies determine the diagnostic groupings clinicians use 
currently (Figure 35-1). 

For the first group of patients, which includes those with 
MI and ST-segment elevation or left bundle-branch block 
(LBBB) (Figure 35-1, group A), current therapy consists of 
early percutaneous coronary interventions or thrombolytic 
therapy. A second group of patients includes those with MI 
but without ST-segment elevation or LBBB, or those with 
high-risk unstable angina (Figure 35-1, group B). These 
patients require intensive monitoring, immediate administration 
of antiplatelet agents, and possibly antithrombotic therapy. 
The third group includes patients with low-risk unstable 
angina or nonischemic chest pain (Figure 35-1, group C). 
Clinicians may consider either admitting these patients to an 
intermediate care setting or ward bed or discharging them 
home with plans for subsequent diagnostic testing to establish 
the cause of their symptoms. Economic pressures on the 
health care system have highlighted the importance of 
distinguishing the second from the third group of patients. 

Ideally, we should have information that allows us to classify 
patients into one of these 3 groups. This is not, however, 
the issue addressed by most studies of the medical history 
and physical examination in the setting of acute chest pain. 
Rather, as shown in Figure 35-2, studies typically classify 
patients with acute chest pain into 2 groups according to the 
presence (group 1) or absence (group 2) of MI. Specifically, 
all patients with MI (Figure 35-1, groups A and B) are compared 
with all those without MI (Figure 35-1, group C). 

The results of studies that used the Figure 35-2 design may 
mislead clinicians who need to discriminate among the 3 
groups of patients as shown in Figure 35-1. Clinical features 
that fail to distinguish patients with infarct or high-risk unstable 
angina from those with low-risk unstable angina or nonischemic 
chest pain might still be useful in the decision about 
whether to admit to a monitored bed in an acute care hospital. 
The study design that most investigators have chosen, depicted 
in Figure 35-2, does not correlate with the current triage of 
chest-pain patients according to the therapeutic options available. 
Current therapeutic interventions for MI require the 
presence of ECG changes. Categorizing patients as in Figure 
35-2 will, however, provide clinically important information 
when we have interventions that are clearly useful in acute MI 
both with and without ECG changes. Our review will aid the 
reader in identifying features of the medical history, physical 
examination, and ECG that help differentiate acute MI 
patients, both with and without ECG changes, from non-MI 
patients. Clinicians must avoid misinterpreting the diagnostic 
information we will present as if it were useful in differentiating 
among the 3 groups in Figure 35-1. 

Figure 35-1 Diagnostic Groupings of Acute Chest Pain Based on Management Strategies 

Relevant Signs and Symptoms 

Patients with acute MI typically present with a characteristic 
combination of signs and symptoms, as outlined in standard 
textbooks of medicine. Pain is described as being the most 
common presenting complaint, and considerable emphasis is 
placed on the characteristics of the pain, including its location, 
duration, radiation, and quality. Location of the pain 
includes the central portion of the chest or epigastrium, with 
potential radiation to the arms, neck, jaw, or less commonly 
to the abdomen and back. Quality of the chest pain is 
characteristically described with adjectives such as squeezing, 
crushing, and pressure. 

Other symptoms also may be present, including diaphoresis, 
nausea, vomiting, weakness, and syncope. Although certain 
features have been identified as being important in 
recognizing MI, follow-up data from the Framingham study 
cohort estimate that approximately 25% of infarcts may go 
unrecognized because of either lack of chest pain or atypical 
symptoms. 5 

Mechanism of Chest Pain in Myocardial Infarction 

Three-fourths of all patients with recognized acute MI present 
with chest pain. 6 Cardiac ischemic pain originates in the 
myocardium, where free nerve endings are the sensory receptors. 
Cardiac afferent impulses travel through fibers in the cardiac 
sympathetic nerves, the upper 5 sympathetic ganglia, the white 
rami communicantes, the gray rami, and then via the upper 4 
or 5 thoracic roots. Cardiac afferent impulses project to the 
dorsal horn convergent neurons, travel via the spinothalamic 
tract to the thalamus, and subsequently to the cortex, where 
the cardiac stimuli are decoded. 

Afferent impulses also travel in the cholinergic fibers of the 
vagus nerve, many of which arise from the inferior cardiac 
wall. The signs and symptoms of nausea, bradycardia, and 
hypotension, which appear to be more prevalent in patients 
with inferior wall MI, are believed to be related to the larger 
number of vagal afferent fibers located in the inferior cardiac 
wall. 7 

Like other visceral sensations, myocardial pain is poorly 
and variably localized. In addition, sensations originating in 
other intrathoracic structures (particularly the esophagus) 
can cause pain that is indistinguishable from cardiac pain. 

Conditions That May Present With Symptoms 
Suggestive of Myocardial Infarction 

Many other clinical conditions can present 
with symptoms suggestive of acute MI; these can be broadly 
divided into cardiac and noncardiac disorders. The noncardiac 
causes of chest pain are further divided into gastroesophageal 
diseases and nongastroesophageal diseases, whereas 
the cardiac causes are grouped into ischemic and nonischemic 
conditions. Figure 35-3 illustrates the most common 
of these conditions but is not all-inclusive. 

Given the diversity of the conditions presenting with chest 
pain, and the extent of the diagnostic testing that would be 
required, it is difficult to determine the relative frequency of 
each of these conditions occurring in the setting of chest pain. 


Figure 35-2 Categorization of Patients With Acute Chest Pain in Studies Ascertaining Test Properties of History, Physical Examination, and Electrocardiogram 

Figure 35-3 Cardiac and Noncardiac Conditions Presenting With Chest Pain 


Pozen et al, 8 in an evaluation of 1032 patients presenting to 
the emergency department with a chief symptom of chest 
pain, including follow-up ECG and cardiac enzyme tests for 
both hospitalized and nonhospitalized patients, reported an 
overall incidence of acute ischemia of 29% (ischemia included 
new-onset or unstable angina and MI). In an attempt to 
determine the etiology of noncardiac chest pain, Panju et al 9 
conducted further cardiac and gastrointestinal (GI) investigations 
in 100 patients discharged from a cardiac care unit 
(CCU) with chest pain not yet diagnosed (8.1% of the CCU 
admissions for chest pain). More than 75% of these patients 
had evidence of esophageal disorders by objective testing, 
including 24-hour intraesophageal pH monitoring, upper GI 
tract endoscopy with biopsy, esophageal motility studies, or 
upper GI tract barium series. These results are generalizable 
to patients discharged from the CCU with chest pain not yet 
diagnosed, a distinct subset of the patients who have noncardiac 
chest pain and present to the emergency department. 

METHODS 

Inclusion Criteria of Tests for Precision and Accuracy 

Given the limited number of studies that have focused on the 
precision of the medical history, physical examination, and 
ECG in the diagnosis of MI, we developed a broad set of 
inclusion criteria. We included studies that assessed the 
interobserver or intraobserver variation of 
features of the medical history, physical examination, and 
ECG among patients with chest pain or a diagnosis of MI. 

For the accuracy of the medical history, physical examination, 
and ECG, we included studies that met the following 
criteria: patients: those with chest pain thought to be 
ischemic in nature; test: history, physical examination, or 
ECG described in adequate detail; outcome: MI or no infarction 
using the definition described above; sample size: studies 
with a sample size of at least 200 patients. 


Search Strategy 

For both precision and accuracy of the medical history, physical 
examination, and ECG, we performed an English-language 
MEDLINE search from 1980 to 1997, using the following 
Medical Subject Heading terms and search strategy: (1) “medical 
history taking or physical examination” and “myocardial 
infarction or chest pain” and (2) “reproducibility of results or 
observer variation” and “myocardial infarction” or “chest 
pain.” A textword search was also performed, using “interobserver,” 
“intraobserver,” “accuracy,” “precision,” “reliability,” 
“sensitivity,” “specificity,” and “myocardial infarction” or “chest 
pain.” Additional search strategies for accuracy included the 
term “myocardial infarction, diagnosis” (subheading). For all 
strategies, references from appropriate articles were reviewed 
to provide additional references for this article. Of the 14 references 
used to assess the precision and accuracy of the history, 
physical examination, and ECG in the diagnosis of acute MI, 
12 were obtained from the MEDLINE search strategy and 2 
from the review of reference lists. 

Selection of Articles 

One author (B.R.H.) initially screened the titles and 
abstracts. If she thought the articles might be relevant, she 
and another author (A.A.P.) reviewed the articles in detail 
and determined their eligibility. 

Methodologic Quality Assessments 

We evaluated the methodologic quality of articles addressing 
the accuracy of medical history, physical examination, or ECG 
using criteria developed for this series (see Table 1-7). 10 A 
grade A designation meant an independent, blind comparison 
of sign or symptom with a gold standard among 500 or 
more consecutive patients suspected of having the target 
condition; grade B meant an independent, blind comparison of 
sign or symptom with a gold standard among fewer than 500 
consecutive patients suspected of having the target condition; 
grade C meant an independent, blind comparison of sign or 
symptom with a standard of uncertain validity or independent, 
blind comparison of sign or symptom with a gold standard 
among nonconsecutive patients suspected of having the 
target disorder. 

Analysis 

To calculate likelihood ratios (LRs) for features of the medical 
history, physical examination, and ECG, we considered 
studies suitable for combination if the sensitivity and specificity 
met one of the following criteria: χ² test of sensitivity 
and specificity excluding statistically significant heterogeneity 
(P > .05) or range of sensitivity and specificity across 
studies of 15% or less. We pooled studies satisfying at least 1 
criterion and calculated LRs by simple combination of results 
across studies. The 95% confidence intervals (CIs) were 
calculated according to the method of Simel et al. 11 
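
As a rough illustration of the pooling and LR calculation described above, the following Python sketch sums the 2 × 2 cells across studies and computes positive and negative LRs with 95% CIs. The counts are hypothetical and are not taken from any study in this review. The interval uses a standard log-based approximation for a ratio of 2 proportions, which we assume here is in the spirit of the method cited above; the function names are ours.

```python
import math

def pool_tables(tables):
    """Simple combination: sum each cell (tp, fp, fn, tn) across studies."""
    return tuple(sum(cells) for cells in zip(*tables))

def likelihood_ratios(tp, fp, fn, tn, z=1.96):
    """Return LR+ and LR- with approximate 95% CIs from one 2x2 table.

    tp/fn: finding present/absent among patients with MI
    fp/tn: finding present/absent among patients without MI
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity

    # Standard error of ln(LR) for a ratio of two independent proportions
    se_pos = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    se_neg = math.sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (fp + tn))

    def interval(lr, se):
        return (lr * math.exp(-z * se), lr * math.exp(z * se))

    return {
        "LR+": (lr_pos, interval(lr_pos, se_pos)),
        "LR-": (lr_neg, interval(lr_neg, se_neg)),
    }

# Hypothetical counts for two studies of the same finding: (tp, fp, fn, tn)
hypothetical_studies = [(30, 40, 70, 360), (20, 30, 30, 220)]
pooled = pool_tables(hypothetical_studies)
result = likelihood_ratios(*pooled)
print(pooled)
print(result)
```

The sketch does not test heterogeneity; in practice the pooling step would apply only to studies that had already met one of the criteria listed above.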

RESULTS 

Precision of the Medical History and Physical Examination 

Precision refers to the degree of variation between observers 
(interobserver variation) or within observers (intraobserver 
variation) regarding a particular clinical finding. Hickan et 
al 12 studied the precision of an important aspect of the history, 
namely, that of chest pain. They assessed the interobserver 
agreement in chest pain histories obtained by general 
internists, nurse practitioners, and self-administered questionnaires 
for 197 inpatients and 112 outpatients with chest 
pain. As outlined in Table 35-2, the 2 internists, who each 
independently interviewed 47 of 197 inpatients, showed high 
agreement for 7 of the 10 items, including location and 
description of the pain, as well as aggravating and relieving 
factors. Agreement was slightly lower between internist and 
questionnaire and between the nurse practitioners and internist, 
with the lowest level of agreement between nurse and 
questionnaire. Features of the chest pain associated with a 
lower probability of MI, namely, pleuritic, positional, and 
sharp chest pain, typically showed a lower level of agreement 
for all comparisons. 

The precision of the medical history obtained also depends 
on the reliability of the sources themselves. Kee et al 13 
compared a reported family history of MI 
from patients who had recently survived MI with 
other documented sources, including hospital charts and 
death certificates. They reported a moderate level of agreement, 
with a κ of 0.65. 

Few studies have evaluated the precision of features of the 
physical examination in the assessment of patients with suspected 
MI. One study did evaluate the interobserver agreement 
among 3 clinicians in the assessment of physical 
symptoms and signs of heart failure in 102 MI patients. 14 As 
shown in Table 35-3, agreement was high for dyspnea, as well 
as for the displaced apex beat. However, the level of agreement 
for the other physical symptoms and signs of heart failure, 
particularly the assessment of pulmonary rales and 
hepatomegaly, was considerably lower. 

Precision of the Electrocardiogram Interpretation 

Unfortunately, most studies that assessed the precision of ECG 
interpretation reported the percentage agreement between clinicians, 
without taking into account chance agreement 
through the use of κ or other statistical measures. 15 Precise 
interpretations are important because they are made at the 
bedside and set off immediate management strategies. There 
are several factors that may influence the interpretation of 
the ECG, including the clinical observation of the patient 
and clinical data (expectation bias), as well as the training 
and experience of the individual reading the ECG. Although 
they must be interpreted with caution, the results of earlier 
studies suggest appreciable variability in precision in the 
interpretation of ECGs. 
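
To illustrate why percentage agreement alone can overstate precision, the following Python sketch computes both raw agreement and the chance-corrected κ from a cross-tabulation of 2 readers. The table is hypothetical (it is not data from any study cited here), and the function name is ours.

```python
def cohen_kappa(table):
    """Return (observed agreement, kappa) for a square table.

    table[i][j] = number of ECGs that reader 1 placed in category i
    and reader 2 placed in category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance if the 2 readers were independent
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical readings of 50 ECGs: rows = reader 1, columns = reader 2
# categories: [infarction, no infarction]
hypothetical_table = [
    [20, 5],
    [10, 15],
]
agreement, kappa = cohen_kappa(hypothetical_table)
print(f"raw agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

In this made-up example the readers agree on 70% of tracings, yet half of that agreement would be expected by chance alone, so κ is considerably lower than the raw percentage.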

In one of the earlier studies, 16 10 clinicians with experience in 
cardiology read 100 ECGs on 2 separate occasions and classified 
the tracings as normal, abnormal, or infarction. The 3 clinicians 
agreed completely in only one-third of the ECGs. After a second 
reading, the clinicians disagreed with 1 of 8 of their original 
reports. Gjorup et al 17 had 16 residents in internal medicine read 
107 ECGs of suspected MI patients and assess whether signs 
indicative of acute infarction were present. There was disagreement 
in approximately 70% of the cases. 


Table 35-2 Interobserver Agreement in Recording Chest Pain Historiesᵃ 

                                                Inpatients (n = 197)                Outpatients (n = 112)
                                                Two             Internists and      Nurse and       Nurse and
Attribute                                       Internists, κ   Questionnaire, κ    Internists, κ   Questionnaire, κ
Pain radiates to left arm                       0.89            0.58                0.43            0.41
Pain relieved by nitroglycerin                  0.79            0.51                0.94            0.77
History of myocardial infarction                0.78            0.81                0.70            0.81
Pain in substernal location                     0.74            0.50                0.38            0.19
Pain brought on by exertion                     0.63            0.51                0.42            0.22
Pain described as “pressure”                    0.57            0.37                0.49            0.50
Patient must stop activities when pain occurs   0.50            0.47                0.44            0.40
Pain brought on by cough or deep breath         0.44            0.30                0.55            0.62
Pain described as “sharp”                       0.30            0.26                0.33            0.31
Pain brought on by moving arms or torso         0.27            0.44                0.52            0.54

ᵃAdapted, with permission, from Hickan et al. 12 

Brush et al 18 reported much higher agreement in a study in 
which 2 clinicians classified 50 ECGs according to evidence of 
infarction, ischemia or strain, left ventricular hypertrophy, 
LBBB, or paced rhythm. They obtained agreement in 45 of the 
50 cases (κ = 0.69). 


Table 35-3 Interobserver Agreement in Assessment of Physical 
Symptoms and Signs of Heart Failure in Patients With Myocardial Infarctionᵃ 

Physical Sign           Range, κ
Dyspnea                 0.62-0.75
Displaced apex beat     0.53-0.73
S3 gallop               0.14-0.37
Rales                   0.12-0.31
Neck vein distention    0.31-0.51
Hepatomegaly            0-0.16
Dependent edema         0.27-0.64

ᵃAdapted, with permission, from Gadsboll et al. 14 

The precision in the interpretation of ECGs appears to 
increase with experience. Eight cardiologists interpreted ECGs 
of 1220 clinically validated cases of various cardiac disorders, 
including anterior, inferior, or combined MI, as well as right, 
left, or biventricular hypertrophy. 19 The interobserver agreement 
among cardiologists was reasonably high, with an average 
κ of 0.67. For the 125 selected ECGs that were read twice 
by each cardiologist, different diagnoses were given for 10% to 
23% of the ECGs (intraobserver reproducibility, 77%-90%). 

Sgarbossa et al 20 assessed the precision of features of the ECG 
that may aid in the diagnosis of acute MI in the presence of 
LBBB. In this study, 4 investigators read 2600 ECGs and 
achieved a κ of more than 0.85 for QRS-complex and T-wave 
polarities, with a high degree of correlation among the investigators 
for interpretation of ST-segment deviation (Pearson 
product moment correlation coefficient, > 0.9). 

Studies Used to Determine Accuracy of the Medical 
History, Physical Examination, and Electrocardiogram 

Table 35-4 summarizes features of the 14 studies 8,21-33 used to 
determine the accuracy of the medical history, physical 
examination, and ECG in the diagnosis of acute MI. Five of the 
studies included consecutive patients presenting to the emergency 
department with chest pain 8,23,24,27,28; 7 included patients 
admitted to the hospital or CCU for suspected MI 21,22,25,26,29,31,32; 
and 2 included patients with chest pain who were brought to 
the emergency department by paramedics. 30,33 


Table 35-4 Features of Studies Used to Determine Accuracy of the Medical History, Physical Examination, and Electrocardiogram 

                          Methodologic                                                               Incidence  No. of Patients
Source, y                 Qualityᵃ      Inclusion Criteria                                           of MI, %   (% Women)        Age, y                Country
Rude et al, 21 1983       A             Consecutive patients admitted to CCU with suspected MI       50         3697 (38)        Mean = 61             United States
Yusuf et al, 22 1984      B             Consecutive patients admitted to CCU with suspected MI       85         475 (15)         Mean = 56             United Kingdom
Pozen et al, 8 1984       A             Consecutive patients presenting to ED with chest pain        NR         2801 (NR)        Men > 30; women > 40  United States
(illegible in source), 23 A             Consecutive patients presenting to ED with chest pain        17         596 (52)         >25                   United States
Tierney et al, 24 1986    B             Consecutive patients presenting to ED with chest pain        12         492 (NR)         Men > 30; women > 40  United States
Herlihy et al, 25 1987    B             Consecutive patients admitted to CCU with suspected MI       44         265 (NR)         NR                    United States
Klaeboe et al, 26 1987    B             Consecutive patients admitted to CCU with suspected MI       59         237 (36)         Range = 29-90         Norway
Rouan et al, 27 1989      A             Consecutive patients presenting to ED with chest pain        14         7115 (50)        >30                   United States
Solomon et al, 28 1989    A             Consecutive patients presenting to ED with chest pain        14         7734 (50)        >30                   United States
Berger et al, 29 1990     B             Consecutive patients admitted to hospital with chest pain    36         278 (31)         57                    Switzerland
Weaver et al, 30 1990     C             Patients with chest pain brought to ED by paramedics         18         2472 (NR)        <75                   United States
Jonsbu et al, 31 1991     B             Consecutive patients admitted to hospital with suspected MI  36         200 (NR)         NR                    Norway
Karlson et al, 32 1991    A             Consecutive patients admitted to hospital with suspected MI  20         4690 (NR)        NR                    Sweden
Kudenchuk et al, 33 1991  C             Patients brought to ED by paramedics                         33         1189 (34)        <74                   United States

Abbreviations: CCU, cardiac care unit; ED, emergency department; MI, myocardial infarction; NR, not reported. 
ᵃSee “Methodologic Quality Assessments” subsection of the text for an explanation of these grades. 

The studies examined a variety of features of the clinical 
examination and ECG. For the sake of relevance and clarity, 
we highlight the results of those variables in which an LR of 
2.0 or more, or an LR of 0.5 or less, was obtained. These 
studies provide the best available evidence for identifying 
those features that aid in the diagnosis of MI. 

Accuracy of the Medical History and Physical Examination 

Nine of the studies outlined in Table 35-4 reported the relation 
between features of the clinical examination of patients 
presenting to the emergency department with chest pain, as 
determined by physicians, with that of the final diagnosis of 
MI. In all studies, the gold standard for the diagnosis of MI 
was based on cardiac enzyme and ECG changes, except for 
the study by Weaver et al, 30 in which the discharge diagnosis 
was used to define acute MI. Although features of the clinical 
examination are extremely insensitive in diagnosing MI, they 
are reasonably specific and their presence is more likely to 
occur in patients with MI. 

Although patients can present with MI and have no chest 
pain, chest pain always prompts a consideration that the 
patient is having myocardial ischemia. Nonetheless, multivariate 
models show that the independent value of chest pain 
or pain in the left arm, once all factors are considered, has an 
odds ratio (OR) of only 2.7. 8 Confining chest pain in the 
model to “chest pain as the most important symptom” has an 
even lower OR for MI (OR, 2.0). 8 

As noted in Table 35-5, chest pain radiation was the clinical 
feature that increased the probability of MI the most, with 
a wider extension of pain associated with the highest likelihood 
of MI. In particular, chest pain radiating to the left arm 
was twice as likely to occur in patients with, as opposed to 
those without, MI, whereas radiation to the right shoulder 
was about 3 times as likely, and radiation to both the left and 
right arm was 9 times as likely to occur in such patients. 
Chest pain radiating to the right arm alone has been reported 
to be an extremely specific, but insensitive, marker of MI 
(LR, 7.3; 95% CI, 3.9-14). 29 However, as reflected by the 
width of the CI, these results were based on a small number 
of subjects (6 of the 100 patients with MI) and therefore 
require confirmation. 

Other items of the history that aided in the diagnosis of MI 
included history of MI (LR range, 1.5-3.0) or diaphoresis (LR, 2.0). 

A number of features from the history and clinical examination 
thought to be useful in determining the presence of 
MI were of little value in establishing such a diagnosis. Features 
of the history, including age older than 60 years, male 
sex, history of angina or coronary artery disease, history of 
nitroglycerin use, duration of chest pain greater than 60 minutes, 
constant or episodic chest pain, and chest pain of sudden 
onset, were all associated with LRs of less than 2. 
Adjectives used to describe the quality of the chest pain, 
including that of pressure, aching, and squeezing, were also 
associated with LRs of less than 2. Therefore, none of these 
features carries information independently useful in establishing 
an MI diagnosis. 

The 3 components of the physical examination associated 
with LRs higher than 2 included the presence of a third heart 
sound (LR, 3.2), hypotension (LR, 3.1), and pulmonary 
crackles on auscultation (LR, 2.1). Dyspnea was not found to 
be an important component of the clinical examination. 
Other features frequently described in the assessment of the 
patient with chest pain, including bradycardia and tachycardia, 
were not evaluated. 

Cardiac risk factors, including hypertension, smoking, 
obesity, hypercholesterolemia, diabetes, and a family history 
of cardiovascular disease, are frequently included in the medical 
history of a patient presenting with chest pain. However, 
current evidence provides little support for the diagnostic 
value of a history of these risk factors. In 3 large studies of 
patients presenting to the emergency department with chest 
pain, none of the classic cardiac risk factors emerged as independent 
predictors of acute MI. 8,34,35 

Table 35-6 presents clinical features that decrease the 
probability of MI. Chest pain described as pleuritic, sharp, 
stabbing, or positional decreased the likelihood of MI significantly. 
In addition, chest pain reproduced by palpation on 
physical examination was associated with a low LR, ranging 
from 0.2 to 0.4. 


Table 35-5 Clinical Features That Increase the Probability of a 
Myocardial Infarction in Patients Presenting With Acute Chest Pain 

Clinical Feature                                    LR (95% CI)      Reference
Chest pain radiation
  Both arms with pain                               9.7 (4.6-20)     29
  Left arm pain                                     2.2 (1.6-3.1)    29
  Right shoulder pain                               2.9 (1.4-6.0)    24, 29
Third heart sound on auscultation                   3.2 (1.6-6.5)    24
Hypotension (systolic blood pressure < 80 mm Hg)    3.1 (1.8-5.2)    30
Pulmonary crackles on auscultation                  2.1 (1.4-3.1)    24
Diaphoresis                                         2.0 (1.9-2.2)    24, 28, 31
Nausea or vomiting                                  1.9 (1.7-2.3)    24, 25, 29, 31
History of MI                                       1.5-3.0ᵃ         8, 24

Abbreviations: CI, confidence interval; LR, likelihood ratio; MI, myocardial infarction. 
ᵃIn heterogeneous studies the LRs are reported as ranges. 


Table 35-6 Clinical Features That Decrease the Probability of a 
Myocardial Infarction in Patients Presenting With Acute Chest Pain 

Clinical Feature                                    LR (95% CI)      Reference
Pleuritic chest pain                                0.2 (0.2-0.3)    23, 24, 28
Chest pain sharp or stabbing                        0.3 (0.2-0.5)    23, 24
Positional chest pain                               0.3 (0.2-0.4)    23, 28
Chest pain reproduced by palpation                  0.2-0.4ᵃ         23, 24, 28

Abbreviations: CI, confidence interval; LR, likelihood ratio. 
ᵃIn heterogeneous studies the LRs are reported as ranges. 

Accuracy of the Electrocardiogram 

Eight studies addressed the accuracy of the ECG in diagnosing 
MI. The results reported in this article are for interpretation 
of the ECGs by clinicians and not by computer 
algorithms. Interpretation of the ECG was by an independent 
physician blinded to the clinical data in 5 of the studies, 8,21,22,32,33 
by the emergency department physician alone in 2 
others, 23,27 and by the emergency department physician with a 
review by an independent physician blinded to the clinical 
data in 1. 24 In all studies, the gold standard for the diagnosis 
of MI was based on cardiac enzyme levels, except for the 
study by Kudenchuk et al, 33 in which the hospital discharge 
diagnosis was used to define MI. 

Several features of the ECG have been used to assist in the 
diagnosis of acute MI. The most common characteristics 
include the presence of Q waves, ST-segment elevation or 
depression, and T-wave inversion. As noted in Table 35-7, 
there was a considerable degree of variability among the 
studies for some of these features. New ST-segment elevation 
was the most powerful feature in increasing the probability 
of MI, with the LRs ranging from 5.7 to 54. The presence of a 
new Q wave was also much more likely to occur in patients 
with, as opposed to those without, MI, with LRs ranging 
from 5.3 to 25, although the usefulness of this finding was 
reduced when patients with old Q waves were included. 

ST-segment depression, whether new or known to have 
been present previously, and new T-wave peaking or inversion 
were all approximately 3 times as likely to occur in 
patients with, as opposed to those without, MI. In addition, 
conduction defects, particularly those reported to be new, 
also increased the probability of MI. 

A normal ECG result decreased the probability of MI the 
most and was associated with LRs of 0.1 to 0.3. 19,20,26,31 


Table 35-7 Features of the Electrocardiogram That Increase the 
Probability of a Myocardial Infarction in Patients Presenting 
With Acute Chest Pain 

Feature of the Electrocardiogram        LR (95% CI)      Reference
Any ST-segment elevation                11 (7.1-18)      24
New ST-segment elevation > 1 mm         5.7-54ᵃ          21-24, 32, 33
New conduction defect                   6.3 (2.5-16)     24
New Q wave                              5.3-25ᵃ          21, 24, 32, 33
Any Q wave                              3.9 (2.7-5.7)    24
Any ST-segment depression               3.2 (2.5-4.1)    24
T-wave peaking or inversion > 1 mm      3.1ᵇ             8
New ST-segment depression               3.0-5.2ᵃ         21, 24, 32
Any conduction defect                   2.7 (1.4-5.4)    24
New T-wave inversion                    2.4-2.8ᵃ         24, 32, 33

Abbreviations: CI, confidence interval; LR, likelihood ratio. 
ᵃIn heterogeneous studies, the LRs are reported as ranges. 
ᵇData not available to calculate CIs. 


The Role of Combined Findings and Clinical 
Prediction Rules for Myocardial Infarction 

Clinicians are frequently presented with multiple clinical 
examination items, each of which can be considered a separate 
diagnostic test for establishing the diagnosis of MI. The 
problem in situations such as this is in knowing how to combine 
the LRs from these multiple tests to obtain an accurate 
estimate of the posttest probability of MI. The simple serial 
multiplication of LRs that has been proposed assumes that 
the tests are conditionally independent 15 ; that is, that the 
patient’s results on one test bear no relationship to the results 
on any of the other tests. As demonstrated by Holleman and 
Simel, 36 violation of the conditional independence assumption 
can yield inaccurate posttest probabilities of disease. 
Unfortunately, the precision and accuracy of serial combinations 
of findings were not reported in the studies included in 
this review. However, the combination of clinical findings 
considered as a group is assessed in clinical prediction rules. 
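
The serial multiplication described above can be sketched in a few lines of Python. The example is purely illustrative: it takes a pretest probability of 14% (the incidence of MI reported in 2 of the emergency department studies in Table 35-4) and multiplies the LRs for diaphoresis (2.0) and left arm pain (2.2) from Table 35-5, under the assumption that the 2 findings are conditionally independent. That assumption has not been tested for these findings, so the result may well overstate the true posttest probability. The function names are ours.

```python
def probability_to_odds(probability):
    return probability / (1 - probability)

def odds_to_probability(odds):
    return odds / (1 + odds)

def serial_posttest_probability(pretest_probability, likelihood_ratios):
    """Multiply pretest odds by each LR in turn.

    Valid only if the findings are conditionally independent,
    which is an assumption of this sketch, not an established fact.
    """
    odds = probability_to_odds(pretest_probability)
    for lr in likelihood_ratios:
        odds *= lr
    return odds_to_probability(odds)

pretest = 0.14                  # illustrative pretest probability of MI
findings = {"diaphoresis": 2.0, "left arm pain": 2.2}
posttest = serial_posttest_probability(pretest, findings.values())
print(f"posttest probability under independence = {posttest:.3f}")
```

If the 2 findings tend to occur together in the same patients, the second finding adds less information than its LR suggests, which is why validated prediction rules are preferable to this kind of naive product.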

By combining findings from patients’ medical history, physical 
examination, and ECG, investigators have developed 
probability-based decision aids, as well as computer-based 
protocols and guidelines, that categorize patients with chest 
pain into risk groups according to their probability of MI. 34,35,37 
These tools have been devised to improve physician recognition 
and triage of patients with acute ischemic events. 8,38 
Although these measures have helped clinicians make appropriate 
decisions, not all studies of probability-based risk assessment 
tools have demonstrated improvement in emergency 
department triage or reduction in resource use. 39 These clinical 
prediction rules conform to the methodologic standards of 
clinical prediction rules initially proposed by Wasson et al 40 
and recently revised, 41 except for the validation of the rule by 
Tierney et al, 34 which was performed on a subset, rather than 
on a prospective sample of the population. 

Tierney et al 34 developed an instrument for the prediction 
of MI. According to multivariate analysis of 540 emergency 
department patients with chest pain, 4 variables with independent 
predictive value for infarction were identified. These 
included diaphoresis with chest pain, history of MI, ECG 
changes of a new Q wave, and ST-segment elevation either 
new or old. 

Goldman et al 35,37 also developed a protocol to predict MI 
in emergency department patients with chest pain. The 
instrument was based on the medical history, physical examination, 
and ECG of more than 6000 patients presenting at 
an emergency department with a chief complaint of chest 
pain. Variables in Goldman’s algorithm include patient older 
than 40 years, history of angina or MI, chest pain that began 
less than 48 hours before arrival at the emergency department, 
longest pain episode 1 hour or more, pain worse than 
usual angina or the same as earlier MI, and radiation of pain 
to neck, left shoulder, or left arm as predictors of infarction. 
Features of the chest pain included in the algorithm that 
decrease the probability of infarction are radiation to the back, 
abdomen, or legs; stabbing pain; and pain reproduced by 
palpation. The ECG changes predictive of an acute MI 
included new ST-segment elevation or Q waves in 2 or more 
leads and new ST-T-segment changes of ischemia or strain. 
According to the algorithm, patients can be assigned to one 
of 14 subgroups, with a probability of acute MI ranging from 
1% to 77%. 

These prediction rules included several of the common 
variables identified in univariate analysis and included in this 
review, namely, the location and extent of the chest pain, 
chest pain with diaphoresis, and ECG changes, including 
new Q waves and ST-segment elevation. However, in situations 
in which the independence of features of the medical 
history and clinical examination has not been tested, as in 
these studies, clinicians may be misled when combining these 
multiple clinical findings. In these situations they should 
look to clinical prediction rules to help integrate and interpret 
the results. 

Pretest Probability in the Diagnosis 
of Myocardial Infarction 

To determine the posttest probability, or likelihood, of dis¬ 
ease according to the clinical features and their associated 
LRs, one must take into account the pretest probability, or 
likelihood, of that condition. Although much focus has been 
placed on the combination of multiple clinical variables and 
the development of prediction rules for MI, as described 
above, there has been little emphasis on establishing the pre¬ 
test probability of MI according to standard clinical assess¬ 
ment. If an estimate of the pretest probability of MI is 
available, a diagnostic test, based on its sensitivity, specificity, 
and LR, can be used to establish a new estimate of disease 
likelihood. A classic and widely used example of this concept 
was proposed by Diamond and Forrester. 42 Estimates of the 
pretest probability of coronary artery disease according to 
age, sex, and chest pain description have been published and 
are easily used in the clinical setting. A more comprehensive 
attempt to consider all clinical characteristics has also been 
undertaken. 43 

The predictive value of the medical history, physical exam¬ 
ination, and ECG depends on the pretest probability of MI. 
Even with a normal ECG result, for example, a high pretest 
probability of MI would result in a high posttest probability 
of this condition being present. Proper use of these findings 
must therefore incorporate the pretest probability of MI. 
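
The dependence of posttest probability on pretest probability follows from the odds form of Bayes' theorem: pretest odds multiplied by the LR give posttest odds. The short Python sketch below illustrates the arithmetic only. The function name and the LR of 0.3 for a reassuring finding are hypothetical and are not values taken from the studies reviewed here.

```python
def posttest_probability(pretest: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability (odds form of Bayes' theorem)."""
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical LR for a reassuring finding; NOT a value from this review.
ILLUSTRATIVE_LR = 0.3

for pretest in (0.05, 0.25, 0.80):
    post = posttest_probability(pretest, ILLUSTRATIVE_LR)
    print(f"pretest {pretest:.0%} -> posttest {post:.0%}")
```

With the same reassuring finding, a patient who starts at a high pretest probability still ends with a substantial posttest probability, which is the point made in the paragraph above.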


COMMENT 

The diagnosis of MI in the setting of chest pain is a complex 
task. Clinicians categorize patients with chest pain into 3 
groups according to current therapeutic interventions, 
whereas in the literature patients with chest pain are typically 
categorized into the presence or absence of MI. According to 
this latter categorization, we have assessed the features of the 
medical history, physical examination, and ECG, which aid 
in increasing or decreasing the likelihood of acute MI. We 
have also addressed the use of clinical prediction rules, which 
use a number of clinical variables, to aid in the diagnosis of 
MI, as well as the need to take into account pretest probability 


of disease when assessing the predictive value of individual 
variables. 

Referring to the scenarios presented at the beginning of 
this article, the first 3 have features that increase the likeli¬ 
hood of acute MI. Patient 1 has chest pain, diaphoresis, and 
ST-segment elevation. Patient 2 has diaphoresis, hypoten¬ 
sion, and history of an MI. Patient 3 has nausea and ST- 
segment elevation. In contrast, patient 4 has features that 
decrease the likelihood of MI, namely, a normal ECG result 
and chest pain that is both positional and reproducible by 
palpation. 

Clinicians interested in distinguishing patients with acute 
MI from those with unstable angina and nonanginal chest 
pain can use either Goldman’s algorithm or the individual 
clinical features that we summarize in Tables 35-5, 35-6, and 
35-7. However, the distinction between MI and non-MI chest 
pain may not be the most relevant initial clinical decision; it is 
more important to decide on appropriate immediate therapy. 

THE BOTTOM LINE 

The presence of any of the following clinical findings 
increases the likelihood of MI: patients presenting with chest 
pain radiating to the left arm, radiating to the right shoulder, 
or radiating to both left and right arms; and patients presenting 
with chest pain and diaphoresis, a third heart sound, or 
hypotension. 

The presence of any of the following clinical findings 
decreases the likelihood of MI: patients presenting with chest 
pain that is described as pleuritic, sharp or stabbing, posi¬ 
tional, or reproduced by palpation. 

Features of ECG that increase the likelihood of MI include 
the following: new ST-segment elevation, new Q waves, any 
ST-segment elevation, and new conduction defect. A normal 
ECG result is a powerful feature in ruling out MI. 

Finally, as noted previously, these findings may not be relevant 
for distinguishing patients with acute ischemic syndromes 
requiring CCU admission from those with less dangerous 
ischemia or nonischemic pain. Further research is required 
in this regard. 
CLINICAL SCENARIO 


A 62-year-old woman experienced chest discomfort while 
walking from the parking garage to your hospital. She 
decided to stop in the emergency department for evalua¬ 
tion. The discomfort has been present for about 10 to 12 
minutes and is creating a dull ache in her left shoulder and 
arm. As you interview her, she is diaphoretic and experi¬ 
encing the chest discomfort. Her blood pressure is 145/95 
mm Hg. The lungs are clear to auscultation, whereas the 
cardiac examination reveals an S 4 systolic sound but no 
murmur. The pulses are equal in all of her extremities. An 
electrocardiogram (ECG) result seems normal, but your 
hospital provides neither computerized ECG reports nor 
computerized estimates of the probability of a myocardial 
infarction (MI). She experiences relief after a sublingual 
nitroglycerin tablet. You find that she was recently diag¬ 
nosed with diabetes and systolic hypertension, and she has 
been trying to stop smoking. There was no nausea with 
the discomfort, although she observes frequent epigastric 
discomfort that responds to antacids. She takes cimeti- 
dine, which helps with her discomfort. 

UPDATED SUMMARY ON 
MYOCARDIAL INFARCTION 

Original Review 

Panju AA, Hemmelgarn BR, Guyatt GH, Simel DL. The ratio¬ 
nal clinical examination: is this patient having a myocardial 
infarction? JAMA. 1998;280(14):1256-1263. 

UPDATED LITERATURE SEARCH 

Our literature search combined the search terms “myocar¬ 
dial infarction/di” with the parent search strategy for The 
Rational Clinical Examination series, “meta-analysis,” or 
“roc curve,” limited to English-language publications in the 
MEDLINE database from 1997 to October 2004. The 
search strategy yielded 169 articles, although only 1 pro¬ 
spectively evaluated the sensitivity and specificity of the 
clinical examination for acute cardiac ischemia (ACI). Acute 
cardiac ischemia includes patients with MI or unstable 
angina pectoris. 


NEW FINDINGS 

• The reference standard for MI now includes cardiac tropo¬ 
nin levels. 

• The new reference standard requires reappraisal of the role 
of clinical findings. 

• After clinical symptoms are used to identify patients with 
possible ischemia, the ECG and troponin results take pre¬ 
cedence in making the diagnosis. 

• Radiation of chest pain to the shoulder or right arm has 
been validated as reflecting a more diffuse pain pattern that 
increases the likelihood of an MI among patients admitted 
to the hospital. However, the value of individual clinical 
symptoms or signs in the decision to admit or discharge 
the patient has not been fully evaluated with troponin- 
based case definitions. 

• The presence of diabetes, hypertension, or hyperlipidemia 
should not affect the clinician’s probability estimate that an 
episode of chest pain represents an ACI. 

Details of the Update 

We found a systematic review of the diagnosis of ACI, pub¬ 
lished in 2001, that formed the basis for evidence-based 
guidelines. 1 We reviewed this article, articles in the reference 
lists of a general systematic review on MI, 2 and a recent 
nonsystematic review of ACI diagnosis. 3 These 3 reviews 
addressed the reference standard for acute MI, the Goldman 
chest pain protocol, Acute Cardiac Ischemia Time-Insensitive 
Predictive Instrument (ACI-TIPI), and computerized deci¬ 
sion aids for diagnosing MI. Each of these diagnostic 
approaches uses combinations of findings (including the 
clinical examination) to diagnose acute MI. From reviewing 
the reference lists in the review articles, it became apparent 
that information on the clinical examination might be buried 
in articles that did not lead to Medical Subject Headings 
indexing of clinical examination terms. To explore the possi¬ 
bility that we might be missing articles, we entered the origi¬ 
nal Rational Clinical Examination article into Citation Index 
(ISI Web of Knowledge, Science Citation Index Expanded). 
Thirty articles cited the original Rational Clinical Examina¬ 
tion article on MI, including 3 that contained new informa¬ 
tion on the sensitivity and specificity of clinical evaluation 
items. 
IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

For chest pain radiation patterns shown in Table 35-5, we 
reassessed the values. We found 1 minor calculation error in 
the original Rational Clinical Examination article for the 
likelihood ratio (LR) when chest pain radiates to both arms. 
We also found that the data from the studies referenced in 


Table 35-8 Likelihood Ratios of Chest Pain Radiation Patterns for 
Myocardial Infarction 

Pain radiation                     LR+ (95% CI)       LR- (95% CI) 
Both arms with pain 4              9.7 (4.6-20)       0.64 (0.54-0.74) 
Right arm pain 4                   7.3 (3.9-14)       0.62 (0.52-0.73) 
Left arm pain 4                    2.2 (1.6-3.1)      0.60 (0.48-0.75) 
Right shoulder pain (n = 2) 4,5    2.2 (1.4-3.4) a    0.90 (0.82-0.97) a 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 

a Values represent summary likelihood ratios. 


Table 35-9 Criteria for Acute, Evolving, or Recent Myocardial Infarction a 

Either of the following criteria satisfies the diagnosis for an acute, evolving, 
or recent MI: 

1. Typical increase and gradual decrease (troponin) or more rapid increase 
and decrease (creatine kinase-MB isoenzyme) of biochemical markers of 
myocardial necrosis, with at least 1 of the following: 

a. Ischemic symptoms 

b. Development of pathologic Q waves on the ECG 

c. ECG changes indicative of ischemia (ST-segment elevation or depression), 
or 

d. Coronary artery intervention (eg, coronary angioplasty) 

2. Pathologic findings of an acute MI 

Abbreviations: ECG, electrocardiogram; MI, myocardial infarction. 

a From Alpert et al. 6 


Table 35-10 Effect of Change in Case Definition on Sensitivity and Specificity a 

               MI (WHO 
               Criteria 1990)   No MI   LR+ (95% CI)     LR- (95% CI) 

Original Data Before Cardiac Troponins Were Available a 
Nausea         40               30      2.4 (1.6-3.6)    0.72 (0.6-0.84) 
No nausea      60               150 

Scenario 1 b: All Newly Reclassified Patients Have Nausea 
Nausea         62 ←             8       10 (5.0-20)      0.52 (0.43-0.62) 
No nausea      60               150 

Scenario 2 b: No Newly Reclassified Patients Have Nausea 
Nausea         40               30      1.7 (1.1-2.6)    0.83 (0.72-0.90) 
No nausea      82 ←             128 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio; MI, myocardial infarction; WHO, World Health Organization. 

a The data are rounded for display purposes. 

b Arrows indicate the effect of reclassifying 7.7% of patients without MI to having an MI. 


Table 35-5 were sufficient for calculating the negative likelihood 
ratios. These new findings appear in Table 35-8. 

CHANGES IN THE REFERENCE STANDARD 

The case definition for MI changed with the validation of 
new biomarkers. 6 The change in cardiac troponin now has 
primacy over other biochemical markers (Table 35-9). 

The new definition means that more patients will now be 
diagnosed with MI. A study of patients admitted to a cardiac 
care unit (CCU) showed that the new definitions of MI with 
troponin would lead to an absolute increase of 7.7% in the 
incidence (assuming threshold values of 8.0 ng/mL for creatine 
kinase-MB isoenzyme [CK-MB] and 0.3 ng/mL for 
troponin). 7 However, the new incidence depends on the 
threshold values for troponin and CK-MB, and the rates 
could vary considerably, depending on the study population. 

The change in MI definition undoubtedly affects the sensitiv¬ 
ity and specificity for clinical symptoms and signs, but the 
direction and magnitude of the effect are unpredictable. 
Although cardiac troponin has better sensitivity and specificity 
than the CK-MB, the issue for the clinical examination 
becomes a question of how the symptoms or signs will distrib¬ 
ute among those newly classified as having an MI. To demon¬ 
strate this, we can look at the effect on the finding of nausea as a 
presenting symptom. Berger et al 4 evaluated nausea with the 
World Health Organization criteria for MI before cardiac 
troponin levels and found the results shown in Table 35-10. 
According to an increased prevalence of 7.7% in MI using 
troponin as part of the case definition, Table 35-10 shows the 
extremes of the effect on the LR of nausea for MI, depending 
on the distribution of newly defined cases with troponins. At 
one extreme, we show what happens if all the “new” patients 
with MI had nausea (scenario 1), whereas the opposite extreme 
occurs when none of the new patients experienced nausea (sce¬ 
nario 2); the truth will be somewhere between these extremes. 

In scenario 1, the results lead to an improved utility of nausea 
for correctly identifying patients. In scenario 2, in which all new 
cases are from patients who lacked nausea as a presenting clinical 
feature, the value of the finding is worse compared with the older 
case definition. The results could change between populations 
with different baseline incidence rates and different threshold values 
for biomarkers. Thus, older data on the clinical examination 
applied to current diagnostic standards may lead to either under¬ 
estimates or overestimates of the utility of symptoms or signs. 
The most reliable estimates for the clinical examination will come 
from studies that compare the results to current definitions of MI. 
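
The two extremes in Table 35-10 can be reproduced directly from the 2 x 2 counts. The Python sketch below is a minimal illustration: the function name is hypothetical, and the figure of 22 reclassified patients is inferred from the table counts (62 vs 40, 82 vs 60) rather than stated in the text.

```python
def likelihood_ratios(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (LR+, LR-) from the four cells of a 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Original counts: nausea present in 40 of 100 with MI and 30 of 180 without MI.
original = likelihood_ratios(tp=40, fp=30, fn=60, tn=150)

# Number of patients moved from "no MI" to "MI" (inferred from the table).
moved = 22

# Scenario 1: every reclassified patient had nausea.
scenario_1 = likelihood_ratios(tp=40 + moved, fp=30 - moved, fn=60, tn=150)

# Scenario 2: no reclassified patient had nausea.
scenario_2 = likelihood_ratios(tp=40, fp=30, fn=60 + moved, tn=150 - moved)

for label, (lr_pos, lr_neg) in [("original", original),
                                ("scenario 1", scenario_1),
                                ("scenario 2", scenario_2)]:
    print(f"{label}: LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
```

The same symptom moves from an LR+ of about 2.4 to either about 10 or about 1.7, depending only on how the reclassified patients are distributed.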

RESULTS OF LITERATURE REVIEW 

The patients in the studies shown in Table 35-11 had normal 
or nondiagnostic ECG results. Patients with known coronary 
heart disease who had prolonged or recurrent pain typical of 
angina, those with suspected pulmonary emboli, or those with 
comorbid illness requiring admission were excluded. Thus, the 
patients remaining for inclusion in the study were typical of 
those presenting with chest pain who might have acute coro¬ 
nary syndromes, but for whom the diagnosis is uncertain. 
The finding that radiation of pain to the right arm or both 
arms has diagnostic value may seem counterintuitive to physi¬ 
cians who consider only left arm pain as related to myocardial 
ischemia. However, in the original study that reported the signif¬ 
icance of right arm discomfort, 45 of 51 patients with pain in the 
right arm also had pain in the left arm. 4 The authors speculated 
that the presence of right arm pain represents part of a larger 
extension of pain with an MI, rather than something intrinsic to 
the radiation of pain with MI. A count of patients who had both 
right arm and left arm pain in the studies by Goodacre et al 8,9 
was not provided, but the importance of the finding of pain in 
the right arm was confirmed and was present even after adjust¬ 
ing for other symptoms. 

Chest discomfort with indigestion/burning quality independently 
increased the likelihood of an MI (positive LR, 2.3; 95% 
confidence interval [CI], 1.5-3.5). 9 Because the presence of 
indigestion/burning might have been used to discharge patients 
from the emergency department, this created verification bias 
that could have distorted the LR. Therefore, the value of this 
symptom is of uncertain significance when assessing for MI. 

For patients with chest pain, the response to nitroglycerin 
does not distinguish those who will prove to have an MI from 
those who will not. The diagnostic odds ratio combined 
from 2 studies was not significantly different from 1 (ie, diagnostic 
odds ratio, 0.66; 95% CI, 0.38-1.2). 8,10 

Multivariate Findings for ACI Syndromes 

One study evaluated a variety of clinical symptoms among a 
group of patients who proved to have a 30% incidence of MI. 11 
The model included clinical variables without ECG data (Box 35-1). 
The model has not been studied as a tool for emergency 
department triage of patients with chest pain. A patient’s 
increasing age, the presence of diaphoresis with the chest dis¬ 
comfort, nausea, and left arm radiation were the most impor¬ 
tant variables for increasing the probability of an MI. The 
variables that decreased the probability the most were the pres¬ 
ence of pleuritic chest discomfort or episodic pain. 
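
As an illustration of how this logistic model converts findings into a probability, the Python sketch below encodes the coefficients listed in Box 35-1. It assumes the probability formula reads exp(score/15)/[1 + exp(score/15)]; under that reading, the 62-year-old woman in the clinical scenario (diaphoresis, current smoker, left arm pain) comes out at about 65%, which matches the figure quoted in the scenario resolution. The function and variable names are hypothetical, and the sketch is not a validated clinical tool.

```python
import math
from typing import Iterable

# Coefficients as printed in Box 35-1 (model without ECG data).
INTERCEPT = -92.0
SCALE = 15.0  # assumed divisor in the probability formula
WEIGHTS = {
    "diaphoresis": 17, "nausea": 14, "smokes": 11, "left_arm_pain": 11,
    "male": 8, "pleuritic_pain": -44, "episodic_pain": -30,
    "sharp_pain": -15, "previous_angina": -15, "previous_mi": -12,
}


def wang_mi_probability(age_years: float, findings: Iterable[str]) -> float:
    """Estimate MI probability from age and the set of findings that are present."""
    present = set(findings)
    unknown = present - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unrecognized findings: {sorted(unknown)}")
    score = INTERCEPT + 1.0 * age_years + sum(WEIGHTS[name] for name in present)
    # 1 / (1 + exp(-x)) is algebraically the same as exp(x) / (1 + exp(x)).
    return 1.0 / (1.0 + math.exp(-score / SCALE))


probability = wang_mi_probability(62, ["diaphoresis", "smokes", "left_arm_pain"])
print(f"Estimated probability of MI: {probability:.0%}")
```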

A second study evaluated the clinical findings after an ECG 
had been obtained (Box 35-2). 9 This study included only patients 
with normal or nondiagnostic ECG results after excluding those 
with known coronary heart disease who had prolonged or recur¬ 
rent pain typical of angina, those with suspected pulmonary 
emboli, or those with comorbid illness requiring admission. 
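
The second model can be sketched the same way. The Python example below uses the coefficients printed in Box 35-2 and makes two assumptions that should be checked against the original publication: that the intercept is -116 (the sign is not legible in the printed box) and that the divisor in the probability formula is 11. With those assumptions, a 62-year-old woman who smokes and has left arm pain comes out at about 7%, which agrees with the estimate quoted in the clinical scenario resolution. Names are hypothetical, and the model applies only to patients with normal or nondiagnostic ECG results.

```python
import math
from typing import Iterable

# Coefficients as printed in Box 35-2; intercept sign and divisor are assumptions.
INTERCEPT = -116.0
SCALE = 11.0
WEIGHTS = {
    "male": 23, "right_arm_pain": 21, "ex_smoker": 18, "left_arm_pain": 11,
    "vomiting": 15, "smokes": 15, "burning_pain": 10,
}


def goodacre_mi_probability(age_years: float, findings: Iterable[str]) -> float:
    """Estimate MI probability after a normal or nondiagnostic ECG result."""
    present = set(findings)
    unknown = present - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unrecognized findings: {sorted(unknown)}")
    score = INTERCEPT + 1.0 * age_years + sum(WEIGHTS[name] for name in present)
    # Logistic transform, equivalent to exp(x) / (1 + exp(x)).
    return 1.0 / (1.0 + math.exp(-score / SCALE))


probability = goodacre_mi_probability(62, ["left_arm_pain", "smokes"])
print(f"Estimated probability of MI: {probability:.0%}")
```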

There are similarities in the variables between the multivariate 
models of these 2 studies, 9,11 although the populations were quite 
different. The generalizability of these models, both of which 
address important study populations, should be verified in new 
studies. The first multivariate model makes sense for patient 
education and is compatible with what patients should generally 
understand—chest pain associated with sweating or diaphoresis 
may be a harbinger of an MI, especially in a smoker or when the 
pain goes to the left arm. 

The second multivariate model should be useful to clini¬ 
cians because the model applies only to those who would not 
be readily admitted for additional testing to rule out an MI. 

Several chest pain protocols or decision models have been 
studied to assess their performance in identifying patients 
with cardiac ischemia (Table 35-12). 3 These approaches can 


Table 35-11 Univariate Findings for Acute Myocardial Infarction in 
Patients With Undifferentiated Chest Pain Admitted for Suspected 
Acute Coronary Syndrome a 

Symptom                                     LR+ (95% CI)     LR- (95% CI) 
Radiation to the shoulder OR both arms 8    4.1 (2.5-6.5)    0.68 (0.52-0.89) 
Radiation to right arm 9                    3.8 (2.2-6.6)    0.86 (0.77-0.96) 
Vomiting 9                                  3.5 (2.0-6.2)    0.87 (0.79-0.97) 
Ex-smoker 9                                 2.5 (1.6-4.0)    0.85 (0.76-0.96) 
Male sex 9                                  1.5 (1.3-1.6)    0.24 (0.12-0.48) 
Current smoker 9                            1.4 (1.0-1.8)    0.83 (0.68-1.0) 
Radiation to left arm 9                     1.3 (0.93-1.8)   0.90 (0.76-1.1) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 

a All patients had normal or nondiagnostic electrocardiogram, no established coronary heart 
disease, and prolonged or recurrent chest pain typical of their usual discomfort. 


Table 35-12 Likelihood Ratios of Chest Pain Protocols for Acute 
Cardiac Ischemia 3 

Test (No. of Studies)               Diagnosis a   Sensitivity   Specificity   LR+ b     LR- b 
                                                  (Range)       (Range) 
ACI-TIPI c (4)                      ACI d         0.86-0.95     0.78-0.92     3.9-12    0.05-0.18 
Goldman protocol (3)                AMI           0.88-0.91     0.70-0.74     2.9-3.6   0.12-0.17 
Computer-based decision aids (6)    AMI           0.52-0.98     0.58-0.96     1.2-24    0.02-0.83 

Abbreviations: ACI, acute cardiac ischemia; ACI-TIPI, Acute Cardiac Ischemia Time-Insensitive 
Predictive Instrument; AMI, acute myocardial infarction; LR+, positive likelihood 
ratio; LR-, negative likelihood ratio. 

a All diagnoses were based on World Health Organization criteria before adoption of the 
cardiac troponin assay. 

b Likelihood ratio ranges are estimated from ranges for sensitivity and specificity using 
data provided by authors. 

c Patient age, sex, and chest pain or left arm pain were the primary symptoms, plus a 
computerized analysis of the electrocardiogram (Q-wave presence and assessment of the ST 
and T waves). 

d ACI includes AMI and unstable angina. 


Box 35-1 All Patients With Chest Pain Using Data Obtained 
Without Knowledge of the Electrocardiogram Results 

MI score = -92 + 1.0 x (age) + 17 x (diaphoresis) + 14 x (nausea) + 
11 x (smokes) + 11 x (left arm pain) + 8 x (male) - 44 x (pleuritic pain) 
- 30 x (episodic pain) - 15 x (sharp pain) - 15 x (previous angina) - 12 
x (previous MI) 

(If symptom present, substitute 1; if symptom absent, substitute 0.) 

MI probability = [exp(score/15)]/[1 + exp(score/15)] 


Box 35-2 Patients With Undifferentiated Chest Discomfort After 
a Normal or Nondiagnostic Electrocardiogram Result a 

MI score = -116 + 1.0 x (age) + 23 x (male) + 21 x (right arm pain) + 18 
x (ex-smoker) + 11 x (left arm pain) + 15 x (vomiting) + 15 x (smokes) + 
10 x (burning pain) 

(If symptom present, substitute 1; if symptom absent, substitute 0.) 

MI probability = [exp(score/11)]/[1 + exp(score/11)] 

a Model provided by Dr. Steve Goodacre from data originally reported in Goodacre et al, 
2003. 9 
Figure 35-4 Goldman Chest Pain Decision Rule 

From Reilly et al. 12 

Abbreviations: CCU, cardiac care unit; ECG, electrocardiogram; ED, emergency department; MI, myocardial infarction. 

a Modification to Goldman’s prediction rule: left bundle-branch block not known to be old was also considered evidence of ischemia on ECG. 

b Unstable ischemic heart disease was defined as a worsening of previously stable angina, new onset of postinfarction angina or angina after a coronary 
revascularization procedure, or pain that was the same as that associated with a previous MI. 

c Cardiology consultation in the ED (for possible admission to the CCU) was recommended for patients stratified as high risk, which included patients who had 
experienced a major complication in the ED (eg, cardiogenic shock). Modification to Goldman’s prediction rule: cardiology consultation for possible CCU 
admission was also recommended for 2 subgroups of patients: (a) patients stratified as moderate risk by the original prediction rule because they had acute 
pulmonary edema or ongoing angina despite maximal medical therapy in the ED, and (b) patients presenting with unstable angina within 2 weeks of acute MI or within 
6 months of coronary revascularization. Patients stratified as moderate risk who also had a high probability of significant coronary artery disease (using the Diamond 
and Forrester criteria 13 ) were recommended for cardiology consultation. 


be evaluated to see whether they appropriately identify 
patients with an MI or to see whether they have a favorable 
effect on the accuracy of patient management decisions. The 
protocols have not been extensively evaluated with current 
biomarkers for MI. 
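
The LR ranges reported for these protocols follow from the sensitivity and specificity ranges, using LR+ = sensitivity/(1 - specificity) and LR- = (1 - sensitivity)/specificity. The Python sketch below shows one way to bound the LRs from the extremes of each range. It is an approximation: the function name is hypothetical, and because the published ranges were estimated from data provided by the study authors, the results need not match the table exactly.

```python
def lr_ranges(sens_range: tuple[float, float],
              spec_range: tuple[float, float]) -> tuple[tuple[float, float],
                                                        tuple[float, float]]:
    """Bound LR+ and LR- using the extremes of the sensitivity and specificity ranges."""
    sens_lo, sens_hi = sens_range
    spec_lo, spec_hi = spec_range
    # LR+ is smallest when both sensitivity and specificity are at their lowest.
    lr_pos = (sens_lo / (1.0 - spec_lo), sens_hi / (1.0 - spec_hi))
    # LR- is smallest when both sensitivity and specificity are at their highest.
    lr_neg = ((1.0 - sens_hi) / spec_hi, (1.0 - sens_lo) / spec_lo)
    return lr_pos, lr_neg


# Sensitivity and specificity ranges reported for ACI-TIPI.
aci_tipi_pos, aci_tipi_neg = lr_ranges((0.86, 0.95), (0.78, 0.92))
print(f"ACI-TIPI LR+ {aci_tipi_pos[0]:.1f}-{aci_tipi_pos[1]:.0f}, "
      f"LR- {aci_tipi_neg[0]:.2f}-{aci_tipi_neg[1]:.2f}")
```

For ACI-TIPI, these bounds round to the reported ranges of 3.9 to 12 for LR+ and 0.05 to 0.18 for LR-.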

The Goldman chest pain protocol has been evaluated for 
safety and efficiency for triage decisions in a before-after study 
design. For avoiding major cardiac complications, the protocol 
allowed a safely increased admission rate to less-intensive 


unmonitored beds of patients with possible ACI vs admission 
to monitored or CCU beds. 12 The protocol uses no symptoms 
but instead relies on the ECG, 2 physical examination findings, 
and 3 items from the clinical history (Figure 35-4). 

EVIDENCE FROM GUIDELINES 

The diagnosis and management guidelines for ACI were 
updated in 2004. 14 A separate update on unstable angina/non-ST-segment 
elevation MI was released in 2002 and revised in 
2007. 15 The guidelines emphasize the importance of the ECG, an 
approach to early risk stratification (as opposed to focusing only 
on whether or not the patient has had an MI), and they empha¬ 
size that single clinical findings should not drive decision making 
and risk assessment because the diagnosis is based on a variety of 
findings. In providing general guidance, the authors recommend 
assessing symptoms, history of coronary heart disease, age, sex, 
and the number of traditional coronary heart disease risk fac¬ 
tors. An increasing number of traditional coronary heart disease 
risk factors in a patient affects the prognosis of those who prove 
to have cardiac ischemia, but the number of risk factors itself 
does not correlate well with the likelihood of acute ischemia in 
an individual episode of chest pain. 


CLINICAL SCENARIO—RESOLUTION 


Most physicians will recognize that this patient could be hav¬ 
ing ACI. Many will assume this according to her recent diag¬ 
nosis of diabetes and hypertension. However, these variables 
are not diagnostically important in assessing this episode. 
Instead, you should focus on the current symptoms, her age, 
and smoking status. 

The immediate goal of the bedside clinician is the prompt 
assessment of the likelihood of ACI and risk stratification if 
the chest pain seems of cardiac origin. 

Had your hospital provided ACI-TIPI estimates, the 
patient’s age and chest or left arm pain as the primary symp¬ 
tom would have contributed to her probability of MI. How¬ 
ever, her sex and the normal ECG result would have lowered 
the probability. The Goldman chest pain protocol suggests 
that she is at low risk of a major cardiac complication, but 
you are concerned that the risk of an MI is high. 

The history of gastrointestinal symptoms might suggest 
that she is simply having peptic discomfort. Her response to 
nitroglycerin does not allow you to sort out cardiac from 
noncardiac chest discomfort. 

Her age, diaphoresis with the pain, left arm discomfort, 
and her current smoking status are all important variables 
that increase the likelihood of an ACI event. Although a man 
with the same symptoms would have a higher risk of an MI, 
her sex does nothing to protect her from ischemia, given the 
current symptoms. A probability estimate is not necessary to 
make a decision that prompt management of ischemia and 
an effort to rule out an MI are necessary. 

After you make your decision to rule out an MI, you 
decide to evaluate the effect that the normal ECG result had 
on the importance of the clinical findings. First, you insert 
the values for her clinical findings into the decision model 
developed by Wang et al. 11 The model shows that she has a 
65% probability of an MI, supporting your decision to 
obtain serial cardiac troponin levels and ECG results. Her 
smoking status and association of the pain with diaphoresis 
were important variables because the absence of either of 
those would have decreased the probability of an MI to 23%, 
which is about the baseline risk for all patients presenting to 
the emergency department with possible ACI. However, the 
normal ECG result has a big effect on the clinical findings. 


The model by Goodacre et al 9 was evaluated in just such a 
population of patients with normal or nondiagnostic ECG 
results. With that model, the probability of an MI is about 
7%. The presence of an MI is considerably lower with the 
normal ECG result, but the 7% prediction would still lead 
most physicians to obtain the serial biomarkers and ECGs. 
ACUTE MYOCARDIAL INFARCTION—MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

Approximately 25% of patients with symptoms suggesting ACI 
will prove to have an MI. 

POPULATION FOR WHOM ACUTE MYOCARDIAL 
INFARCTION SHOULD BE CONSIDERED 

Focus primarily on the symptoms associated with the present¬ 
ing complaints, rather than the risk factors. 

• Chest pain 

• Shortness of breath 

• Cardiac arrest 

• Dizziness/weakness/syncope 

• Abdominal pain 

DETECTING THE LIKELIHOOD OF 
ACUTE MYOCARDIAL INFARCTION 

The ECG is by far the most useful finding available at the 
patient’s bedside. For patients with abnormal ECG results 
suggesting acute MI (ST-segment elevation or Q waves, new 
conduction defects, diagnostic T-wave abnormalities), the 
symptoms and signs of MI become less important (Table 35-13). 
For patients with chest discomfort and normal or nondiagnostic 


Table 35-13 Multivariate and Univariate Predictors 
of Myocardial Infarction 

                                          MI 
                                          LR+ (95% CI) or Range    LR- (95% CI) or Range 

Multivariate Predictors 

ACI-TIPI with clinical decision           3.9-12                   0.05-0.18 
(n = 4 studies) 3 

Wang logistic model                       This model provides a probability estimate for MI for 
(n = 1 study) 11                          patients with chest discomfort independent of the 
                                          ECG result and coronary history. 

Goodacre et al 8 logistic                 This model provides a probability estimate for MI for 
model (n = 1 study)                       patients with chest discomfort and a normal or 
                                          nondiagnostic ECG result, no history of coronary heart 
                                          disease with similar pain, and low suspicion of 
                                          pulmonary embolus. The model should not be applied to 
                                          other patient populations. 

Univariate Predictors 

Univariate findings for acute MI in patients with normal or nondiagnostic ECG 
results without known coronary heart disease with prolonged or recurrent 
chest pain typical of their angina 

Pain radiation to the shoulder            4.1 (2.5-6.5)            0.68 (0.52-0.89) 
OR both arms 8 

Pain radiation to the right arm 9         3.8 (2.2-6.6)            0.86 (0.77-0.96) 

Vomiting 9                                3.5 (2.0-6.2)            0.87 (0.79-0.97) 

Ex-smoker 9                               2.5 (1.6-4.0)            0.85 (0.76-0.96) 

Abbreviations: ACI-TIPI, Acute Cardiac Ischemia Time-Insensitive Predictive Instrument; 
CI, confidence interval; ECG, electrocardiogram; LR+, positive likelihood ratio; LR-, 
negative likelihood ratio; MI, myocardial infarction. 


ECG results, some of the symptoms are diagnostically 
useful. Perhaps the most important finding for clinicians 
is the realization that a few of the important risk factors 
for coronary heart disease do not help in the acute setting 
for identifying patients with chest pain who are having an 
acute MI. The presence of diabetes, hypertension, and 
hyperlipidemia does identify patients at higher risk of 
coronary heart disease, but the presenting symptoms are 
more important for determining whether the current epi¬ 
sode represents ACI. 

The availability of the ACI-TIPI probability estimate 
requires integration of the computerized implementa¬ 
tion protocol into an ECG reading. Consequently, phy¬ 
sicians may not have access to the results. In the absence 
of the estimates, the multivariate models and the values 
in the table are the best estimates for identifying 
patients most likely to have ACI. However, it is crucial 
that clinicians understand that these variables have not 
been used to determine whether patients should be dis¬ 
charged from emergency care, observed, or admitted to 
rule out an MI. No single variable, in and of itself, has 
been consistently useful for ruling out an MI. The 
guidelines recommend a multivariate approach 
(http://www.acc.org/qualityandscience/clinical/guidelines/unstable/incorporated/table5.htm; 
accessed June 4, 2008). Once 
an ECG is obtained, the ACI-TIPI probability estimate 
given to the clinician is the approach with the best- 
demonstrated effect on clinical decisions and outcomes. 
Clinicians might use the multivariate model by Goodacre 
et al 9 to quantify their overall estimates for patients with 
nondiagnostic ECG results and for whom the decision to 
rule out cardiac ischemia is less certain. Although these 
predictive models may have good measurement character¬ 
istics that could help clinicians, the requirement for pro¬ 
grammable devices impedes widespread implementation. 

REFERENCE STANDARD TESTS 

Either of the following criteria satisfies the diagnosis for 
an acute, evolving, or recent MI: 

1. Typical increase and gradual decrease (troponin) or 
more rapid increase and decrease (CK-MB) of bio¬ 
chemical markers of myocardial necrosis with at least 1 
of the following: 

a. Ischemic symptoms 

b. Development of pathologic Q waves on the ECG 

c. ECG changes indicative of ischemia (ST-segment el¬ 
evation or depression) 

d. Coronary artery intervention (eg, coronary angio¬ 
plasty) 

2. Pathologic findings of an acute MI. 
TITLE How Useful Are Clinical Features in the Diagnosis 
of Acute, Undifferentiated Chest Pain? 

AUTHORS Goodacre S, Locker T, Morris F, Campbell S. 

CITATION Acad Emerg Med. 2002;9(3):203-208. 

QUESTION Do clinical features in clinically stable 
patients with nondiagnostic electrocardiograms (ECGs) 
identify those with acute myocardial infarction (AMI)? 

DESIGN Prospective, consecutive patients meeting study 
criteria, with data collected independently of outcome. 

SETTING British emergency department. 

PATIENTS During a 16-month period, data were collected 
prospectively on a chest pain observation unit in a large, 
urban teaching hospital. Patients were excluded if they had 
ECG evidence of acute cardiac ischemia (ACI), known coro¬ 
nary heart disease with prolonged or recurrent chest pain 
typical of their angina, comorbid conditions or an alternate 
problem that required admission (eg, heart failure, pulmo¬ 
nary embolus), or minimal risk of coronary heart disease (eg, 
age < 25 years, chest discomfort related to trauma, chest wall 
pain reproduced by palpation in patients with no or few risk 
factors for coronary heart disease; Steve Goodacre, PhD, Uni¬ 
versity of Sheffield, Sheffield, UK, written communication, 
November 2004). Among all emergency department patients 
with chest pain, only 25% were included in the study. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

A nurse with expertise in chest pain was trained to record 
symptoms. 

MAIN OUTCOME MEASURES 

An AMI was defined by World Health Organization (WHO) criteria. 
An acute coronary syndrome (ACS) was defined as a myocardial 
infarction (MI) at presentation or an increased concentration of 
cardiac troponin by 3 days, an early positive exercise treadmill 
test result during the next 6 months, cardiac death, arrhythmia, 
or coronary revascularization within 6 months. 


MAIN RESULTS 

Of the 893 assessed patients, 57 met the study criteria for an 
ACS (9.1%); 34 patients had an AMI (3.8%), 15 had 
increased troponin levels without meeting the older WHO 
case definition for MI, and 78 additional patients had a sub¬ 
sequent early positive treadmill test result or arrhythmia. 
Overall, 88% of those classified as having ACS actually had an 
MI using current standards. 

For AMI, the unadjusted odds ratios (ORs) with highest 
statistical significance were pain radiation to both arms (7.7), 
radiation to the shoulder (6.0), and exertional pain (3.1). For 
ACS, the unadjusted ORs with highest statistical significance 
were pain radiation to both arms (6.0), radiation to the 
shoulder (3.4), and exertional pain (2.5). All variables with 
diagnostic ORs with P < .2 were entered into a multivariate 
model for MI or ACS. 

For diagnosing MI, the following variables were not useful: 
pain radiating to the throat, sharp/stabbing pain, crushing/ 
gripping pain, heavy/pressing pain, pain duration, diaphore¬ 
sis, and relief after taking nitroglycerin. Two variables, burn¬ 
ing/indigestion pain (OR, 4.0) and nausea/vomiting (OR, 


Table 35-14 Likelihood Ratios for Pain Pattern for Acute Myocardial 
Infarction or Acute Coronary Syndrome 

Symptom                                             Diagnosis  LR+ (95% CI)     LR- (95% CI) 
Pain radiation to shoulder or both arms             AMI        4.1 (2.5-6.5)    0.68 (0.52-0.89) 
Exertional pain                                     AMI        2.3 (1.4-3.8)    0.76 (0.59-0.98) 
Chest wall tenderness a                             AMI        0.30 (0.08-1.1)  1.3 (1.1-1.4) 
Exertional pain                                     ACS        2.1 (1.4-23)     0.82 (0.71-0.95) 
Pain radiation to shoulder, left arm, or both arms  ACS        1.6 (1.2-2.0)    0.68 (0.53-0.87) 

Abbreviations: ACS, acute coronary syndrome; AMI, acute myocardial infarction; CI, 
confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio. 
a Note that chest wall tenderness makes AMI less likely, whereas the absence of chest 
wall tenderness makes AMI more likely. 
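
As a worked illustration of how the likelihood ratios in Table 35-14 might be applied, the sketch below converts a pretest probability to a posttest probability by way of odds. The pretest value is the AMI prevalence in this cohort (34 of 893 patients); the function names are illustrative assumptions and are not part of the study. 

```python
def probability_to_odds(probability: float) -> float:
    """Convert a probability strictly between 0 and 1 to odds."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return probability / (1.0 - probability)


def odds_to_probability(odds: float) -> float:
    """Convert non-negative odds back to a probability."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Posttest odds = pretest odds x likelihood ratio, then back to a probability."""
    posttest_odds = probability_to_odds(pretest_probability) * likelihood_ratio
    return odds_to_probability(posttest_odds)


# Example: AMI prevalence in the cohort as the pretest probability (assumption
# made for illustration), combined with the LR+ and LR- for pain radiating to
# the shoulder or both arms from Table 35-14.
pretest = 34 / 893
print(f"pretest probability:        {pretest:.3f}")
print(f"finding present (LR+ 4.1):  {posttest_probability(pretest, 4.1):.3f}")
print(f"finding absent  (LR- 0.68): {posttest_probability(pretest, 0.68):.3f}")
```

The same helper could be used with any row of the table; a likelihood ratio of 1 leaves the pretest probability unchanged. 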
TITLE Clinical Predictors of Acute Coronary Syndromes 
in Patients With Undifferentiated Chest Pain. 

AUTHORS Goodacre SW, Angelini K, Arnold J, Revill S, 
Morris F. 

CITATION Q J Med. 2003;96(12):893-898. 

QUESTION Do any clinical predictors help identify 
patients with undifferentiated chest pain who are having 
an acute coronary syndrome (ACS)? 

DESIGN Prospective, consecutive patients meeting 
study criteria, with data collected independently of out¬ 
come. 

SETTING British emergency department. 

PATIENTS During a 15-month period, data were col¬ 
lected as part of a randomized trial comparing a chest 
pain unit to usual care for patients with chest pain. 
Patients were excluded if they had electrocardiogram 
(ECG) evidence of an ACS, known coronary heart disease 
with prolonged or recurrent chest pain typical of their 
angina, comorbid conditions or an alternate problem that 
required admission (eg, heart failure, pulmonary embo¬ 
lus), or an obvious noncardiac cause of chest discomfort 
(eg, chest wall pain reproduced by palpation in patients 
with no or few risk factors for coronary heart disease). 

Of the 6957 patients potentially eligible, 764 (11%) had 
an abnormal ECG result suggesting acute ischemia, 2402 
(34.5%) had known coronary heart disease with pro¬ 
longed or recurrent angina, 869 (12%) had comorbidities 
or other pathology requiring admission, and 1291 had 
obvious noncardiac chest pain (19%). This left 1631 
(23%) with undifferentiated chest pain; 972 agreed to par¬ 
ticipate in the trial. 


1.8), appeared useful, with their diagnostic OR significantly 
different from 1, but the variables had no independent value 
in a multivariable model for predicting MI. 

For diagnosing an ACS, the only significant variables in a 
multivariable model were pain radiating to the shoulder, left 
arm, or to both arms and exertional pain (Table 35-14). 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS A large number of variables were analyzed 
compared with the WHO standard for diagnosing an MI. 
The data allow some insight into how the performance of 
symptoms might change when troponin is considered as part 
of the case definition. 

LIMITATIONS The data were timely when collected, but the 
case definition for AMI changed during the course of the 
study. The incident rate of MI would have increased 30% had 
troponin been included in the case definition for AMI. Most 
of the patients with ACS did have an AMI according to new 
standards, but 12% had something other than an AMI at pre¬ 
sentation. Verification bias exists in that only patients admit¬ 
ted to the chest pain unit were included. 

The study subjects were patients whose diagnosis was not 
obvious initially, leading to admission to the chest pain unit. 
Thus, the population selected is the most appropriate for 
answering the study’s questions because the individual clini¬ 
cal symptoms were not used to identify patients for enroll¬ 
ment. 

The change in case definition during the conduct of this 
study affords us the ability to make some inferences about 
the effect on the utility of clinical symptoms. The 2 variables 
that were independently important in a multivariate analysis 
appear less important when troponins are included in the 
case definition. Pain radiation to the shoulder or both arms 
has a diagnostic OR that decreases from 6.0 to 2.4 with cur¬ 
rent WHO diagnostic standards for MI; the diagnostic OR 
for exertional pain decreases from 3.1 to 2.5. The results need 
confirmation in a group of patients diagnosed with current 
standards. 

Because of the changing definition, the data in these 
authors’ second cohort provide more valid estimates of the 
likelihood ratios. 1 

Reviewed by David L. Simel, MD, MHS 

REFERENCE FOR THE EVIDENCE 

1. Goodacre SW, Angelini K, Arnold J, Revill S, Morris F. Clinical predic¬ 
tors of acute coronary syndromes in patients with undifferentiated chest 
pain. Q J Med. 2003;96(12):893-898. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

A nurse with expertise in chest pain was trained to record 
symptoms from a list of a priori determined variables. 

MAIN OUTCOME MEASURES 

An ACS was defined by a troponin T level increased at 2-day fol¬ 
low-up, or a variety of prespecified events during the subsequent 
30 days: cardiac death, nonfatal myocardial infarction (MI) 
using the current troponin-based standards, 1 new heart failure, 
life-threatening arrhythmia, or coronary revascularization. 

MAIN RESULTS 

An ACS was found in 77 patients (7.9%). Of those, 70 
patients qualified because of an increased troponin level sug¬ 
gesting ischemia. 
Table 35-15 Likelihood Ratios for Symptoms Found as Useful in a 
Multivariable Model 

Symptom                   LR+ (95% CI)    LR- (95% CI) 
Radiation to right arm    3.8 (2.2-6.6)   0.86 (0.77-0.96) 
Vomiting                  3.5 (2.0-6.2)   0.87 (0.79-0.97) 
Ex-smoker                 2.5 (1.6-4.0)   0.85 (0.76-0.96) 
Indigestion/burning pain  2.3 (1.5-3.5)   0.85 (0.74-0.96) 
Male sex                  1.5 (1.3-1.6)   0.24 (0.12-0.48) 
Current smoker            1.4 (1.0-1.8)   0.83 (0.68-1.0) 
Radiation to left arm     1.3 (0.93-1.8)  0.90 (0.76-1.1) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 

The variables in Table 35-15 represent the unadjusted likelihood 
ratios for those found significant in a multivariate model. 
Variables that were significant by themselves but had no 
independent value in a multivariate model (diagnostic odds ratio 
not significantly different from 1) included pain radiation to the 
neck or jaw, aching/dull/heavy quality to pain, gripping/crushing 
quality to pain, right-sided chest pain, left-sided chest pain, 
chest wall tenderness, and diaphoresis. Diabetes, hypertension, 
hyperlipidemia, and a family history of coronary heart disease 
all had P > .50 and were not tested in the multivariate model. 

A multivariate model was developed using the indepen¬ 
dent predictors: 

MI score = 116 + 1.0 × (age) + 23 × (male) + 21 × (right arm 
pain) + 18 × (ex-smoker) + 11 × (left arm pain) + 15 × (vomit) 
+ 15 × (smokes) + 10 × (burning pain) 

(Male = 1, female = 0. If symptom present, substitute 1; if 
negative or unknown, substitute 0.) 

MI probability = exp(score/11) / [1 + exp(score/11)] 
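
A minimal sketch of how the score and probability above could be computed. The weights are those printed in the score; the constant term is passed in explicitly rather than hard-coded because, with the constant as printed (116), the computed probability is close to 1 for any adult patient, so the sign of that term should be checked against the original publication before any use. The function and key names are illustrative assumptions. 

```python
import math

# Points per finding, as printed in the MI score above (1 if present, else 0).
GOODACRE_WEIGHTS = {
    "male": 23,
    "right_arm_pain": 21,
    "ex_smoker": 18,
    "left_arm_pain": 11,
    "vomiting": 15,
    "current_smoker": 15,
    "burning_pain": 10,
}


def goodacre_score(age_years: float, findings: dict, constant: float) -> float:
    """Sum the constant, 1.0 point per year of age, and the points for present findings.

    Findings that are negative or unknown count as 0, as stated in the text.
    The constant is an explicit argument (assumption: its sign needs checking).
    """
    unknown = set(findings) - set(GOODACRE_WEIGHTS)
    if unknown:
        raise ValueError(f"unrecognized findings: {sorted(unknown)}")
    points = sum(GOODACRE_WEIGHTS[name] for name, present in findings.items() if present)
    return constant + 1.0 * age_years + points


def score_to_probability(score: float, divisor: float = 11.0) -> float:
    """Logistic transform: exp(score/divisor) / [1 + exp(score/divisor)]."""
    return 1.0 / (1.0 + math.exp(-score / divisor))


# Example: a 50-year-old man with no other findings, using the constant as printed.
score = goodacre_score(50, {"male": True}, constant=116)
print(score, score_to_probability(score))
```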

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS These data were prospectively collected from 
chest pain patients who did not initially have an obvious 
diagnosis. Current standards for diagnosis with troponin were 
used, and 91% of the patients with ACSs had ischemia. 

LIMITATIONS It is impossible to know whether the findings 
would work better or worse had the results been reported for 
all patients with chest discomfort. Most excluded patients 
were excluded for findings not related to the clinical symp¬ 
toms (eg, an abnormal ECG result, previous diagnosis of cor¬ 
onary heart disease with prolonged pain). 

The authors observed that their population was younger 
(average age, 50 years) than most populations of patients 
with chest pain. 

These data are important in that they used current stan¬ 
dards of diagnosis with cardiac troponin levels. Clinicians 
must understand the study population before using the 
results—these patients had an uncertain diagnosis because 


those with an abnormal ECG or with prolonged or recurrent 
chest pain typical of their previously diagnosed angina 
were excluded. In addition, patients with obvious noncardiac 
chest pain or those requiring admission independent of their 
ACS were excluded. After these exclusions, the remaining 
patients were those for whom the clinician might be most 
reliant on the clinical symptoms, representing a common 
problem for emergency department physicians. 

The data for indigestion/burning pain are counterintuitive 
in suggesting that the finding increases the likelihood for an 
acute MI. Astute clinicians will recognize that the study popu¬ 
lation did not include patients discharged from the emergency 
department who presented with indigestion/burning as the 
primary symptom, despite less important chest discomfort 
associated with their indigestion. Thus, the sensitivity and 
specificity of indigestion/burning pain might be quite different 
among all patients presenting with chest discomfort. In addi¬ 
tion, indigestion/burning might have been a referral filter 
applied at the patient level in that patients presenting to the 
emergency department with a burning/indigestion type of 
pain likely represented those who could have self-medicated 
without relief or those with exceptionally severe discomfort. 

The relative lack of importance for left arm pain radiation in 
comparison to right arm radiation also seems counterintuitive. 
Of the total patient population, there was no pain radiation in 
38% and radiation to the left arm in 27%; only 6% of patients 
had right arm pain radiation. Most clinicians consider left arm 
pain radiation as a feature that suggests chest pain of cardiac ori¬ 
gin. Patients may recognize left arm pain radiation as suggestive 
of an MI, making those experiencing any left arm pain more 
likely to come to the emergency department even when cardiac 
ischemia is an unlikely diagnosis (eg, musculoskeletal pain or 
cervical pain radiating to the left arm). There are 2 other possi¬ 
ble explanations for the lesser importance of left arm pain in this 
population. First, it is possible that left arm pain occurs even 
more frequently in patients with obvious ACSs associated with 
ECG changes (these patients were excluded from this study). 
Second, in other studies, most patients with right arm pain also 
had bilateral arm pain radiation that would make left arm pain 
alone appear less important. Once the importance of left arm 
pain is used to identify patients with possible ACS, the presence 
of left arm pain may no longer be independently useful in iden¬ 
tifying those with MI vs those without MI. 

Acknowledgment 

Steven Goodacre kindly provided the results of the multivariate 
model and information about the clinical exclusion of patients 
with an obvious noncardiac cause for chest discomfort. 

Reviewed by David L. Simel, MD, MHS 

REFERENCE FOR THE EVIDENCE 

1. Alpert JS, Thygesen K, Antman E, Bassand JP. Myocardial infarction 
redefined—a consensus document of the Joint European Society of 
Cardiology/American College of Cardiology Committee for the redefinition 
of myocardial infarction. J Am Coll Cardiol. 2000;36(3):959-969. 
TITLE Using Patient-Reportable Clinical History Factors 
to Predict Myocardial Infarction. 

AUTHORS Wang SJ, Ohno-Machado L, Fraser HSF, 
Kennedy RL. 

CITATION Comp Biol Med. 2001;31(1):1-13. 

QUESTION Using only clinical factors, without electro¬ 
cardiogram (ECG) data, can a logistic model be created 
that predicts myocardial infarction (MI)? 

DESIGN The variables identified in a previous study 1 
were collected in 2 patient populations. The details of 
whether the study included prospective, consecutive 
patients who had an independently applied reference 
standard are not provided. 

SETTING Two British hospitals. The logistic model was 
developed on patient data from one hospital and then 
tested on patients from a second hospital. 

PATIENTS The patient population consisted of patients 
with an MI prevalence of 22% in the first hospital and 
31% in the hospital where the model was verified. Pre¬ 
sumably, the clinicians suspected that all these patients 
had acute cardiac ischemia, but details about the patient 
population are not specified. 

MAIN OUTCOME MEASURE 

Accuracy (c-index) of the logistic model when evaluated in 
the validation sample from the second hospital. 

MAIN RESULTS 

The model had an accuracy of 84% in the validation set. 

MI score = -92 + 1.0 × (age) + 17 × (diaphoresis) + 14 × (nausea) 
+ 11 × (smokes) + 11 × (left arm pain) + 8 × (male) 
- 44 × (pleuritic pain) - 30 × (episodic pain) - 15 × (sharp pain) 
- 15 × (previous angina) - 12 × (previous MI) 

(If symptom present, substitute 1; if symptom absent, substitute 0.) 

MI probability = exp(score/15) / [1 + exp(score/15)] 
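
The score and probability above can be sketched in a few lines of code. This is an illustration only, assuming the weights and the divisor of 15 exactly as printed; the function and key names are hypothetical, and the model itself is described in the source as requiring validation. 

```python
import math

# Constant, divisor, and weights as printed in the MI score above.
WANG_CONSTANT = -92
WANG_DIVISOR = 15
WANG_WEIGHTS = {
    "diaphoresis": 17,
    "nausea": 14,
    "current_smoker": 11,
    "left_arm_pain": 11,
    "male": 8,
    "pleuritic_pain": -44,
    "episodic_pain": -30,
    "sharp_pain": -15,
    "previous_angina": -15,
    "previous_mi": -12,
}


def wang_mi_score(age_years: float, findings: dict) -> float:
    """Constant + 1.0 point per year of age + weight of each finding that is present."""
    unknown = set(findings) - set(WANG_WEIGHTS)
    if unknown:
        raise ValueError(f"unrecognized findings: {sorted(unknown)}")
    points = sum(WANG_WEIGHTS[name] for name, present in findings.items() if present)
    return WANG_CONSTANT + 1.0 * age_years + points


def wang_mi_probability(age_years: float, findings: dict) -> float:
    """Logistic transform of the score: exp(score/15) / [1 + exp(score/15)]."""
    score = wang_mi_score(age_years, findings)
    return 1.0 / (1.0 + math.exp(-score / WANG_DIVISOR))


# Example: a 60-year-old male smoker with diaphoresis, nausea, and left arm pain.
example = {
    "male": True,
    "current_smoker": True,
    "diaphoresis": True,
    "nausea": True,
    "left_arm_pain": True,
}
print(wang_mi_score(60, example), round(wang_mi_probability(60, example), 3))
```

Keeping the weights in a table rather than in the formula makes it easy to see which findings raise the score and which lower it. 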

An expert cardiologist picked the variables anticipated to 
be important in the logistic model. The variables identified 
by the cardiologist as important, but that were not indepen¬ 
dently valuable in a multivariable model included diabetes, 
hyperlipidemia, severe chest pain quality, retrosternal pain, 
left chest pain location, postural pain, pain that worsened, 
and pain that was worse than previous angina. The variables 
identified in variable selection by the computer that were not 
selected by the cardiologist were a sharp quality to the pain 
and the presence of nausea. 


CONCLUSIONS 

LEVEL OF EVIDENCE Level 4. 

STRENGTHS These data could be important when they are 
applied to the appropriate patient population before the 
ECG is obtained. 

LIMITATIONS There is no description of how the study 
population was obtained and the disease status verified. 
However, the incidence of MI suggests that the study 
included all patients with chest pain in the emergency 
department. 

The data support the commonly held notion that chest 
pain with diaphoresis or nausea, especially when radiating to 
the left arm or in a smoker, increases the probability that the 
patient is having an MI. 

One of the important findings in the study was the com¬ 
parison of variables selected by the cardiologist to those 
remaining in the final model because of their statistical sig¬ 
nificance. The differences between variables selected by the 
cardiologist vs the computer highlight findings that might be 
inappropriately overweighted or underweighted by clini¬ 
cians. 

After age, the findings of diaphoresis, nausea, and left arm 
pain are the variables that increased the probability of MI the 
most. The variables that decrease the likelihood the most are 
pleuritic type pain and episodic pain. These data should be 
applied to patients who have not had ECGs. Thus, the model 
could be used at triage of the patient, but it requires valida¬ 
tion in an appropriate population with current diagnostic 
standards. 

The finding that a previous MI or angina decreases the 
probability of a current MI seems counterintuitive. However, 
if patients with a history of ischemic heart disease are more 
likely to use emergency services for any given episode of pain, 
then the proportion of visits for a new MI might be less than 
in those with no history. 

Reviewed by David L. Simel, MD, MHS 

REFERENCE FOR THE EVIDENCE 

1. Kennedy RL, Garrison RF, Burton AM, et al. An artificial neural network 
system for diagnosis of acute myocardial infarction (AMI) in the acci¬ 
dent and emergency department: evaluation and comparison with 
serum myoglobin measurements. Comput Methods Programs Biomed. 
1997;52 (2):93-103. 
CASE 1 You recommend screening densitometry to a 
healthy 64-year-old woman. She will have to drive 1 hour 
to the nearest testing center, and she does not believe that 
she needs the test. To further assess her risk, you note that 
she weighs 49 kg (108 lb). What can you tell this patient 
about her probability of osteoporosis? 

CASE 2 A frail, 79-year-old woman is admitted to the 
hospital with a diverticular bleeding event. On examina¬ 
tion, you observe that she has significant kyphosis. When 
she stands upright against a wall, she cannot touch the 
back of her head to the wall. You wonder whether she has 
vertebral fractures. 

CASE 3 A 58-year-old woman presents for her annual 
examination. She experienced physiologic menopause 8 
years ago but is asymptomatic and has no other risk fac¬ 
tors for osteoporosis. On examination, you note that her 
rib-pelvis distance is 1 fingerbreadth. She tells you that 
she has developed a humped back. Should this patient be 
referred for densitometry? 


WHY IS THE CLINICAL 
EXAMINATION IMPORTANT? 


Osteoporosis causes 1.5 million fractures per year in the 
United States. 1 As the population continues to age, this num¬ 
ber is expected to double by 2040. 2 Half of all postmeno¬ 
pausal women and 15% of white men older than 50 years will 
have an osteoporosis-related fracture in their lifetime, with 
15% of those occurring in the hip. Pain, loss of indepen¬ 
dence, impaired ambulation, depression, and nursing home 
admission are common sequelae. 3-8 

In 1995, health care spending for osteoporotic fractures in 
the United States was $13.8 billion and is estimated to be $31 
billion to $62 billion by 2020. 9 The US Preventive Services 
Task Force recommends that women 65 years of age or older 
be screened routinely for osteoporosis and women younger 
than 65 years be screened if they have risk factors. 10 There are 
no current guidelines on when to screen healthy perimenopausal 
women, and few, if any, risk factors have been identified for men. 

The physical examination may assist clinicians in prevent¬ 
ing osteoporotic fractures in several ways. First, it may iden¬ 
tify patients with low bone mineral density (BMD), in whom 
routine screening is not currently recommended or has not 
been completed. It may also identify patients at low risk of 
osteoporosis, in whom BMD testing is unnecessary. Although 
it is an imperfect indicator of fracture risk, BMD measure¬ 
ment is widely used both in randomized controlled trials and 
in clinical practice as the primary criterion for initiating 
osteoporosis therapies. 
Second, the physical examination could identify patients with 
occult vertebral fracture. Two-thirds of vertebral fractures are 
clinically silent but are associated with a 2- to 3-fold increased 
risk of further fractures. Several osteoporosis therapies reduce 
the risk of further fractures in women with vertebral fractures, 
and the National Osteoporosis Foundation algorithm suggests 
that patients found to have vertebral fracture should be treated 
regardless of their BMD measurement. 11 Thus, the objective of 
this review was to identify clinical examination findings that 
improve the identification of patients with low BMD or occult 
vertebral fractures who would benefit from therapy or in whom 
further screening with BMD testing is unnecessary. 

Case Definitions and Pathophysiology 

Osteoporosis is a skeletal disorder characterized by compro¬ 
mised bone strength, predisposing a person to an increased 
risk of fracture. For this review, we used the World Health 
Organization’s definition of osteoporosis, based on BMD that 
compares a patient’s density to normative values for a popula¬ 
tion of 20- to 40-year-olds in terms of the number of devia¬ 
tions from the mean value. Osteoporotic bones have a density 
that is more than 2.5 SD below the mean (T score < -2.5). 
Osteopenic bones have a T score that is between -2.5 and -1. 
Normal bones have a BMD T score of -1 or higher. 12 
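
The T score categories above can be expressed as a short sketch. The cutoffs follow the wording of this paragraph (osteoporosis below -2.5, osteopenia from -2.5 up to but not including -1, normal at -1 or higher); how a value of exactly -2.5 is handled is an assumption drawn from that wording. The reference mean and SD are hypothetical parameters that would come from the densitometer's normative tables. 

```python
def t_score(bmd: float, young_adult_mean: float, young_adult_sd: float) -> float:
    """Number of standard deviations between a patient's BMD and the young-adult mean.

    The reference mean and SD are supplied by the caller (hypothetical inputs here);
    they depend on the measurement site, the device, and the reference population.
    """
    if young_adult_sd <= 0:
        raise ValueError("young_adult_sd must be positive")
    return (bmd - young_adult_mean) / young_adult_sd


def classify_t_score(t: float) -> str:
    """Map a T score to the categories described in the text."""
    if t < -2.5:
        return "osteoporosis"
    if t < -1.0:
        return "osteopenia"
    return "normal"


# Example with made-up reference values (assumptions for illustration only).
t = t_score(bmd=0.70, young_adult_mean=1.00, young_adult_sd=0.10)
print(round(t, 2), classify_t_score(t))
```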

Vertebral fractures are compression deformities that 
reduce vertebral body height by 20% or more on imaging 
studies; most of the articles included in this review used a 
semiquantitative technique to diagnose vertebral fractures on 
plain lateral radiographs of the spine. Spinal fractures are 
classified by the maximal percentage of vertebral body height 
loss as follows: grade 1, 20% to 24%; grade 2, 25% to 39%; 
and grade 3, 40% or more. 13 
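
The grading scheme above maps directly to a small function. This is a sketch that assumes height loss is supplied as a percentage; values falling between the printed bands (for example, 24.5%) are assigned to the lower grade, which is an assumption rather than something the source specifies. 

```python
def vertebral_fracture_grade(height_loss_percent: float) -> int:
    """Return 0 (no fracture by this definition) or grade 1, 2, or 3.

    Bands from the text: grade 1, 20% to 24%; grade 2, 25% to 39%;
    grade 3, 40% or more. Less than 20% loss does not meet the definition.
    """
    if not 0.0 <= height_loss_percent <= 100.0:
        raise ValueError("height_loss_percent must be between 0 and 100")
    if height_loss_percent < 20.0:
        return 0
    if height_loss_percent < 25.0:
        return 1
    if height_loss_percent < 40.0:
        return 2
    return 3


# Example: classify a few measured losses of vertebral body height.
for loss in (10, 22, 30, 45):
    print(loss, vertebral_fracture_grade(loss))
```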

The prevalence of osteoporosis in large population-based 
studies allows an estimation of the pretest probability in 
women of various ages. The prevalence of BMD-defined osteoporosis 
at the spine, wrist, or hip in white women in the United 
States by decade is as follows: for ages 50 to 59 years, 15%; 60 to 
69 years, 22%; 70 to 79 years, 38%; and 80 years or older, 70%. 14 


Table 36-1 Prevalence of Vertebral Deformities in Women Aged 50 
Years or Older 17 

          Vertebral Deformity, % a 
Age, y    ≥ Grade 1    ≥ Grade 2 
50-54     10           4.7 
55-59     12           6.6 
60-64     12           8.9 
65-69     17           12 
70-74     30           21 
75-79     33           29 
80-84     56           49 
85-89     49           47 
≥90       75           75 

a Grade 1 or greater is equal to 20% or more vertebral body height loss; grade 2 or 
greater is equal to 25% or more vertebral body height loss. 


For nonwhite women older than 50 years, the prevalence of 
BMD-defined osteoporosis in the Third National Health and 
Nutrition Examination Survey was reported as follows: 
non-Hispanic black women, 12%; Mexican Americans, 19%; and 
women in other ethnic groups, 28%. 15 In special populations, 
the prevalence of osteoporosis can be much higher. For exam¬ 
ple, in residents of skilled nursing facilities who are older than 
75 years, the prevalence of osteoporosis exceeds 50% for all the 
residents, regardless of race and sex. 16 

Occult vertebral fractures are also common and increase 
with age (Table 36-1). Grade 2 vertebral deformities are found 
in 6.6% of women aged 55 to 59 years and in 49% of women 
aged 80 to 84 years. 17 Clinical characteristics or historical 
items that might increase a clinician’s pretest probability of 
osteoporosis or vertebral fracture include older age, low activ¬ 
ity level, family history, hypogonadism (men), and exposure 
to glucocorticoids and alcohol. The pretest probability thresh¬ 
old for testing BMD depends on the anticipated benefit of 
treatment for an individual patient and the patient’s desire for 
treatment. 

The pathophysiology of osteoporosis is related to physical 
examination findings in several ways. The loading or mechani¬ 
cal forces on bone tend to increase bone formation and bone 
mass through osteoblast stimulation. Thus, increasing body 
weight and muscle strength are inversely related to osteoporo¬ 
sis. Type I collagen is a major constituent of both bone and 
skin that is reduced with advancing age and low estrogen 
levels. 18-20 Skinfold thickness may therefore reflect skeletal 
collagen content. Similarly, tooth loss is influenced by 
mandibular alveolar bone quality and may provide an easily observed 
marker of bone health in the rest of the skeleton. 

The sequelae of clinically occult vertebral fractures can also 
lead to physical examination findings that may become appar¬ 
ent before a symptomatic fracture occurs. Height loss resulting 
from vertebral compression fractures can be measured in the 
clinic over time or with the patient’s recalled maximal adult 
height. Vertebral fractures affect height but not arm span, so 
arm span-height differentials may identify individuals with 
occult vertebral fractures. 21 Thoracic kyphosis can result from 
anterior compression fractures in the thoracic spine (“dowa¬ 
ger’s hump”). Kyphosis can be measured on physical examina¬ 
tion with a curved ruler such as an architect’s rule or by 
measuring the wall-occiput distance. The wall-occiput distance 
is the distance between the wall and the 
patient’s occiput when he or she stands straight with heels and 
back against the wall. Lumbar fractures also result in decreased 
rib-pelvis distance that can be measured in fingerbreadths on 
examination. 

How to Elicit the Relevant Signs 

Data for several physical examination signs are included in 
this review. Weight and height are routinely measured in the 
clinical setting. Aside from clinic notes, height change can 
be documented from alternate sources (such as a driver’s 
license) or from the patient’s memory of height at age 25 
years. 22-24 Several studies have shown good to excellent 
correlation between elderly patients’ recalled maximal height and 
previous health records. 25-27 A stadiometer (an upright bar 
marked with a height scale with a sliding notch to designate 
height) is the most accurate method of height measurement. 

Arm span-height differential is determined by subtracting 
a patient’s height in centimeters from the arm span in centi¬ 
meters measured with arms at a 90-degree angle from the 
trunk. The arm span is the distance between the tips of the 
middle fingers while the patient faces forward with the arms 
fully extended and palms facing forward. 

Measurements of thoracic kyphosis can be made indirectly 
on radiographs but can also be directly measured by applying 
an architect’s semiflexible rule, called a flexicurve, to the 
patient’s back. 28 The flexicurve is a device that can be bent in 
1 plane only and retains its shape after application to the cur¬ 
vature of the back between the C7 spinous process and S2 
spinous process. The outline is traced on paper, and the max¬ 
imal angle is measured with calipers or a ruler. 29 The kypho¬ 
sis index is the ratio of thoracic curvature to the length of the 
upper back and is calculated as 100 times the maximum hor¬ 
izontal distance divided by the vertical length of the upper 
back curve. Flexicurve measurements, although painless, 
inexpensive, and safe, are time consuming. 30 
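
The two bedside calculations described in this section, the arm span-height differential and the kyphosis index, reduce to simple arithmetic. The sketch below assumes all lengths are in centimeters; the function and parameter names are illustrative. 

```python
def arm_span_height_differential(arm_span_cm: float, height_cm: float) -> float:
    """Arm span minus height, both in centimeters, as described in the text."""
    return arm_span_cm - height_cm


def kyphosis_index(max_horizontal_distance_cm: float, vertical_length_cm: float) -> float:
    """100 x maximum horizontal distance / vertical length of the upper back curve.

    Both measurements are taken from the flexicurve tracing and must use the same unit.
    """
    if vertical_length_cm <= 0:
        raise ValueError("vertical_length_cm must be positive")
    return 100.0 * max_horizontal_distance_cm / vertical_length_cm


# Example with made-up measurements (assumptions for illustration only).
print(arm_span_height_differential(arm_span_cm=165.0, height_cm=160.0))
print(kyphosis_index(max_horizontal_distance_cm=4.0, vertical_length_cm=32.0))
```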


Another measure that quantitates the degree of kyphosis is 
wall-occiput distance. It is measured while the patient stands 
straight with his or her back against the wall and heels touch¬ 
ing the wall (Figure 36-1). While the head faces forward so 
that an imaginary line connecting the lateral corner of the 
eye to the superior junction of the auricle of the ear is parallel 
to the floor, the distance between the occipital prominence 
and the wall is quantified with a tape measure. 31 For the pur¬ 
pose of this review, the inability to touch the wall with the 
back of the head is a positive finding. 

Rib-pelvis distance is a measure of lumbar fracture. The 
patient stands erect with arms outstretched at 90 degrees. 
The examiner stands behind the patient and inserts his or her 
fingers into the space between the inferior margin of the ribs 
and the superior surface of the pelvis in the midaxillary line. 
The rib-pelvis distance is the closest whole number of 
fingerbreadths between these structures. 32 

Skinfold thickness is measured at the back of the hand with 
calipers. 18-20 The back of the hand is a convenient site for 
measurement in the clinic. 19 The fourth metacarpal longitu¬ 
dinal fold site was used in the studies of skinfold thickness 
included in this review. 



Figure 36-1 Physical Examination Tests for Detection of Occult Vertebral Fractures 

A, Wall-occiput test is used to detect occult thoracic vertebral fractures. A positive test result in this review is defined as being unable to touch the wall with 
the occiput when standing with the back and heels against the wall and the head positioned such that an imaginary line from the lateral corner of the eye 
to the superior junction of the auricle is parallel to the floor. B, Rib-pelvis distance test is used to detect occult lumbar vertebral fractures. A positive test 
result is defined as a distance of less than or equal to 2 fingerbreadths between the inferior margin of the ribs and the superior surface of the pelvis in the 
midaxillary line. 
Hand grip strength is measured using a small hydraulic 
hand grip or isometric dynamometer and is defined as the 
maximal force recorded while the patient squeezes the device 
with arms straight to the side. 33,34 


METHODS 

We searched MEDLINE for articles from 1966 through 
August 2004, with a search strategy similar to that used by 
other authors in this series. 35 We used several National 
Library of Medicine Medical Subject Headings to encompass 
osteopenia, osteoporosis, and spinal fracture disease states: 
“exp osteoporosis,” “exp spinal fracture,” “exp metabolic 
bone disease” (for osteopenia), and “exp bone density.” The 
MEDLINE search was supplemented with a manual review 
of the bibliographies of all identified articles, additional 
review articles including recent osteoporosis guidelines, 4 
clinical skills textbooks, 36-39 and contact with experts in the
field. Two authors (A.D.G. and M.T.D.) independently executed
the MEDLINE search strategy and reviewed titles and
abstracts from the search results. Two authors (A.D.G. and 
C.S.C.-E.) then independently reviewed and extracted data 
from articles or abstracts identified as relevant. We contacted 
authors for original data when articles reported data on the 
precision of signs in diagnosing osteoporosis or spinal fracture
but did not include enough information to calculate
likelihood ratios (LRs). 

We included studies in our review if they included original 
data on the accuracy or precision of the medical history or 
physical examination in diagnosing osteoporosis, osteopenia, 
or spinal fracture. We required that the gold standard comparison
for the clinical examination parameters be bone densitometry
at any site or documented vertebral fracture using
either a semiquantitative technique or vertebral morphometry.
When BMD values were reported directly, the corresponding
T score was obtained with sex-appropriate tables
provided by the manufacturer of the densitometer used in
the study. Articles were excluded if they contained insufficient
data to allow calculation of LRs. We included in our
tables and results only the physical examination parameters 
that are feasible to perform in a clinical setting. 
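
The conversion from a directly reported BMD value to a T score can be sketched as follows. This is a minimal illustration in Python, not the procedure used in the review: the reference mean and standard deviation are hypothetical placeholders standing in for a manufacturer's sex-appropriate young-adult reference table, and the function names are ours. The category boundaries follow the World Health Organization densitometric definitions.

```python
def t_score(bmd, ref_mean, ref_sd):
    """Number of standard deviations the measured BMD lies from the
    young-adult reference mean (negative values mean lower bone density)."""
    if ref_sd <= 0:
        raise ValueError("ref_sd must be positive")
    return (bmd - ref_mean) / ref_sd


def who_category(t):
    """WHO densitometric category for a given T score."""
    if t <= -2.5:
        return "osteoporosis"
    if t < -1.0:
        return "osteopenia"
    return "normal"


# Hypothetical reference values in g/cm^2, for illustration only; real values
# come from the densitometer manufacturer's sex-appropriate tables.
HYPOTHETICAL_REF_MEAN = 1.00
HYPOTHETICAL_REF_SD = 0.10

t = t_score(0.72, HYPOTHETICAL_REF_MEAN, HYPOTHETICAL_REF_SD)
print(round(t, 1), who_category(t))
```

Because reference tables differ by manufacturer, measurement site, and sex, the same BMD value can map to different T scores on different machines.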

Quality Assessment of Included Articles 

Two authors (A.D.G. and C.S.C.-E.) independently assessed 
the methodologic quality of included articles using criteria 
adapted from other authors in this series. 40 Level 1 evidence 
classifies articles that were independent (neither the test result 
nor the gold standard result was used to select patients for the 
study), studied consecutive patients representative of a population
for which the test is likely to be used, were blinded,
measured the gold standard (BMD measurement or documented
fracture) in all patients, and included at least 100
study participants. Level 2 evidence met criteria for level 1
evidence, but fewer than 100 patients were studied. Level 3
evidence was the same as level 2 evidence, but the population
was nonconsecutive or nonrepresentative. Studies of lower
levels of evidence were excluded. Disagreements were resolved
by discussion and consensus.
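
The grading criteria above can be written as a small decision function. This is a sketch of one reading of the text, not the reviewers' actual instrument: the function and parameter names are hypothetical, and we assume that a nonconsecutive or nonrepresentative sample is graded level 3 regardless of sample size.

```python
def evidence_level(independent, consecutive_representative, blinded,
                   gold_standard_in_all, n_participants):
    """Return 1, 2, or 3 following the criteria described in the text,
    or None for a study that falls below level 3 (excluded)."""
    # Independence, blinding, and complete gold standard measurement are
    # required at every included level.
    if not (independent and blinded and gold_standard_in_all):
        return None
    # Assumption: sample size is not considered once the population is
    # nonconsecutive or nonrepresentative.
    if not consecutive_representative:
        return 3
    return 1 if n_participants >= 100 else 2


print(evidence_level(True, True, True, True, 4638))
print(evidence_level(True, True, True, True, 73))
print(evidence_level(True, False, True, True, 61))
```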

Data Analysis 

We used raw data from reported studies that met our inclusion
criteria to calculate values and 95% confidence intervals
for sensitivity, specificity, and positive likelihood ratio
(LR+) and negative likelihood ratio (LR-), using SAS statistical
software, version 8.0 (SAS Institute Inc, Cary, North
Carolina).
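
For readers who want to reproduce this kind of calculation from a 2 x 2 table, a minimal Python sketch follows. It is not the SAS code used in the review. The confidence intervals use the common log method for ratios, which is our assumption because the text does not state which interval method was applied, and the cell counts in the example are hypothetical.

```python
import math


def diagnostic_accuracy(tp, fp, fn, tn, z=1.96):
    """Sensitivity, specificity, LR+ and LR- with log-method 95% CIs.

    tp, fp, fn, tn are the cells of a 2x2 table (test result versus gold
    standard). Every cell must be nonzero for the formulas below.
    """
    diseased = tp + fn
    healthy = fp + tn
    sens = tp / diseased
    spec = tn / healthy
    lr_pos = sens / (fp / healthy)        # sensitivity / (1 - specificity)
    lr_neg = (fn / diseased) / spec       # (1 - sensitivity) / specificity

    # Standard errors of ln(LR+) and ln(LR-) under the log method.
    se_pos = math.sqrt(1 / tp - 1 / diseased + 1 / fp - 1 / healthy)
    se_neg = math.sqrt(1 / fn - 1 / diseased + 1 / tn - 1 / healthy)

    def interval(lr, se):
        return (lr * math.exp(-z * se), lr * math.exp(z * se))

    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": lr_pos,
        "lr_pos_ci": interval(lr_pos, se_pos),
        "lr_neg": lr_neg,
        "lr_neg_ci": interval(lr_neg, se_neg),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
result = diagnostic_accuracy(tp=22, fp=3, fn=78, tn=97)
for name, value in result.items():
    print(name, value)
```

A table with a zero cell needs a continuity correction or an exact method, which this sketch deliberately leaves out.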

RESULTS 

Study Characteristics 

We identified 246 articles with our search strategy and an 
additional 79 from reference lists and expert consultation. 
Fourteen studies met inclusion criteria and were identified 
for final review (Tables 36-2 and 36-3). 

Precision 

Table 36-4 lists reported precision estimates for the physical 
examination maneuvers. Interrater reliability was not reported 
for studies of height and weight included in this review. Differences
in sensitivity and specificity for the same maneuver
across different studies could be related to examiner differences
that were not reported.

Diagnostic Accuracy 

The most clinically relevant cut points and their associated 
LRs for the physical examination maneuvers are listed in 
Table 36-5 for osteoporosis and Table 36-6 for vertebral fracture.
In general, the patient populations were women, with
most patients from osteoporosis clinics or older than 65 
years. Translating these results to younger women might 
yield error that is difficult to quantify. Because many of the 
examination findings may be measuring similar or identical 
physiologic phenomena, we do not recommend using the 
LRs in series. 

For postmenopausal women, prediction rules using osteoporosis
risk factors, such as the Simple Calculated Osteoporosis
Risk Estimation 41 or the Osteoporosis Risk Assessment
Instrument, 42 have some predictive value in selected populations
(Table 36-7). 11,43-46 Variables included in these prediction
rules include age, weight, and race, which overlap with
the clinical examination. An exhaustive review of prediction
rules for the diagnosis of osteoporosis or fracture was not 
attempted in this study because reviews already exist in the 
literature. 42 Although the LR+ of the prediction rules is not 
clinically informative (1.2-1.7), the LR- is far superior to 
the physical examination maneuvers listed here (0.02-0.3), 
making prediction rules much more useful for ruling out 
osteoporosis or fracture. Thus, clinical prediction rules are 
the most useful means of identifying women who are at low 
risk of fracture, in whom BMD screening can safely be 
deferred. 
Table 36-2 Studies Used to Determine the Accuracy of Clinical Examination for Diagnosing Osteoporosis

Source | Setting and Country | Methodologic Quality a | Prevalence of Osteoporosis, % | Inclusion Criteria | No. of Patients | Mean Age, y | Diagnosis Used

Height Loss
Sanila et al 23 | Outpatients, Finland | Level 3 | 34 | Women aged 55-70 y with rheumatoid arthritis, able to walk | 61 | 62 | BMD-diagnosed osteoporosis, lumbar BMD < 0.9 on Lunar machine
Dargent-Molina et al 47 | Volunteers for prospective, multicenter trial, France (EPIDOS) | Level 1 | 50 | White women aged > 75 y, general population, without past fractures | 4638 | 80 | BMD-diagnosed osteoporosis, T score < -3.5 SD

Weight
Michaelsson et al 44 | Outpatients, Sweden | Level 1 | 4 | Random sample of women aged 28-74 y, no exclusions | 175 | 51 | BMD-diagnosed osteoporosis, T score < -2.5
Dargent-Molina et al 47 | Volunteers for prospective, multicenter trial, France (EPIDOS) | Level 1 | 50 | White women aged > 75 y, general population, without past fractures | 4638 | 80 | BMD-diagnosed osteoporosis, T score < -3.5 SD
Bedogni et al 51 | Community, Italy | Level 1 | 8 | Women aged > 18 y without disease | 1873 | Not reported (range, 49-77) | BMD-diagnosed osteoporosis, T score < -2.5

Kyphosis
Ettinger et al 28 | Outpatients, California | Level 1 | 10 | Consecutive sample of women aged 65-91 y | 610 | 73 (range, 72-91) | BMD-diagnosed osteoporosis, T score < -2.5

Self-reported Humped Back
Kantor et al 53 | Outpatients, Ohio | Level 1 | 10 | White women aged > 18 y referred for bone density scan | 2577 | 60 | BMD-diagnosed osteoporosis at the hip, T score < -2.5

Grip Strength
Di Monaco et al 33 | Outpatients, Italy | Level 3 | 34 | Consecutive postmenopausal, white female volunteers | 102 | 63 | BMD-diagnosed osteoporosis, T score < -2.5
Foley et al 34 | Outpatients, Ohio | Level 1 | 18 | Older, independent adults in the community | 73 | 71 | BMD-diagnosed osteoporosis, T score < -2.5
Dargent-Molina et al 47 | Volunteers for prospective, multicenter trial, France (EPIDOS) | Level 1 | 50 | White women aged > 75 y, general population, without past fractures | 4638 | 80 | BMD-diagnosed osteoporosis, T score < -3.5 SD

Hand Skinfold
Orme and Belchetz 20 | Outpatients, California | Level 3 | 63 | Consecutive women in osteoporosis clinic | 225 | 59 | BMD-diagnosed osteoporosis, T score < -2.0

Tooth Count
Earnshaw et al 60 | Outpatients in multicenter alendronate trial, United Kingdom, United States, and Denmark | Level 1 | 33 | White postmenopausal women aged 45-59 y | 1365 | 53 | BMD-diagnosed osteoporosis, T score < -2.5
Inagaki et al 64 | Outpatients, Japan | Level 1 | 11.5 | Community women | 190 | Not reported (range, 31-79) | BMD-diagnosed osteoporosis, quartiles of BMD reported according to aluminum standard (results calculated in current report using lowest quartile)

Abbreviations: BMD, bone mineral density; EPIDOS, the European Patient Information and Document Service.
a See Table 1-7 for a description of evidence levels.
Height Loss

Three studies of postmenopausal women using recalled
heights found an association between height loss and vertebral
fractures, with 2 of the studies including enough data to calculate
LRs (Table 36-5). 23,47,48 In the first study, a height loss of
more than 3 cm was useful in classifying patients with and
without low BMD (LR+, 3.2; LR-, 0.4). 23 However, the study
population was nonconsecutive female patients with rheumatoid
arthritis. In a study of women in the general population,
Dargent-Molina et al 47 did not find a strong association
between height loss of more than 3 cm and osteoporosis (LR+,
1.1; LR-, 0.6). The third study, based on 13732 women in the
Fracture Intervention Trial, reported that a self-reported
height loss greater than 4 cm since age 25 years was associated
with an odds ratio (OR) of 2.8 for vertebral fractures. 48 Thus,
although height loss is a potentially useful examination tool,
the generalizability of this measure is uncertain.

Arm Span-Height Difference 

Versluis et al 21 reported that with age, height declined at twice
the rate of arm span. The mean difference in arm span and
height was 1.4 cm in women aged 55 to 59 years and
increased to 3.2 cm in women aged 80 to 84 years. Finding an
arm span-height difference of 5 cm or greater yielded an
LR+ of 1.6 and an LR- of 0.8 for spinal fracture based on
these data (Table 36-6). Verhaar et al 49 reported that an arm
span-height difference cutoff of 3 cm resulted in a sensitivity
of 58% and a specificity of 56% for BMD-diagnosed osteoporosis,
for an LR+ of 1.3. Wang et al 50 found no association
between arm span and vertebral fractures in either men or
women (LR+ for men, 1.0; LR+ for women, 0.9). We conclude
that the arm span-height difference does not predict
vertebral deformities or BMD-diagnosed osteoporosis.


Table 36-3 Studies Used to Determine the Accuracy of Clinical Examination for Diagnosing Spinal Fracture

Source | Setting and Country | Methodologic Quality | Prevalence of Fracture, % | Inclusion Criteria | No. of Patients | Mean Age, y | Diagnosis Used

Arm Span-Height Difference
Versluis et al 21 | General practice, The Netherlands | Level 1 | 3.4 if aged 55-59 y, 21.9 if aged 80-84 y | White women, aged 55-84 y, healthy, in general practices | 449 | 67.6 | Vertebral fractures by morphometry
Wang et al 50 | Osteoporosis clinic, Australia | Level 1 | 26 in men, not reported in women | White male and female healthy volunteers aged 18-92 y compared with consecutive osteoporosis clinic patients aged 45-90 y | 480 | 63 for women, 66 for men | Vertebral fractures by morphometry

Wall-Occiput Distance
Siminoski et al 31 | Outpatients, Canada | Abstract only | 29 | Women aged > 18 y referred to osteoporosis clinic | 216 | 53 | Thoracic vertebral fractures by morphometry

Rib-Pelvis Distance
Siminoski et al 32 | Outpatients, Canada | Level 1 | 14 | Consecutive women in osteoporosis clinic | 781 | 56.8 | Lumbar vertebral fractures by morphometry


Table 36-4 Precision Data Reported in the Studies Used in the Review

Source | Precision Estimate Used | Precision Estimate (95% CI)

Height Loss
Sanila et al 23 | Coefficient of repeatability | 2.3 cm for height, with range of 0.4 mm to 0.5 cm for tape measure positions

Arm Span-Height Difference
Versluis et al 21 | Intraobserver mean differences | Height, 1 mm (-2.3 to 2.5 mm); arm span, 0.9 mm (-2.5 to 4.3 mm)
Versluis et al 21 | Interobserver mean differences | Height, -1.6 mm (-3.2 to 0.1 mm); arm span, -4.6 mm (-7.7 to -1.5 mm)

Kyphosis
Ettinger et al 28 | Coefficient of variation (2 independent technicians) | 13% for kyphosis index

Wall-Occiput Distance
Siminoski et al 31 | Not reported | Single examiner measured 3 times

Rib-Pelvis Distance
Siminoski et al 32 | Interobserver κ | κ = 0.87 for cutoff of 2 fingerbreadths or less

Grip Strength
Di Monaco et al 33 | Coefficient of variation | 3%

Hand Skinfold
Orme and Belchetz 20 | Mean of 3 measurements | Reproducible to within 0.2 mm

Abbreviation: CI, confidence interval.

Weight 

For women, the relationship between both low weight and
body mass index (BMI) and osteoporosis has been consistently
reported. 44 In cohort studies examining clinical risk
factors in women, weight lower than 70 kg (154 lb) is the single
best predictor of low BMD 11,45,46 and is an important variable
in 4 of the 5 prediction rules reviewed here. Bedogni et
al 51 reported that body weight allowed a better classification
of BMD than did BMI, with women weighing less than 51
kg having a much greater risk for osteoporosis than
women weighing more (LR+, 7.3; Table 36-5).


Table 36-5 Clinical Signs and Symptoms in the Diagnosis of Osteoporosis

Source | Cutoff Values | Sensitivity, % | Specificity, % | LR+ (95% CI) | LR- (95% CI)

Height Loss
Dargent-Molina et al 47 | >3 cm | 92 | 13 | 1.1 (1.0-1.1) | 0.6 (0.4-0.9)
Sanila et al 23 | >3 cm | 68 | 72 | 3.2 (1.7-6.1) | 0.4 (0.2-0.7)

Weight
Dargent-Molina et al 47 | <60 kg | 82 | 56 | 1.9 (1.8-2.0) | 0.3 (0.3-0.4)
Bedogni et al 51 | <51 kg | 22 | 97 | 7.3 (5.0-10.8) | 0.8 (0.7-0.9)
Michaelsson et al 44 | <60 kg | | | 3.6 (2.2-5.6) |
Michaelsson et al 44 | 60-70 kg | | | 0.3 (0-1.9) |
Michaelsson et al 44 | >70 kg | | | 0.2 (0-2.5) |

Kyphosis
Ettinger et al 28 | | 25 | 92 | 3.1 (1.8-5.3) | 0.8 (0.7-1.0)

Self-reported Humped Back
Kantor et al 53 | | 20.6 | 97 | 3.0 (2.2-4.1) | 0.85 (0.8-0.9)

Grip Strength
Foley et al 34 | <40 lb | 31 | 88 | 2.6 (0.9-7.5) | 0.8 (0.5-1.1)
Foley et al 34 | <60 lb | 91 | 27 | 1.3 (1.0-1.6) | 0.3 (0.1-2.2)
Dargent-Molina et al 47 | <59 kPa | 84 | 27 | 1.2 (1.1-1.2) | 0.6 (0.5-0.7)
Dargent-Molina et al 47 | <44 kPa | 41 | 76 | 1.7 (1.5-1.9) | 0.8 (0.7-0.9)
Di Monaco et al 33 | <20 kg | 88 | 41 | 1.5 (1.0-2.1) | 0.3 (0.1-0.6)

Hand Skinfold
Orme and Belchetz 20 | <2.1 mm | 93 | 20 | 1.2 (1.0-1.3) | 0.4 (0.2-0.8)

Tooth Count
Earnshaw et al 60 | <22 teeth | 30 | 70 | 1.0 (0.8-1.2) | 1.0 (0.9-1.1)
Inagaki et al 64 | <20 teeth | 27 | 92 | 3.4 (1.4-8.0) | 0.8 (0.6-1.0)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.


Table 36-6 Clinical Signs and Symptoms in the Diagnosis of Spinal Fracture

Source | Cutoff Values | Sensitivity, % | Specificity, % | LR+ (95% CI) | LR- (95% CI)

Arm Span-Height Difference
Versluis et al 21 | ≥5 cm | 39 | 76 | 1.6 (1.1-.2) | 0.8 (0.6-1.0)
Wang et al 50 | >6.6 cm for men | 62 | 37 | 1.0 (0.7-1.4) | 1.0 (0.6-1.7)
Wang et al 50 | >2.5 cm for women | 48 | 48 | 0.9 (0.7-1.2) | 1.1 (0.8-1.4)

Wall-Occiput Distance
Siminoski et al 31 | >0 cm | 88 | 46 | 3.8 (2.9-5.1) | 0.6 (0.5-0.7)

Rib-Pelvis Distance
Siminoski et al 32 | ≤2 fingerbreadths | 88 | 46 | 3.8 (2.9-5.1) | 0.6 (0.5-0.7)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

The cross-sectional survey by Michaelsson et al 44 demonstrated
that body weight was the best predictor of BMD
among measures of body size in women. In this study,
women weighing less than 60 kg had a greater risk for
osteoporosis than women who weighed more (LR+, 3.6).
Women weighing 60 to 70 kg or more than 70 kg had a lower
risk for osteoporosis (LR+, 0.3, and LR+, 0.2, respectively).
Study limitations included a 20% participation rate and a
low prevalence of osteoporosis.

Dargent-Molina et al 47 found current body weight to be the
strongest predictor of very low bone mass (defined as a T
score < -3.5 SD). When BMD was measured in the 50% of
women who weighed the least (< 59 kg), the LR+ was 1.9 and
the LR- was 0.3.


Table 36-7 Selection Criteria and Decision Rules Reported for Bone Mineral Density Testing Among Postmenopausal Women Considering Treatment 11,41-45 a

Each entry lists the guideline or rule, the selection cut point with its LRs, and the scoring system. b

Simple Calculated Osteoporosis Risk Estimation 41
Selection cut point: Score > 6 (LR+, 1.2; LR-, 0.02)
Scoring system:
Not black = 5 points
Rheumatoid arthritis = 4 points
History of minimal trauma fracture after age 45 y = 4 points for each fracture of the wrist, hip, or rib (maximum, 12 points)
Never used estrogen therapy = 1 point
3 x first digit of age in years = ___ points
-1 x weight in pounds divided by 10 (truncated to integer) = ___ points

Osteoporosis Risk Assessment Instrument 42
Selection cut point: Score > 9 (LR+, 1.4; LR-, 0.1)
Scoring system:
Age: >75 y = 15 points; 65-74 y = 9 points; 55-64 y = 5 points
Weight: <60 kg = 9 points; 60-69.9 kg = 3 points
No current estrogen use = 2 points

National Osteoporosis Foundation 11
Selection cut point: Score > 1 (LR+, 1.2; LR-, 0.2)
Scoring system:
Age > 65 y = 1 point
Weight < 57.6 kg = 1 point
History of minimal trauma fracture after age 40 y = 1 point
Family history of fracture = 1 point
Current cigarette smoking = 1 point

Age, body size, no estrogen 43
Selection cut point: Score > 2 (LR+, 1.6; LR-, 0.3)
Scoring system:
Age > 65 y = 1 point
Weight < 63.5 kg = 1 point
Never used oral contraceptives or estrogen therapy for > 6 mo = 1 point

Dubbo Osteoporosis Epidemiology Study 44
Selection cut point: Score > 10 (LR+, 1.7; LR-, 0.3)
Scoring system:
Age: <70 = 1 point; 70-79 = 2 points; 80-84 = 3 points; >85 = 4 points; >90 = 16 points
Weight: <55 kg = 1 point; 55-64 = 2 points; 65-69 = 3 points; 70-74 = 4 points; 75-79 = 6 points
Previous fracture: Yes = 2 points; No = 1 point

Abbreviations: LR+, positive likelihood ratio; LR-, negative likelihood ratio.
a The LR+ and LR- are for patients with findings at or above the threshold score (LR+) or below the threshold score (LR-). Diagnosis of osteoporosis was based on T scores less than -2.5 for all rules.
b For each new guideline/rule, sum up the total points to get the score.
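
The point assignments that Table 36-7 lists for the Osteoporosis Risk Assessment Instrument and the Simple Calculated Osteoporosis Risk Estimation can be sketched in Python as follows. This is an illustrative reading of the table, not a validated implementation: the function and parameter names are ours, the age and weight band boundaries are assumed to be inclusive at the lower edge (for example, age 75 y falls in the top band), and "first digit of age" is computed on the assumption of a 2-digit age. The functions return only the point total; comparing it with the selection cut point is left to the reader because the printed comparison signs may not be exact.

```python
def orai_score(age_y, weight_kg, current_estrogen_use):
    """Osteoporosis Risk Assessment Instrument points, per Table 36-7."""
    score = 0
    if age_y >= 75:          # assumed inclusive lower edge of the top band
        score += 15
    elif age_y >= 65:
        score += 9
    elif age_y >= 55:
        score += 5
    if weight_kg < 60:
        score += 9
    elif weight_kg < 70:     # table band: 60-69.9 kg
        score += 3
    if not current_estrogen_use:
        score += 2
    return score


def score_points(is_black, rheumatoid_arthritis, n_minimal_trauma_fractures,
                 ever_used_estrogen, age_y, weight_lb):
    """Simple Calculated Osteoporosis Risk Estimation points, per Table 36-7."""
    points = 0
    if not is_black:
        points += 5
    if rheumatoid_arthritis:
        points += 4
    # 4 points per wrist, hip, or rib fracture after age 45 y, capped at 12.
    points += min(4 * n_minimal_trauma_fractures, 12)
    if not ever_used_estrogen:
        points += 1
    points += 3 * (age_y // 10)       # first digit of a 2-digit age
    points -= int(weight_lb / 10)     # truncated to an integer
    return points


# Hypothetical patient: 64 y, 50 kg (110 lb), not black, no rheumatoid
# arthritis, no fractures, never used estrogen.
print(orai_score(64, 50, current_estrogen_use=False))
print(score_points(False, False, 0, False, 64, 110))
```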

Although all of the studies that met our inclusion criteria 
were samples of women, other studies that were excluded 
because they reported only regression analysis data found 
similar associations between BMD at all sites and weight in 
both men and women, with weight having a similar influence 
in each sex. 11,51,52

Thus, body weight lower than 59 kg appears to be a simple
and reasonably sensitive but nonspecific measure for selecting
women for further diagnostic testing. Heavier patients
have a lower likelihood of osteoporosis. However, osteoporosis
cannot be ruled out according to weight greater than 59
kg alone because of the broad range of LRs across the 3 studies
(Table 36-5).

Kyphosis 

Flexicurve measurements in women were converted into 
kyphosis index values by Ettinger et al, 28 with the highest 
decile of kyphosis index used to classify patients as kyphotic. 
Ettinger et al 28 reported that kyphosis was associated with 
reduced BMD and significant height loss. The presence of 
kyphosis was specific though not sensitive for osteoporosis 
(LR+, 3.1; LR-, 0.8). It is not clear whether the clinician’s 
simple observation of kyphosis without sophisticated measurements
would yield the same result (Table 36-5).

Self-reported humped back was reported by Kantor et al 53 
to be highly specific for hip osteoporosis in more than 2000 
women referred for densitometry, with an LR+ of 3.0. The 
absence of self-reported humped back is not useful (LR-, 
0.85; Table 36-5). 

Wall-Occiput Distance 

Siminoski et al 31 reported in abstract form that a kyphosis
angle greater than 43 degrees or wall-occiput distance greater
than 7 cm in women rules in a thoracic fracture with a high
degree of accuracy, and a kyphosis angle less than 20 degrees
or wall-occiput distance of 0 cm reduces the chance of thoracic
fracture but does not reliably rule it out. The 0-cm cutoff
seems most pragmatic, with an LR+ of 4.6 for thoracic
fracture when a patient cannot place the back of her head to
the wall (Figure 36-1 and Table 36-6). In a sample of 60
elderly women, however, Balzini et al 54 did not find a relationship
between wall-occiput distance and vertebral fractures
(data were not presented for calculating LRs).

Rib-Pelvis Distance 

Rib-pelvis distance of less than or equal to 2 fingerbreadths
was calculated to have an LR+ of 3.8 and an LR- of 0.6 for
detecting occult lumbar fractures (Table 36-6). 32 Adjusting
for patient height does not affect the operating characteristics
of this test and is unnecessary. The LRs for vertebral fracture
in a woman with 0 and 4 fingerbreadths of rib-pelvis
distance are 12 and 0.1, respectively. Thus, a low rib-pelvis
distance may increase the posttest probability of lumbar fracture
to a level at which further testing is warranted.


Grip Strength 

Of the common measures of muscle strength, grip strength is 
most feasible to evaluate in the typical primary care clinic. Di 
Monaco et al 33 reported a positive association between grip 
strength and distal radius BMD in postmenopausal women 
in multiple regression analysis adjusted for age, years since 
menopause, years of ovarian activity, body height, body 
weight, BMI, and calcium and alcohol dietary intake, with an 
LR+ of 1.5 (Table 36-5). 

Foley et al 34 examined the relationship between hand grip 
strength and femur BMD, with the goal of canceling out the 
effects of other anthropometric data, and did not find a relationship
between grip strength and proximal femur BMD for
men. In women, it was thought that weight was related both 
to grip strength and femur BMD, with an LR+ of 1.3 for 
osteoporosis when a cutoff of less than 27.2 kg (60 lb) on the 
dynamometer was used. 

Several other studies reported a positive association between 
grip strength and BMD, although reported data were not sufficient
to calculate LRs. 55-59 Overall, grip strength has insufficient
sensitivity and inconsistent results for specificity. 

Hand Skinfold 

Orme and Belchetz 20 studied the skinfold thickness in consecutive
women in an osteoporosis clinic compared with
normal, younger control women and reported ORs for a
range of skinfold thickness of 1.5 to 2.1. These ORs corresponded
to an LR+ of 1.2 and an LR- of 0.4 (Table 36-5).
Although simple to perform, skinfold thickness does not 
appear to be useful in the diagnosis of osteoporosis. 

Tooth Count 

Several studies have not shown a relationship between 
tooth loss and osteoporosis, 60-63 but inclusion of younger 
patients may have limited their ability to detect an association. 64
It is not clear whether population studies reveal
women with poor dental hygiene and tooth loss or tooth 
loss from osteoporosis. 

Inagaki et al 64 reported that among postmenopausal 
women, the proportion of women with fewer than 20 teeth 
increased from 7% in the normal BMD group to 32% in the 
very low BMD group. The age-adjusted odds of having fewer 
than 20 teeth were significantly greater among women in the 
very low BMD group compared with the normal BMD 
group. The LR+ for having very low BMD if fewer than 20 
teeth are counted is 3.4, but choosing a threshold of fewer 
than 22 teeth provides no additional clinical information 
(Table 36-5). 60 

In a retrospective study, Astrom et al 65 found that elderly
women with the fewest remaining teeth had twice the risk of
hip fracture compared with women with the most teeth. For
men, the risk was more than 3-fold. Unfortunately, the cut
point number of teeth dividing the patients was not provided.
May et al 66 found an association between self-reported
tooth loss and BMD of the hip and spine using bone densitometry
in older men that was independent of age, BMI, and
cigarette use. Other population-based studies reviewed demonstrated
variable positive correlations between tooth counts
and BMD. 67-72 Overall, tooth counts are easy to do, and fewer
than 20 teeth can reasonably lead the clinician to screen further
for osteoporosis.


Box 36-1 Physical Examination Maneuvers Suggesting Presence
of Osteoporosis or Spinal Fracture

WALL-OCCIPUT DISTANCE

Inability to touch occiput to the wall when standing with
back and heels to the wall

WEIGHT

Less than 51 kg

RIB-PELVIS DISTANCE

2 or fewer fingerbreadths between the inferior margin
of the ribs and the superior surface of the pelvis in the
midaxillary line

TOOTH COUNT

Fewer than 20 teeth

SELF-REPORTED HUMPED BACK

Patient report that back has become humped


CLINICAL SCENARIOS—RESOLUTIONS 


CASE 1 The reluctant 64-year-old woman has a pretest 
probability of approximately 22% for osteoporosis at any 
site (Table 36-1). Her low weight (< 51 kg) has an LR of
7.3, thus increasing her posttest probability of osteoporosis
to 67%. She decides that this level of risk makes the
drive to the testing center worthwhile. 

CASE 2 The prevalence of grade 2 or grade 3 vertebral 
deformities in women aged 75 to 79 years is approximately
29%. 75 The LR+ of a positive wall-occiput maneuver
is 4.6, resulting in a posttest probability of 65%. This
78-year-old patient is likely to have vertebral fractures. If
spine radiographs confirm the presence of vertebral fractures,
then she should be considered for osteoporosis
therapy. BMD testing to confirm osteoporosis is not 
required but may help guide therapeutic decisions. 

CASE 3 Although the 58-year-old woman does not meet 
current screening guidelines for dual-energy x-ray 
absorptiometry, the low rib-pelvis distance detected on 
her physical examination increases the probability that 
she already has occult vertebral fracture from 3.4% to 
12%. 75 Her self-reported humped back (LR+, 3.0) increases the probability
that she has osteoporosis from 15% to 35%,
prompting early assessment of her bone density. 
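
The posttest probabilities in these cases follow from converting the pretest probability to odds, multiplying by the LR, and converting back to a probability. A minimal Python sketch of that arithmetic is shown below. The function name is ours, and the inputs are the pretest probabilities and LRs quoted in the cases for low weight, the wall-occiput maneuver, and rib-pelvis distance.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability via odds."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest_probability must be between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


cases = [
    ("Case 1, weight < 51 kg", 0.22, 7.3),
    ("Case 2, positive wall-occiput test", 0.29, 4.6),
    ("Case 3, low rib-pelvis distance", 0.034, 3.8),
]
for label, pretest, lr in cases:
    post = posttest_probability(pretest, lr)
    print(f"{label}: {pretest:.1%} -> {post:.0%}")
```

Because several of the findings may measure the same underlying process, the text advises against chaining LRs in series, so the function is applied here to one finding at a time.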

THE BOTTOM LINE 

No single physical examination finding or combination of
findings is sufficient to rule in osteoporosis or spinal fracture
without further testing. The risk factor prediction rules for
osteoporosis quoted in this article have more informative
negative LRs than any of the physical findings and may
reduce the need for testing in low-risk women. Several convenient
examination findings, including low body weight
(< 51 kg), inability to place the back of the head against a
wall when standing upright, low tooth count, self-reported
humped back, and low rib-pelvis distance, can significantly
increase the likelihood of osteoporosis or spinal fracture and
identify additional women who would benefit from earlier
screening (Box 36-1).

Although the major osteoporosis clinical focus has been on 
women, the hip fracture incidence in 80-year-old men is similar 
to that in 75-year-old women. 73 A review of male osteoporosis 
suggests that the risk factors for men are the same (eg, BMD and 
body weight), although the level of risk is different from that for 
women. 74 Because osteoporosis develops at a later age in men, 
meaningful research is needed to determine whether the examination
findings have similar properties in men or whether there
is an age at which men should be screened for BMD similar to 
the recommendations for women. 


Author Affiliations at the Time of the Original Publication 

Ambulatory Care Service (Drs Green and Bastian) and Geriatrics
Research Education and Clinical Center (Drs Colon-Emeric
and Lyles), Durham Veterans Affairs Medical Center,
and Center for the Study of Aging and Human Development
(Drs Colon-Emeric, Bastian, and Lyles) and Department
of Internal Medicine (Drs Green, Colon-Emeric, Bastian,
Drake, and Lyles), Duke University Medical Center, Durham,
North Carolina.

Acknowledgments 

Dr Lyles was supported by grants AG11268 from the National 
Institute on Aging, 2031AH94004 from the Bureau of Health 
Professions, RR-30 from the Division of Research Resources, 
General Clinical Research Centers Program, National Institutes
of Health, and by the Veterans Affairs Medical Research
Service. Dr Colon-Emeric is supported by the Paul A. Beeson
Award, National Institute on Aging grant AG024787, and the
Claude D. Pepper Older Americans Independence Center. Dr
Bastian is supported by the Veterans Affairs Career Development
Award from Health Services Research and Development.

We acknowledge the following internal reviewers: Adi 
Cohen, MD, Darnel DeWalt, MD, and Margaret L. Gourlay, 
MD. We thank Lesa Hall-Young (Medical Media, Durham 
Veterans Affairs Medical Center, Durham, North Carolina) 
for technical assistance with the figures. 


REFERENCES 

1. Riggs BL, Melton LJ. Osteoporosis: Etiology, Diagnosis, and Management. 
Philadelphia, PA: Lippincott-Raven; 1995. 

2. Pachucki-Hyde L. Assessment of risk factors for osteoporosis and frac¬ 
ture. Nurs Clin North Am. 2001;36(3):401-408. 

3. Kramer AM, Steiner JF, Schlenker RE, et al. Outcome and costs after hip 
fracture and stroke. JAMA. 1997;277(5):396-404. 

4. Lyles K, Gold D, Shipp K, Pieper C, Martinez S, Mulhausen PL. Osteo¬ 
porotic vertebral compression fractures: their association with impaired 
functional status. Am JMed. 1993;94(6):595-601. 






CHAPTER 36 Osteoporosis 


5. Magaziner J, Simonsick EM, Kashner TM, Hebei JR, Kenzora JE. Predic¬ 
tors of functional recovery one year following hospital discharge for hip 
fracture: a prospective study. / Gerontol. 1990;45(3):M101-M107. 

6. Randell A, Nguyen TV, Bhalerao N, Silverman SL. Deterioration in qual¬ 
ity of life following hip fracture: a prospective study. Osteoporos Int. 
2000;ll(5):460-466. 

7. Ray N, Chan J, Thalmer M, Melton L. Medical expenditures for the 
treatment of osteoporosis in the United States in 1995: report from the 
National Osteoporosis Foundation. J Bone Miner Res. 1997;12(l):24-35. 

8. Jette A, Harris B, Cleary P, Campion E. Functional recovery after hip 
fracture. Arch Phys Med Rehabil. 1987;68(10):735-740. 

9. Melton LJ III, Eddy DM, Johnston CC Jr. Screening for osteoporosis. 
Ann Intern Med. 1990;112(7):516-528. 

10. Nelson HD, Hefland M, Woolf SH, Allan JD. Screening for postmeno¬ 
pausal osteoporosis: a review of the evidence for the US Preventative Ser¬ 
vices Task Force. Ann Intern Med. 2002; 137(6):529-541. 

11. National Osteoporosis Foundation. Physicians Guide to Prevention and 
Treatment of Osteoporosis. Washington, DC: National Osteoporosis 
Foundation; 2000. 

12. Goddard D, Kleerekoper M. The epidemiology of osteoporosis: practical 
implications for patient care. Postgrad Med. 1998; 104(4):54-72. 

13. Eastell R, Cedel SL, Wahner HW, Riggs BL, Melton LJ III. Classification 
of vertebral fractures. / Bone Miner Res. 1991;6(3):207-215. 

14. Nelson HD, Morris CD, Kraemer DF, et al. Osteoporosis in Postmenopausal 
Women: Diagnosis and Monitoring. Rockville, MD: Agency for Healthcare 
Research and Quality; 2002. AHRQ Publication No. 01-E032. 

15. Snelling AM, Crespo CF, Schaeffer M, Smith S, Walbourn L. Modifiable and 
nonmodifiable factors associated with osteoporosis in postmenopausal 
women: results from the Third National Health and Nutrition Examination 
Survey, 1988-1994./ Womens Health Gend Based Med. 2001;10(l):57-65. 

16. Zimmerman SI, Girman CJ, Buie VC, et al. The prevalence of osteoporo¬ 
sis in nursing home residents. Osteoporos Int. 1999;9(2): 151-157. 

17. Melton LJ III, Lane AW, Cooper C, Eastell R, O’Fallon WM, Riggs BL. 
Prevalence and incidence of vertebral deformities. Osteoporos Int. 
1993;3(3): 113-119. 

18. Chappard D, Alexandre C, Robert JM, Riffat G. Relationships between 
bone and skin atrophies during aging. Acta Anat (Basel). 1991;141(3):239-244.

19. Robinson RJ, al-Azzawi F, Iqbal SJ, Abrams K, Mayberry JF. The relation 
of hand skin-fold thickness to bone mineral density in patients with 
Crohn’s disease. Eur J Gastroenterol Hepatol. 1997;9(10):945-949.

20. Orme SM, Belchetz PE. Is a low skinfold thickness an indicator of
osteoporosis? Clin Endocrinol (Oxf). 1994;41(3):283-287.

21. Versluis RG, Petri H, van de Ven CM, et al. Usefulness of arm span and 
height comparison in detecting vertebral deformities in women. 
Osteoporos Int. 1999;9(2):129-133.

22. Payette H, Kertgoat MJ, Shatenstein B, Boutier V, Nadon S. Validity of 
self-reported height and weight estimates in cognitively intact and 
impaired elderly individuals. J Nutr Health Aging. 2000;4(4):223-228.

23. Sanila M, Kotaniemi A, Viikari J, Isomaki H. Height loss rate as a marker 
of osteoporosis in postmenopausal women with rheumatoid arthritis. 
Clin Rheumatol. 1994;13(2):256-260. 

24. Cummings SR, Nevitt MC, Browner WS, et al. Risk factors for hip fracture
in white women: Study of Osteoporotic Fractures Research Group.
N Engl J Med. 1995;332(12):767-773.

25. Must A, Phillips SM, Naumova EN, et al. Recall of early menstrual history
and menarcheal body size: after 30 years, how well do women
remember? Am J Epidemiol. 2002;155(7):672-679.

26. Must A, Willett WC, Dietz WH. Remote recall of childhood height, weight 
and body build by elderly subjects. Am J Epidemiol. 1993;138(1):56-64.

27. Norgan NG, Cameron N. The accuracy of body weight and height recall in 
middle-aged men. Int J Obes Relat Metab Disord. 2000;24(12):1695-1698.

28. Ettinger B, Black DM, Palermo L, Nevitt MC, Melnikoff S, Cummings SR. 
Kyphosis in older women and its relation to back pain, disability and osteopenia:
the study of osteoporotic fractures. Osteoporos Int. 1994;4(1):55-60.

29. Chow RK, Harrison JE. Relationship of kyphosis to physical fitness and bone 
mass on post-menopausal women. Am J Phys Med. 1987;66(5):219-227.

30. Cortet B, Houvenagel E, Puisieux F, Roches E, Garneir P, Delcambre B. 
Spinal curvatures and quality of life in women with vertebral fractures 
secondary to osteoporosis. Spine. 1999;24(18):1921-1925. 

31. Siminoski K, Lee K, Warshawski R. Accuracy of physical examination for 
detection of thoracic vertebral fractures. J Bone Miner Res. 2003;18(suppl 2):S82.


32. Siminoski K, Warshawski RS, Jen H, Lee K. Accuracy of physical examination
using the rib-pelvis distance for detection of lumbar vertebral
fractures. Am J Med. 2003;115(3):233-236.

33. Di Monaco M, Di Monaco R, Manca M, Cavanna A. Handgrip strength 
is an independent predictor of distal radius bone mineral density in 
postmenopausal women. Clin Rheumatol. 2000;19(6):473-476. 

34. Foley KT, Owings TM, Pavol MJ, Grabiner MD. Maximum grip strength 
is not related to bone mineral density of the proximal femur in older 
adults. Calcif Tissue Int. 1999;64(4):291-294. 

35. Metlay JP, Kapoor WN, Fine MJ. Does this patient have community-acquired
pneumonia? diagnosing pneumonia by history and physical
examination. JAMA. 1997;278(17):1440-1445.

36. Bickley L. Bates' Guide to Physical Examination and History Taking.
Philadelphia, PA: Lippincott Williams & Wilkins; 1999.

37. DeGowin R. DeGowin and DeGowin's Diagnostic Examination. 6th ed.
New York, NY: McGraw-Hill; 1994. 

38. Cabot RC. Cabot and Adams Physical Diagnosis. 13th ed. Baltimore, MD: 
Williams & Wilkins; 1942. 

39. Delp MH, Manning RT. Major's Physical Diagnosis: An Introduction to
the Clinical Process. 9th ed. Philadelphia, PA: WB Saunders; 1981. 

40. Holleman DR Jr, Simel DL. Does the clinical examination predict airflow 
limitation? JAMA. 1995;273(4):313-319. 

41. Lydick E, Cook K, Turpin J, Melton M, Stine R, Byrnes C. Development and 
validation of a simple questionnaire to facilitate identification of women 
likely to have low bone density. Am J Manag Care. 1998;4(1):37-48.

42. Cadarette SM, Jaglal SB, Kreiger N, McIsaac WJ, Darlington GA, Tu JV.
Development and validation of the Osteoporosis Risk Assessment
Instrument to facilitate selection of women for bone densitometry.
CMAJ. 2000;162(9):1289-1294.

43. Weinstein L, Ullery B. Identification of at-risk women for osteoporosis 
screening. Am J Obstet Gynecol. 2000;183(3):547-549. 

44. Michaelsson K, Bergstrom R, Mallmin H, Holmberg L, Wolk A, Ljunghall S.
Screening for osteopenia and osteoporosis: selection by body
composition. Osteoporos Int. 1996;6(2):120-126.

45. Cadarette SM, Jaglal SB, Murray TM, McIsaac WJ, Joseph L, Brown JP.
Evaluation of decision rules for referring women for bone densitometry
by dual-energy x-ray absorptiometry. JAMA. 2001;286(1):57-63.

46. Nguyen TV, Center JR, Pocock NA, Eisman JA. Limited utility of clinical
indices for the prediction of symptomatic fracture risk in postmenopausal
women. Osteoporos Int. 2004;15(1):49-55.

47. Dargent-Molina P, Poitiers F, Breart G; EPIDOS Group. In elderly 
women weight is the best predictor of a very low bone mineral density: 
evidence from the EPIDOS Study. Osteoporos Int. 2000;11(10):881-888.

48. Vogt TM, Ross PD, Palermo L, et al; Fracture Intervention Trial Research
Group. Vertebral fracture prevalence among women screening for the
Fracture Intervention Trial and a simple clinical tool to screen for undiagnosed
vertebral fractures. Mayo Clin Proc. 2000;75(9):888-896.

49. Verhaar HJ, Koele JJ, Neijzen T, Dessens JA, Duursma SA. Are arm span
measurements useful in the prediction of osteoporosis in postmenopausal
women? Osteoporos Int. 1998;8(2):174-176.

50. Wang XF, Duan Y, Henry M, Kin BT, Seeman M. Body segment lengths 
and arm span in healthy men and women and patients with vertebral 
fractures. Osteoporos Int. 2004;15(1):43-48.

51. Bedogni G, Simonini G, Viaggi S, et al. Anthropometry fails in classifying
bone mineral status in postmenopausal women. Ann Hum Biol.
1999;26(6):561-568.

52. May H, Murphy S, Khaw KT. Age-associated bone loss in men and 
women and its relationship to weight. Age Ageing. 1994;23(3):235-240. 

53. Kantor S, Ossa KS, Hoshaw-Woodard SL, Lemeshow S. Height loss and 
osteoporosis of the hip. J Clin Densitom. 2004;7(1):65-70.

54. Balzini L, Vannucchi L, Benvenuti F, et al. Clinical characteristics of flexed 
posture in elderly women. J Am Geriatr Soc. 2003;51(10):1419-1426.

55. Kritz-Silverstein D, Barrett-Connor E. Grip strength and bone mineral 
density in older women. J Bone Miner Res. 1994;9(1):45-51.

56. Taaffe DR, Pruitt L, Lewis B, Marcus R. Dynamic muscle strength as a 
predictor of bone mineral density in elderly women. J Sports Med Phys
Fitness. 1995;35(2):136-142. 

57. Bauer DC, Browner WS, Cauley JA, et al. Factors associated with appendicular
bone mass in older women. Ann Intern Med. 1993;118(9):657-665.

58. Sinaki M, Fitzpatrick LA, Ritchie CK, Montesano A, Wahner HW. Site-specificity
of bone mineral density and muscle strength in women: job
related physical activity. Am J Phys Med Rehabil. 1998;77(6):470-476.




59. Sinaki M, Wahner HW, Offord KP. Relationship between grip strength 
and related regional bone mineral content. Arch Phys Med Rehabil. 
1989;70(12):823-826. 

60. Earnshaw SA, Keating N, Hosking DJ, et al. Tooth counts do not predict
bone mineral density in early postmenopausal Caucasian women. Int J
Epidemiol. 1998;27(3):479-483.

61. Klemetti E, Vainio P. Effect of bone mineral density in skeleton and mandible
on extraction of teeth and clinical alveolar height. J Prosthet Dent.
1993;70(1):21-25.

62. Mercier P, Inoue S. Bone density and serum minerals in cases of residual 
alveolar ridge atrophy. J Prosthet Dent. 1981;46(3):250-255.

63. Elders PJ, Habets LL, Netelenbos JC, Van der Linden LW, Van der Stelt 
PF. The relation between periodontitis and systemic bone mass in 
women between 46 and 55 years of age. J Clin Periodontol. 1992;19(7):492-496.

64. Inagaki K, Kurosu Y, Kamiya T, et al. Low metacarpal bone density, tooth 
loss, and periodontal disease in Japanese women. J Dent Res. 2001;80(9):1818-1822.

65. Astrom J, Backstrom C, Thidevall G. Tooth loss and hip fractures in the 
elderly. J Bone Joint Surg Br. 1990;72(2):324-325. 

66. May H, Reader R, Murphy S, Khaw KT. Self reported tooth loss and bone 
mineral density in older men and women. Age Ageing. 1995;24(3):217-221.


67. Kribbs PJ, Chesnut CH, Ott SM, Kilcoyne RF. Relationships between 
mandibular and skeletal bone in an osteoporotic population. J Prosthet
Dent. 1989;62(6):703-707. 

68. Krall EA, Dawson-Hughes B, Papas A, Garcia RI. Tooth loss and skeletal bone 
density in healthy postmenopausal women. Osteoporos Int. 1994;4(2):104-109.

69. Krall EA, Garcia RI, Dawson-Hughes B. Increased risk of tooth loss is 
related to bone loss at the whole body, hip, and spine. Calcif Tissue Int. 
1996;59(6):433-437. 

70. Phillips HB, Ashley FP. The relationship between periodontal disease 
and a metacarpal bone index. Br Dent J. 1973;134(6):237-239. 

71. Habets LL, Bras J, Borgmeyer-Hoelen AM. Mandibular atrophy and 
metabolic bone loss. Int J Oral Maxillofac Surg. 1988;17(3):208-211. 

72. Daniell HW. Postmenopausal tooth loss. Arch Intern Med. 1983;143(9):1678-1682.

73. De Laet C, van Hout BA, Burger H, Hofman A, Pols HA. Bone density 
and risk of hip fracture in men and women: cross sectional analysis. 
BMJ. 1997;315(7102):221-225. 

74. Kaufman JM, Johnell O, Abadie E, et al. Background for studies on the 
treatment of male osteoporosis: state of the art. Ann Rheum Dis. 
2000;59(10):765-772. 

75. Melton LJ 3rd. Epidemiology of spinal fractures. Spine. 1997;22(24 suppl): 
2S-11S. 


Osteoporosis 36 

Prepared by Cathleen S. Colon-Emeric, MD, 
and David L. Simel, MD, MHS 

Reviewed by Kenneth W. Lyles, MD 


CLINICAL SCENARIO 


You are treating a 68-year-old man with a history of
chronic obstructive pulmonary disease (COPD). He has
not used long-term oral steroids and has not had a previous
fracture. He weighs 72 kg. Should he be referred for
bone densitometry?

UPDATED SUMMARY ON OSTEOPOROSIS 

Original Review 

Green AD, Colon-Emeric CS, Bastian L, Drake MT, Lyles 
KW. Does this woman have osteoporosis? JAMA. 2004; 
292(23):2890-2900. 

UPDATED LITERATURE SEARCH 

We repeated the literature search used in the original article, 
confined to 2004 to April 2006 and restricted to adult 
patients. We limited this by cross-linking to “exp physical 
examination/ or physical exam” and with the text words 
“sensitivity” or “specificity.” The strategy yielded 9 abstracts 
that we reviewed, of which 4 met our inclusion criteria. One 
of the included articles evaluated a prediction model for 
osteoporosis in Asian men. 1 From a review of the references, 
we found a second article that evaluated a similar prediction 
model in US male veterans. 2 We were unable to obtain a 
copy of a third article, although it evaluated a relatively 
small number of patients and would have therefore been of 
lower quality. 

NEW FINDINGS 

• Useful data are now available for osteoporosis in men. The 
variables in a simple prediction model show that the risk 
factors (body size and age) for men are similar to those for 
women. 

• The clinical diagnosis of kyphosis from gross observations
is about 85% accurate, but it does not diagnose osteoporosis
as well as more formal measures of kyphosis do.


• A body mass index (BMI) less than 25 in older women is the 
single best finding for detecting women with osteoporosis, 
performing better than decision rules. However, a BMI 
greater than 25 is not as informative as the decision rules for 
identifying women at the lowest risk of osteoporosis. 

• In women at higher risk for osteoporosis, historical height 
loss identifies those most likely to have vertebral fractures. 
However, historical height loss should not be used as a 
screening test for osteoporosis for most postmenopausal 
women. 

Details of the Update 

A new clinical model was developed prospectively in healthy 
Argentinean postmenopausal women attending a menopause 
clinic. 3 These women were attending the clinic for a variety of 
reasons; therefore, the study sample was without referral bias. 
The only clinical sign evaluated in the study was kyphosis,
measured with 85% accuracy from simple clinical observations.
The study systematically collected multiple risk factors
to develop a 5-variable model for predicting osteoporosis
measured on hip bone mineral density. The independent predictors
(> 10 years of menopause, calcium intake < 1200 mg/d,
kyphosis, BMI < 25, and history of fractures) were validated
prospectively in a new set of patients.

Few data exist for osteoporosis in men. A large, prospective
study of Asian men yielded a simple prediction model with variables
similar to those predicting osteoporosis in women,
with weight and age as the variables in the final model (the
Osteoporosis Self-assessment Tool [OST]). 4 A study in US male
veterans, using the same scoring system, yielded similar results. 2
Cut points vary substantially across populations, in part
because of differences in the average weight of the population.

Women may report height loss, but does it predict vertebral
fractures? A study conducted in women who were referred to
an endocrinologist for an osteoporosis evaluation showed that
a difference of more than 6 cm between the measured height
and tallest recalled height makes a vertebral fracture highly
likely. 5 However, the study population had a 57% prevalence of
vertebral fractures. In population studies, in which the prevalence
among women older than 50 years will be lower (10%-25%),
the absence of such a large degree of height loss has a
likelihood ratio (LR) of 0.76, which lowers the probability of
a fracture but will not definitively rule it out. Using a
threshold of “any” height loss resulted in
a positive likelihood ratio (LR+) of 1.0 and a negative LR of
0.75, results that are not informative.


IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

We qualitatively compared the Simple Calculated Osteoporosis
Risk Estimate (SCORE) 6 and the Osteoporosis Risk
Assessment Instrument (ORAI) 7 because both were recommended
by the Canadian Task Force on Preventive Health Care.
The SCORE questionnaire is more efficient at detecting
women unlikely to have osteoporosis (LR, 0.02 for a
score < 6) but is more complex to calculate than the ORAI.
The ORAI also has good measurement properties and has
only 3 variables (age, weight, and estrogen use). We recalculated
the confidence intervals (CIs) for the ORAI (not
reported in the original article) and found that for an ORAI
greater than or equal to 9, the LR is 1.6 (95% CI, 1.4-1.8);
for women with a more normal ORAI score of less than 9,
the LR is 0.13 (95% CI, 0.04-0.40). However, we observed
that with fewer women receiving hormone replacement
therapy, almost all postmenopausal women would score
higher than the cut point, which limits its utility in clinical
practice.


Table 36-8 Univariate Findings for Osteoporosis in Women

Finding                                       LR+ (95% CI)     LR- (95% CI)
Kyphosis detected with clinical observation   1.5 (1.0-2.2)    0.73 (0.51-1.0)
BMI < 25                                      4.5 (2.5-8.3)    0.48 (0.34-0.69)

Abbreviations: BMI, body mass index; CI, confidence interval; LR+, positive likelihood
ratio; LR-, negative likelihood ratio.


Table 36-9 Univariate Findings for Vertebral Fracture in Women

Finding                  LR+ (95% CI)     LR- (95% CI)
Historical height loss   4.6 (2.5-8.4)    0.76 (0.67-8.4)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.


Table 36-10 Multivariate Findings for Osteoporosis in Women

Risk Factors in the Model     No. of Risk Factors Present   Probability, %
> 10 y Menopause              5                             98
Calcium intake < 1200 mg/d    4                             73-95
Kyphosis                      3                             21-85
BMI < 25                      2                             7-42
Personal fractures            1                             3-33
                              0                             4

Abbreviation: BMI, body mass index.


Table 36-11 Osteoporosis Self-assessment Tool in Men

Test          OST Score(a)   LR+ (95% CI)     LR- (95% CI)
Asian men     <-1            2.2 (1.8-2.8)    0.40 (0.27-0.59)
US veterans   <3             2.7 (2.1-3.5)    0.11 (0.03-0.41)
Summary                      2.4 (2.0-2.9)    0.35 (0.23-0.53)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio; OST, Osteoporosis Self-assessment Tool.
(a) OST Score = 0.2 x (body weight [kg] - age [y]). Multiply the value in parentheses by
the coefficient and round the score down to the nearest integer.
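The ORAI discussed in this section sums points for 3 variables and compares the total with a cut point of 9. The sketch below is illustrative only: the function names are hypothetical, the point values for ages 55-74 years, weight, and estrogen use follow Table 36-14 of this update, and the 15 points for age 75 years or older is an assumption taken from the published ORAI because that row is not legible here.

```python
def orai_score(age_y: float, weight_kg: float, current_estrogen_use: bool) -> int:
    """Sum ORAI points for age, weight, and current estrogen use."""
    # Age component
    if age_y >= 75:
        points = 15  # assumption: from the published ORAI, not legible in this text
    elif age_y >= 65:
        points = 9
    elif age_y >= 55:
        points = 5
    else:
        points = 0

    # Weight component
    if weight_kg < 60:
        points += 9
    elif weight_kg < 70:
        points += 3

    # Estrogen component: 2 points when not currently using estrogen
    if not current_estrogen_use:
        points += 2
    return points


def orai_at_or_above_cut_point(score: int, cut_point: int = 9) -> bool:
    """True when the score reaches the cut point (LR 1.6 in this update)."""
    return score >= cut_point


# A 65-year-old woman weighing 75 kg who does not use estrogen already
# reaches the cut point on age and estrogen status alone.
example = orai_score(65, 75, current_estrogen_use=False)
print(example, orai_at_or_above_cut_point(example))
```

The example illustrates the limitation noted in the text: without current estrogen use, any woman aged 65 years or older scores at least 11 regardless of weight.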


CHANGES IN THE REFERENCE STANDARD 

None. 


RESULTS OF LITERATURE REVIEW 

While univariate findings can be used for identifying osteoporosis
(Table 36-8) or vertebral fracture (Table 36-9) in
women, a multivariate approach is preferred (Table 36-10).

A multivariate score has now been developed for men
(Table 36-11).


EVIDENCE FROM GUIDELINES 

Since 2004, a search that restricts “exp osteoporosis” to
guidelines yields 26 articles that are mostly from specialty
groups. The Canadian Task Force on Preventive Health
Care recommends risk assessment for bone densitometry
screening every 1 to 2 years in women. 8 Those who are 65 years
of age or older, have a previous fragility fracture, weigh
less than 60 kg, have a SCORE questionnaire result
greater than or equal to 6, or have an ORAI score greater
than or equal to 9 should be screened with bone densitometry. 8
These latter 2 (ie, scoring questionnaire results) were
the 2 decision rules in which normal scores make osteoporosis
unlikely.


CLINICAL SCENARIO—RESOLUTION 


In the absence of previous fracture or corticosteroid use, 
there is no current recommendation to guide osteoporosis 
screening for men. The prevalence of osteoporosis in men 
with COPD is approximately 10%. 

Using his age and weight, this patient’s score on the 
OST is 0 (0.2 x [72 kg - 68 y] = 0.8, rounded down to 
nearest integer = 0). The LR+ for a score less than or equal 
to 1 is 3.8. Thus, the posttest probability of osteoporosis in 
this patient is 30%, high enough to warrant further 
screening with dual-energy x-ray absorptiometry. 
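The step from a 10% prevalence and an LR+ of 3.8 to a posttest probability of about 30% goes through odds. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the inputs are the values quoted in the resolution above.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability and a likelihood ratio to a posttest probability."""
    # Probability -> odds
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    # Apply the likelihood ratio on the odds scale
    posttest_odds = pretest_odds * likelihood_ratio
    # Odds -> probability
    return posttest_odds / (1.0 + posttest_odds)


# Pretest probability 10% (men with COPD), LR+ 3.8 for an OST score of 1 or less
p = posttest_probability(0.10, 3.8)
print(f"{p:.0%}")
```

With these inputs the pretest odds are 0.11, the posttest odds are 0.42, and the posttest probability is about 0.30, which agrees with the figure in the text.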


OSTEOPOROSIS—MAKE THE DIAGNOSIS


PRIOR PROBABILITY

The prior probability of osteoporosis in women depends on
age and ethnicity (Tables 36-12 and 36-13).

Table 36-12 Age-Dependent Prevalence of Osteoporosis in White Women

Age, y    Prevalence, %
50-59     15
60-69     [value not legible in source]
70-79     39
≥80       70

Table 36-13 Age-Dependent Prevalence in Nonwhite Women

Women > 50 y                Prevalence, %
Non-Hispanic black women    12
Mexican American women      19
Other ethnicity             28

Comparable data for men have not been adequately
validated.

POPULATION FOR WHOM OSTEOPOROSIS
SHOULD BE CONSIDERED

Age beyond menopause and low BMI (<25) or weight (<60 kg)
are the most important predictors of osteoporosis in
women. Older age and low BMI might also be the most
important factors in men. Any older patient with a minimal
trauma fracture or kyphosis should be screened for
osteoporosis.

DETECTING THE LIKELIHOOD OF OSTEOPOROSIS

The SCORE and ORAI questionnaires have the best measurement
properties for screening (see Tables 36-14 and 36-15),
but the ORAI is a bit easier to use. The OST has not been as
extensively validated in women but is one of the few tests with
evidence in men.

Table 36-14 Osteoporosis Risk Assessment Instrument

Item                      Scoring
Age                       ≥75 y = 15 Points
                          65-74 y = 9 Points
                          55-64 y = 5 Points
Weight                    < 60 kg = 9 Points
                          60-69.9 kg = 3 Points
No current estrogen use   2 Points

LR (95% CI)
Total score ≥ 9 points: LR = 1.6 (1.4-1.8)
Total score < 9 points: LR = 0.13 (0.04-0.40)

Abbreviations: CI, confidence interval; LR, likelihood ratio.

Table 36-15 Osteoporosis Self-assessment Tool

Test           Score = 0.2 x (body weight [kg] - age [y]), rounded down to the nearest integer
Abnormal       Depends on population; < -1 in Asian men, < 3 in US veterans
LR+ (95% CI)   2.4 (2.0-2.9)
LR- (95% CI)   0.35 (0.23-0.53)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

REFERENCE STANDARD TESTS

Bone mineral densitometry with T score values 2.5 SDs or more
below the mean of a young, healthy population.


EVIDENCE TO SUPPORT THE UPDATE:
Osteoporosis



TITLE Performance of the Osteoporosis Self-assessment 
Screening Tool for Osteoporosis in Men. 

AUTHORS Adler RA, Tran MT, Petkov VI. 

CITATION Mayo Clin Proc. 2003;78(6):723-727. 

QUESTION Can the Osteoporosis Self-assessment Tool
(OST) be used to detect older men at high risk for osteoporosis?

DESIGN Cross-sectional. 

SETTING Rheumatology and pulmonary clinics at a single
Veterans Affairs health center.

PATIENTS One hundred eighty-one men (69% white,
30% black) without previous dual-energy x-ray absorptiometry
(DXA) measurement recruited from the clinic population.
Mean age was 64.3 years but with a wide range (32-87 years).
The mean weight of the participants was also
high, at 91 kg (mean body mass index, 29).

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Participants completed questionnaires including self-reported 
age, weight, and other risk factors for osteoporosis. The OST 
was calculated with the following formula: 

OST score = 0.2 x (body weight [kg] - age [y]) 
with the result rounded down to the nearest integer 
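The formula above can be sketched in a few lines. This is an illustration only: the function name is hypothetical, multiplying by 0.2 is written as division by 5 to avoid floating-point drift at exact integers, and applying "rounded down" as a floor for negative values is an assumption about how the rule is meant.

```python
import math


def ost_score(weight_kg: float, age_y: float) -> int:
    """OST score = 0.2 x (body weight [kg] - age [y]), rounded down to an integer."""
    # 0.2 x (weight - age) is the same as (weight - age) / 5
    return math.floor((weight_kg - age_y) / 5)


# The patient in the clinical scenario: 72 kg, 68 years
print(ost_score(72, 68))
```

For the scenario patient the unrounded value is 0.8, so the score is 0. Which scores count as abnormal depends on the population, as the cut point discussion in this summary makes clear.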

MAIN OUTCOME MEASURES 

All participants had bone density measured by DXA as the 
diagnostic standard. A T score of less than or equal to -2.5 at 
any site was used to define osteoporosis. 

MAIN RESULTS 

Overall, 16% of participants had osteoporosis. The OST had
good discriminative properties (area under the curve, 0.836).
The authors reported the sensitivity and specificity at various
cut points (Table 36-16). Adding other clinical risk factors
from the questionnaire to the OST score did not improve its
discriminative ability.


Table 36-16 Likelihood Ratios for Osteoporosis Self-assessment Tool
Depending on the Cut Point Value

OST Score   Sensitivity, %   Specificity, %   LR+ (95% CI)    LR- (95% CI)
<3          95               66               2.7 (2.2-3.5)   0.1 (0.03-0.40)
<2          82               74               3.2 (2.3-4.3)   0.2 (0.1-0.5)
<1          75               80               3.8 (2.6-5.5)   0.3 (0.2-0.6)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio; OST, Osteoporosis Self-assessment Tool.

The authors recommend a cut point of 3 as most appropriate
for their population. This differs from the recommended
cut points from other studies of the OST in Asian men (cut
point < -1), community-dwelling elderly men in Rotterdam
and Baltimore (< 2), and Asian and white women (< 1).
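The likelihood ratios reported for each cut point follow directly from the sensitivity and specificity. A minimal sketch of that relationship, using the standard definitions and a hypothetical function name:

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity and specificity reported for the 3 cut points in this study
for label, sens, spec in [("<3", 0.95, 0.66), ("<2", 0.82, 0.74), ("<1", 0.75, 0.80)]:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"OST {label}: LR+ {lr_pos:.1f}, LR- {lr_neg:.1f}")
```

Values computed this way from the rounded percentages can differ slightly in the last digit from the published figures, which were presumably derived from the underlying counts.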

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS The OST is simple to use and can be calculated 
with either self-reported or routinely collected information. It 
appears to be useful across a wide range of populations of both 
men and women. 

LIMITATIONS The sample was a subset of male veterans likely 
at higher risk for osteoporosis because of rheumatologic and 
pulmonary diseases and therapies. There may be an additional 
selection bias because it is unclear how men were recruited 
into the study. 

The cut point used for the OST seems to depend substantially 
on the population, in part because of large variations in average 
weight. Different cut points may be indicated in different clinical 
situations. If the goal is to exclude men who would otherwise 
have screening ordered, the higher cut point should be used. 
Conversely, if the goal is to identify men for screening who would 
otherwise not be tested, the lower cut point is more appropriate. 

Reviewed by Cathleen S. Colon-Emeric, MD, MHSc,
and David L. Simel, MD, MHS

TITLE Development of a Clinical Assessment Tool in
Identifying Asian Men With Low Bone Mineral Density
and Comparison of Its Usefulness to Quantitative Bone
Ultrasonography.

AUTHORS Kung AWS, Ho AYY, Ross PD, Reginster JY. 

CITATION Osteoporosis Int. 2005;16(7):849-855. 

QUESTION Can a simple prediction rule be developed 
for detecting osteoporosis in older men? 

DESIGN Prospective volunteers. 

SETTING Patients were recruited from the community 
at “public road shows, health fairs, or health talks on 
osteoporosis” in Hong Kong. The recruitment period 
lasted more than 5 years. 

PATIENTS Southern Chinese men older than 50 years 
who volunteered. Young men (20-39 years) were also 
invited to participate as controls. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Risk factors were recorded during a structured interview. The
only physical examination assessments were height (measured
with a stadiometer) and weight (measured with the
patient shoeless, in light indoor clothing, on a balance beam
scale). The reference standard was bone mineral density
(BMD) of the lumbar spine and left femur.

MAIN OUTCOME MEASURES 

The BMD T scores were obtained by comparison with the
healthy young men recruited to the study (n = 124). Osteoporosis
was defined by a femoral neck BMD T score less than
or equal to -2.5.

MAIN RESULTS 

Out of 906 men who were invited to participate, 74 men
declined and 56 were excluded for other known illnesses
associated with bone disease. The data for the remaining
men (mean age, 65 years) were divided into a model development
sample of 420 and a validation sample of 356. The
prevalence of osteoporosis was 16% (n = 126).

Osteoporosis score = 0.2 x (body weight [kg] - age [y])

(Multiply the value in parentheses by the coefficient and
round the score down to the nearest integer.)

Osteoporosis score < -1, increased risk
Osteoporosis score > -1, low risk

The results shown in Table 36-17 were also confirmed in a
separate validation set where the sensitivity (0.71) and specificity
(0.68) were essentially the same.


Table 36-17 Likelihood Ratios for Osteoporosis Self-assessment Tool

OST Score   Sensitivity   Specificity   LR+ (95% CI)    LR- (95% CI)
<-1         0.73          0.68          2.2 (1.8-2.8)   0.40 (0.27-0.59)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio; OST, Osteoporosis Self-assessment Tool.

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS This was a prospective study of community 
men who volunteered for screening. The data were studied 
in a validation set. 

LIMITATIONS The men were invited into the study through 
community health activities. Though there may have been 
some volunteer bias, the number of patients is large. 

This is one of the few studies of osteoporosis in men.
Compared with models for women, the variables are almost
identical, except that age replaces the measure for menopause.
Although the variables in the prediction model were
similar to those for women, the model for men appears to
be slightly better at identifying cases of osteoporosis and
slightly less efficient at confirming the absence of osteoporosis.
The model is extremely easy to use.

Reviewed by David L. Simel, MD, MHS

CHAPTER


Does This Child Have
Acute Otitis Media?

Russell Rothman, MD, MPP 
Thomas Owens, MD 
David L. Simel, MD, MHS 


CLINICAL SCENARIO 


A mother notices that her 15-month-old child has a low-grade
fever and is tugging at his ears after several days of
cough and runny nose. The child attends day care services
and had 1 previous episode of acute otitis media (AOM)
about 4 months ago. In the physician’s office, he is afebrile
but somewhat irritable and has clear rhinorrhea, mild
posterior pharyngeal erythema, and normal chest auscultatory
findings. Cerumen occludes the view of his right
tympanic membrane, whereas the left tympanic membrane
shows normal landmarks and good mobility on
pneumatic otoscopy. After removal of the cerumen from
his right ear, landmarks are visible on a slightly erythematous
tympanic membrane. The tympanic membrane shows
normal mobility on pneumatic otoscopy.


WHY IS THIS AN IMPORTANT QUESTION TO 
ANSWER WITH THE CLINICAL EXAMINATION? 


Acute otitis media can be a difficult and controversial diagnosis
to make, but studies suggest that AOM is responsible
for more than 30 million clinic visits a year in the United
States, at a total cost exceeding $5 billion. This makes AOM
one of the most commonly diagnosed and expensive childhood
illnesses. 1-4 Studies have shown that by age 1 year, up to
60% of all children have been diagnosed as having at least 1
episode of AOM, and by age 3 years, more than 80% of children
have had at least 1 episode. 1,5 The best estimates of the
prevalence of AOM are based on the National Ambulatory
Medical Care Survey. In 1990, the percentage of office visits
with otitis media as the principal diagnosis was 17.4% for
children aged 0 to 2 years, 18.1% for children aged 2 to 5
years, 10.5% for children aged 6 to 10 years, and 5.2% for
children aged 11 to 15 years. 6 The most common potential
risk factors for diagnosis of AOM include age younger than 2
years, male sex, day care attendance, fall or winter season,
exposure to cigarette smoke, genetic factors, and history of
AOM. 1,7 Breastfeeding appears to be protective. 7

Making a correct diagnosis of AOM is often difficult, particularly
in young children. Distinguishing between AOM
and otitis media with effusion (OME) can be particularly
challenging. Several studies suggest that physicians are
uncertain of their diagnosis of AOM as much as 40% of the
time. 8 This uncertainty probably contributes to overdiagnosis.
In one study, when physicians estimated the odds
that a patient had AOM to be 50% or less, 3 of 4 still prescribed
antibiotics, and 1 of 4 still prescribed antibiotics when
the odds of AOM were less than 25%. 9 Various definitions and diagnostic
criteria for AOM may also contribute to overdiagnosis.
In a study by Hayden, 10 18 criteria sets for diagnosing AOM
were used in 26 articles, and 165 surveyed clinicians identified
147 unique criteria. Recently, an expert panel convened
by the Agency for Healthcare Research and Quality (AHRQ)
released a complicated definition requiring the presence of a
middle ear effusion and rapid onset of associated symptoms
(Box 37-1). 6,11

Overdiagnosis of AOM is thought to be common 7,12,13 and
contributes to increased antibiotic use and bacterial resistance.
Overdiagnosis may also result in unnecessary specialty
referrals and increased use of tympanostomy tubes. In addition,
improper diagnosis of AOM in younger children may
hinder the proper diagnosis of other underlying causes of
fever or illness.


Box 37-1 Agency for Healthcare Research and Quality Definition of 
Acute Otitis Media 6 

Presence of middle ear effusion, demonstrated by actual
presence of fluid in the middle ear, as diagnosed by tympanocentesis,
or physical presence of fluid in the external
ear canal as a result of tympanic membrane perforation or
indicated by limited or absent mobility of the tympanic
membrane, as diagnosed by pneumatic otoscopy, tympanogram,
or acoustic reflectometry with or without the
following:

Opacification, not including erythema 
Full or bulging tympanic membrane 
Hearing loss 
AND 

Rapid onset (during a course of 48 hours) of 1 or more of 
the following signs or symptoms with or without anorexia, 
nausea, or vomiting: 

Otalgia (or pulling of ear in an infant) 

Otorrhea 

Irritability in infant or toddler 
Fever 


Figure 37-1 Tympanic Membrane Landmarks (labels: posterior, anterior, lateral process, pars flaccida)


Anatomic/Physiologic Origins 

Genetic, infectious, immunologic, and environmental factors 
contribute to an underlying predisposition to ear infections. 2 
The eustachian tube, shorter and angled much less steeply in 
children than in adults, plays a critical role by more easily 
allowing the reflux of organisms from the nasopharynx into 
the middle ear. 2 When the tube becomes congested, as it may 
with a viral infection in the upper respiratory tract, negative 
pressure within the middle ear causes secretions to accumu¬ 
late, and this leads to the proliferation of pathogenic organ¬ 
isms. The bacterial agents most commonly identified in 
AOM include Streptococcus pneumoniae, Haemophilus influenzae, 
and Moraxella catarrhalis. 5 Coinfection with viruses is 
also observed in 30% to 40% of cases and may play a role in 
the virulence of symptoms, but less than 10% of AOM is 
caused by viruses alone. 5,14,15 Most ear infections resolve without 
any specific treatment, so the exact role of bacterial or 
viral pathogens remains unclear. 

The buildup of infectious debris behind the tympanic 
membrane, along with inflammatory mediators, produces 
the symptoms and signs of AOM. An effusion changes the 
tympanic membrane’s appearance from transparent to 
opaque and can distort or bulge the membrane, making it 
difficult to visualize normal landmarks (Figure 37-1). Ery¬ 
thema of the tympanic membrane is related to vascular con¬ 
gestion of the membrane and is thought to represent a 
nonspecific sign related to irritation of the drum or crying. 2,12 

How to Elicit Symptoms and Signs 

Common but usually nonspecific symptoms associated with 
the diagnosis of AOM include fever, ear pain, ear pulling, 
irritability, cough, and rhinitis. In a study of 354 children 
younger than 15 years (mean, 3.8 years) presenting for an 
acute illness, 90% of children in whom AOM was diagnosed 
had fever, ear pain, crying, and irritability alone or in combi¬ 
nation, but 72% of children without AOM also presented 
with these symptoms. 12,16 

To properly examine the ear for AOM, clinicians should 
use a pneumatic otoscope to visualize the landmarks and 
mobility of the tympanic membrane. After the patient is 
placed in a restrained or other safe position, the otoscope 
speculum is placed into the external auditory canal. The 
largest-sized speculum that can comfortably fit into the canal 
is recommended because a small speculum can limit the 
visual field and potentially cause pain by irritating the bony 
canal. 2,17 A study by Cavanaugh 18 suggested that children 
older than 18 months should have a soft-tipped speculum to 
provide an adequate seal and prevent air leakage when per¬ 
forming pneumatic otoscopy. It is also important that the 
otoscope have a bright light source for visualizing the tym¬ 
panic membrane. Barriga et al 19 tested otoscopes in clinics 
and emergency departments and found that 22% were inade¬ 
quate because of either a worn bulb or a weak battery source. 

To properly examine the tympanic membrane, one should 
evaluate the position, color, landmarks, degree of translucency, 
and mobility. The position refers to whether the drum 
appears to be bulging toward the examiner (suggestive of 
AOM), neutral (normal), or retracted away from the examiner 
(observed in chronic OME). The tympanic membrane 
can appear red, pink, yellow (with pus behind the drum), or 
pearly gray or translucent (normal). Landmarks that should 
be visible in a normal ear include the pars flaccida, the 
malleus, and the light reflex below the umbo (Figure 37-1). 
With a translucent tympanic membrane, the outline of the 
incus can sometimes be visualized as well. An opaque drum 
may be a sign of infection or middle ear effusion and can 
result in a diminished light reflex. 

A bulb attachment can test the mobility of the drum with 
the slightest pressure or release. A study by Cavanaugh 20 suggests 
that only 10 to 15 mm H2O of positive pressure is 
needed to assess drum mobility, whereas bulb attachments 
can easily create pressures of 1000 mm H2O or more. Forceful 
pressing of the bulb creates excessive positive pressure 
that causes pain; in this instance, pain on insufflation does 
not diagnose infection. The correctly applied positive or neg¬ 
ative pressure creates synchronous movement of the normal 
drum. An immobile drum or one with reduced mobility sug¬ 
gests the presence of a middle ear effusion. 

The tympanic membrane can sometimes be difficult to 
visualize because of patient behavior or the buildup of 
cerumen in the ear canal. Apprehensive infants and young 
children can often be sufficiently restrained by having the 
parent seat the child in his or her lap, using his or her legs 
to wrap around the child’s legs and arms to restrain the 
child’s arms and head. The examiner should hold the oto¬ 
scope with part of the hand touching the child’s head so 
that the otoscope will move with the child’s head and pre¬ 
vent injury. 

In a study of 279 children with AOM, 29% required ceru¬ 
men removal to make a proper diagnosis. 21 Studies have not 
adequately compared various modes for physically removing 
cerumen, though the most common methods cited by gener¬ 
alists are the use of a wire loop, a blunt cerumen curette, or 
gentle irrigation with room-temperature water. One small 
randomized trial compared 2 ceruminolytic agents, liquid 
docusate sodium and triethanolamine polypeptide, applied 
at an emergency department visit with or without irrigation 
15 minutes later. Liquid docusate sodium was highly effective 
compared with triethanolamine polypeptide, with successful 
cerumen removal in 82% of patients (number needed to 
treat for benefit, 3; 95% confidence interval [CI], 2-4). 22 

Other techniques used in the diagnosis of AOM include 
tympanocentesis, tympanometry, and acoustic reflectometry. 
Tympanocentesis is performed through an otoscope 
with a special attachment or an otomicroscope. A tuberculin 
syringe needle is placed into the inferior portion of the tym¬ 
panic membrane to aspirate fluid. 2 This technique can be 
diagnostic and is considered the criterion standard for 
detecting the presence of fluid in the diagnosis of AOM. 
However, tympanocentesis is rarely practiced in the primary 
care setting, where most AOM is managed. 12 Tympanometry 
and acoustic reflectometry both require the use of additional 
medical equipment. For tympanometry, a specialized probe 
is inserted into the canal to form a seal and measure the 
amount of reflected sound energy. The amount of reflected 
energy is used to estimate tympanic membrane motility. In 
acoustic reflectometry, tympanic membrane motility is also 
estimated according to sound reflecting from the middle ear, 
but no seal is required. Both techniques assess tympanic 
membrane motility and generally have been studied only for 
detecting an effusion in patients with OME rather than 
AOM. 1,7,12,23 

METHODS 

Search Strategy and Quality Review 

We searched MEDLINE from January 1966 to May 2002 for 
English-language articles that examined the role of symp¬ 
toms and signs in the diagnosis of AOM. Multiple MEDLINE 
search strategies were applied by a single author (T.O.) using 
techniques that have been used by other authors in this 
series. 24,25 We also examined bibliographies of selected articles 
and used general and specialty textbooks. 1,2,7,26-29 From 397 
identified references, 50 complete articles were retrieved for 
review by 2 authors (R.R. and T.O.). Among these, we found 
17 articles that specifically examined symptoms and signs 
that were directly relevant to the diagnosis of AOM. 4,10,16,23,30-42 
Articles on the diagnosis of persistent OME were generally 
excluded because most of these studies were performed by 
comparing detection of an effusion by pneumatic otoscopy 
or tympanometry with the presence of an effusion at the 
time of surgery for myringotomy, rather than in ambulatory 
settings. In addition, persistent OME is a disease with differ¬ 
ent pathophysiology and, possibly, different diagnostic char¬ 
acteristics than AOM. 

The 17 identified articles underwent independent quality 
review by 2 authors (R.R. and T.O.). Quality was assessed 
with an established methodologic filter for assessing internal 
validity that has been used and explained by other authors in 
this series. 24,25 Each article was assigned a level of evidence 
(1-4) and consensus was reached by both reviewers. Tympa¬ 
nocentesis was considered the pathologic criterion standard, 
but only 1 study that assessed physical examination findings 
used this standard. 23 We therefore also included articles that 
used a standardized clinical definition of AOM when exam¬ 
ining articles that dealt with symptoms. Although using a 
clinical criterion standard was not ideal and might lead to 
accusations of circular reasoning, the quality of the literature 
for this common problem left us little choice. However, we 
believed it was justified to examine these articles because 
most physicians make a diagnosis according to clinical crite¬ 
ria, and physicians make decisions to treat according to these 
criteria. 

No article met evidence level 1 or 2, which required using 
an independent blind comparison of signs or symptoms 
against a criterion standard among consecutive patients. All 
articles reviewed were graded as evidence levels 3 to 5, but we 
retained only the level 3 and 4 articles. Level 3 studies used an 
independent, blind comparison of symptoms to the criterion 
standard and nonconsecutive patients suspected to have the 
targeted condition. Level 4 studies had a nonindependent 
comparison of symptoms to the criterion standard and 
“grabbed” a sample of patients with the target condition and, 
perhaps, some healthy individuals. The excluded level 5 studies 
used a nonindependent comparison of symptoms to a 
standard of uncertain validity. When possible, we used published 
raw data from the identified articles to calculate 
sensitivity, specificity, likelihood ratios (LRs), and 95% 
CIs, with conventional definitions. 43 
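
The conventional definitions mentioned above can be illustrated with a short Python sketch. It is not the authors' calculation: the function name and the 2x2 counts are hypothetical, chosen only so that sensitivity is 54% and specificity is 82% (the rounded ear pain values reported for Niemela et al in Table 37-2), and the confidence interval uses the common log-scale approximation, which may differ from the method in the cited reference.

```python
import math


def diagnostic_accuracy(tp, fp, fn, tn, z=1.96):
    """Sensitivity, specificity, and likelihood ratios from a 2x2 table.

    tp/fn: patients WITH the disease who have / lack the finding.
    fp/tn: patients WITHOUT the disease who have / lack the finding.
    Assumes no cell is zero (the log-scale interval is undefined otherwise).
    """
    n_dis, n_non = tp + fn, fp + tn
    sens = tp / n_dis
    spec = tn / n_non
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec

    def log_ci(lr, x_dis, x_non):
        # Standard error of ln(LR) for a ratio of two proportions.
        se = math.sqrt(1 / x_dis - 1 / n_dis + 1 / x_non - 1 / n_non)
        return lr * math.exp(-z * se), lr * math.exp(z * se)

    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": lr_pos,
        "lr_pos_ci": log_ci(lr_pos, tp, fp),
        "lr_neg": lr_neg,
        "lr_neg_ci": log_ci(lr_neg, fn, tn),
    }


# Hypothetical counts: 100 children with AOM, 100 without.
result = diagnostic_accuracy(tp=54, fp=18, fn=46, tn=82)
print(round(result["lr_pos"], 1), round(result["lr_neg"], 1))
```

With these illustrative counts the point estimates round to an LR+ of 3.0 and an LR- of 0.6; the interval width depends entirely on the assumed sample size and should not be compared with the published intervals.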

For articles in which data were presented stratified by 
multiple age groups, we present data for all age groups 
combined unless otherwise noted. Pooled analyses of mul¬ 
tiple studies were not performed because of the small num¬ 
ber and heterogeneity of studies available. In one study, 
published data were presented of the utility of physical 
examination findings compared with tympanocentesis for 2 
individual clinicians who were examining 2 separate groups 
of children. 23 In that study, 64% of children presenting with 
acute symptoms (such as ear pain, fever, respiratory symp¬ 
toms, vomiting, or diarrhea) underwent tympanocentesis, 
whereas 38% of patients without acute symptoms under¬ 
went tympanocentesis. Tympanocentesis was performed in 
any child suspected to have a middle ear effusion on pneu¬ 
matic otoscopy. In our analysis of these data, 23 we calculated 
LRs excluding patients with perforation because these 
patients did not undergo tympanocentesis. To correct for 
verification bias, we made the conservative assumption that 
children who did not undergo tympanocentesis had nor¬ 
mal-appearing ears (normal color, position, or mobility). 44 
LRs were adjusted by the calculated verification fraction for 
each clinical sign subset (color, position, and mobility). The 
correction for verification bias protects against overly opti¬ 
mistic estimates of the examiner’s ability to rule out AOM 
and overly pessimistic estimates of the ability to rule in 
AOM. Because the color of the tympanic membrane 
appeared to have ordinal properties (eg, normal, slightly 
red, distinctly red, cloudy), we described the overall accu¬ 
racy of this finding by the area under the receiver operating 
characteristic curve. 
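
The direction of the verification bias correction can be sketched in Python. This is a generic inverse-weighting correction in the spirit of the Begg and Greenes approach, not a reproduction of the calculation performed on the Karma et al data: the function name, the category labels, and every count and verification fraction below are hypothetical.

```python
def corrected_likelihood_ratios(verified, verification_fraction):
    """Likelihood ratio per finding after weighting for partial verification.

    verified: {finding: (n_with_disease, n_without_disease)} counted only
        among patients who received the reference test.
    verification_fraction: {finding: share of all patients with that finding
        who received the reference test}, each in (0, 1].
    Assumes every finding has at least one verified patient without disease.
    """
    # Scale each verified count up to the number expected had everyone
    # with that finding been verified.
    weighted = {
        finding: (d / verification_fraction[finding],
                  n / verification_fraction[finding])
        for finding, (d, n) in verified.items()
    }
    total_d = sum(d for d, _ in weighted.values())
    total_n = sum(n for _, n in weighted.values())
    return {
        finding: (d / total_d) / (n / total_n)
        for finding, (d, n) in weighted.items()
    }


# Hypothetical example: all abnormal ears verified, 20% of normal ears verified.
verified = {"abnormal": (90, 10), "normal": (10, 40)}
unadjusted = corrected_likelihood_ratios(verified, {"abnormal": 1.0, "normal": 1.0})
adjusted = corrected_likelihood_ratios(verified, {"abnormal": 1.0, "normal": 0.2})
print(unadjusted, adjusted)
```

In this made-up example the LR for an abnormal ear rises after correction and the LR for a normal ear moves toward 1, which is the pattern described above: less pessimistic about ruling in, less optimistic about ruling out.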


RESULTS 

From the 397 references initially identified, we found 6 arti¬ 
cles that satisfied inclusion criteria. This included 1 article 
concerning precision, 4 articles on accuracy of symptoms, 
and 1 article on accuracy of signs (Table 37-1). 4,16,23,35,36,41 

Precision of Symptoms and Signs 

To our knowledge, no studies concerning precision of 
symptoms have been published, and there are only a few 
studies on precision of signs. A comparison of diagnoses 
among practitioners would be important, especially during 
training, when medical students and house staff learn to 
interpret otoscopic findings from their instructors. Recently, 
Steinbach et al 4 compared diagnoses of AOM among pedi¬ 
atric residents with diagnoses made by otolaryngologists. 
Complete examinations were available for 43 children, but 
the study found only fair agreement between the residents 
and the otolaryngologists. Overall agreement on diagnosis 
of AOM between the 2 types of practitioners had a κ statistic 
of 0.30 (fair). κ Statistics on tympanic membrane features 
such as erythema, color, effusion, mobility, and 
position were fair to slight (κ = 0.40, 0.40, 0.31, 0.21, and 
0.16, respectively). Correlations between pediatric residents 
and otolaryngologists comparing tympanometry in 
the detection of an effusion were also fair (κ = 0.25 and 
0.30, respectively). 
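
For readers less familiar with the κ statistic, the following Python sketch shows how chance-corrected agreement between 2 examiners is computed from a 2x2 agreement table. The counts are hypothetical (the study's cell counts are not given here), and the verbal labels follow the widely used Landis and Koch bands, which match the "slight" and "fair" wording above.

```python
def cohen_kappa(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Cohen's kappa for 2 raters making a yes/no diagnosis."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + a_yes_b_no) / n
    b_yes = (both_yes + a_no_b_yes) / n
    # Agreement expected if the 2 raters diagnosed independently.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Conventional Landis and Koch descriptors."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical table for 43 children examined by a resident (A)
# and an otolaryngologist (B).
kappa = cohen_kappa(both_yes=12, a_yes_b_no=8, a_no_b_yes=6, both_no=17)
print(round(kappa, 2), agreement_label(kappa))
```

In this invented example the 2 examiners agree on 29 of 43 children (67%), yet κ falls in the "fair" band because about half of that agreement would be expected by chance alone.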

Accuracy of Symptoms and Signs 

Symptoms 

Sensitivity, specificity, and positive and negative LRs 
derived from articles that examined the role of symptoms 
in the diagnosis of AOM are included in Table 37-2. 16,35,36,41 
The presence of ear pain appears to be the symptom most 
useful in making the diagnosis of AOM. Ear pain has a pos¬ 
itive LR (LR+) of 3.0 to 7.3 but is present in only 50% to 
60% of children with AOM. With a baseline prevalence for 


Table 37-1 Studies Meeting Inclusion Criteria for Accuracy of Symptoms and Signs in Diagnosis of Acute Otitis Media 

Source, y                           Evidence Level  No. of Patients  Age Range, y  Criterion Standard  Limitations
Symptoms
Niemela et al, 16 1994              4               354              1 mo-15 y     Clinical diagnosis  Majority of children examined by specialists; children had a high incidence of recurrent acute otitis media; not blinded
Heikkinen and Ruuskanen, 35 1995    4               302              0.6-4.2 y     Clinical diagnosis  Not blinded
Ingvarsson, 36 1982                 4               171              0-15 y        Clinical diagnosis  Referred to otolaryngologist for otalgia; not blinded
Kontiokari et al, 41 1998           4               138              0.6-6.9 y     Clinical diagnosis  Not blinded
Signs
Karma et al, 23 1989                3               2911             6 mo-2.5 y    Tympanocentesis     All examinations performed by either 1 pediatrician or 1 otolaryngologist

AOM of 20% among children aged 5 years or younger who 
make an acute pediatric office visit (estimated from the 
National Ambulatory Medical Care Survey), the presence of 
ear pain increases the probability of AOM to approximately 
43% to 65%. 
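
The step from a 20% baseline prevalence to a posttest probability of 43% to 65% follows from converting probability to odds, multiplying by the LR, and converting back. A minimal Python sketch of that arithmetic follows; the function name is ours, not part of any cited method.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Ear pain: LR+ ranged from 3.0 to 7.3 across studies; prevalence 20%.
for lr in (3.0, 7.3):
    print(lr, round(100 * post_test_probability(0.20, lr)))
```

The same function applies to any other LR reported in this chapter, provided the baseline prevalence is appropriate for the setting.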

Fever is often cited as a primary symptom of AOM 27,28 but 
shows variability in usefulness. One study shows that the 
likelihood slightly increases with a fever, but 2 studies found 
no effect, with the positive LR (LR+) approaching 1.0. The 
absence of fever seems to confer little change in the likeli¬ 
hood of AOM. 

Kontiokari et al 41 examined the ability of parents to pre¬ 
dict whether their child had AOM. Parents were fairly accu¬ 
rate and showed similar ability to predict that their child 
did have AOM (LR+, 3.4) and that their child did not have 
AOM (LR-, 0.4). These findings are partially tempered by 
the fact that the physicians were not blinded to parental 
predictions, and this may have biased their ultimate diagnoses. 
We suspect that parents learn from their children’s 
symptoms with each febrile or upper respiratory tract illness, 
so that more experienced parents may have better 
diagnostic acumen, but the effect of parental experience on 
accuracy and LR of diagnosing otitis media has not been 
evaluated. Thus, we do not know whether parents of children 
with frequent infections of any type are more or less 
able to accurately assess ear involvement with each childhood 
illness episode. 

A final symptom that deserves mention is ear pulling. Ear 
pulling has long been debated as a possible sign of AOM 
because parents and primary caregivers frequently observe 
this phenomenon. 5 Many physicians have been taught that 
ear pulling is not a useful sign because children pull at their 
ears because “they are there.” In the study by Niemela et al, 16 
“ear rubbing” appeared to have some predictive ability for 
the diagnosis of AOM (LR+, 3.3; 95% CI, 2.1-5.1). The only 
other study that we know of that has addressed this symptom 
is a small but often-referenced study by Baker, 30 who exam¬ 
ined 100 consecutive children with a chief complaint of ear 
pulling; 20 children had ear pulling as their sole complaint, 
whereas 80 children had other symptoms. Of the 20 children 
with ear pulling as the sole complaint, none met Baker’s 
unspecified criteria for AOM compared with 12 of the other 
80 children. 

Any conclusions about symptoms that can be drawn 
from the studies in Table 37-2 are limited by the study 
designs. Two of the 4 studies 16,36 involve “spectrum bias,” in 
which a spectrum of patients are used who are not repre¬ 
sentative of the population as a whole. Failure to incorpo¬ 
rate an appropriate spectrum of patients can affect the 
sensitivity and specificity of findings. 45-47 In the 2 studies 


Table 37-2 Accuracy of Symptoms 

Source and Symptoms                   Sensitivity, %  Specificity, %  LR+ (95% CI)    LR- (95% CI)
Niemela et al, 16 1994
  Ear pain                            54              82              3.0 (2.1-4.3)   0.6 (0.5-0.7)
  Ear rubbing                         42              87              3.3 (2.1-5.1)   0.7 (0.6-0.8)
  Fever                               40              48              0.8 (0.6-1.0)   1.2 (1.0-1.5)
  Cough                               47              45              0.9 (0.7-1.1)   1.2 (0.9-1.4)
  Rhinitis                            75              43              1.3 (1.1-1.5)   0.6 (0.4-0.8)
  Excessive crying                    55              69              1.8 (1.4-2.3)   0.7 (0.5-0.8)
  Poor appetite                       36              66              1.1 (0.8-1.4)   1.0 (0.8-1.1)
  Vomiting                            11              89              1.0 (0.6-1.8)   1.0 (0.9-1.1)
  Sore throat                         13              74              0.5 (0.3-0.8)   1.2 (1.1-1.3)
  Headache                            9               76              0.4 (0.2-0.7)   1.2 (1.1-1.3)
Heikkinen and Ruuskanen, 35 1995
  Ear pain                            60              92              7.3 (4.4-12)    0.4 (0.4-0.5)
  Fever                               69              23              0.9 (0.8-1.0)   1.4 (0.9-2.0)
  Cough                               84              17              1.0 (0.9-1.1)   1.0 (0.6-1.6)
  Rhinitis                            96              8               1.0 (1.0-1.1)   0.5 (0.2-1.4)
  Restless sleep                      64              51              1.3 (1.1-1.6)   0.7 (0.5-0.9)
Ingvarsson, 36 1982
  Ear pain                            100             NA              NA              NA
  Fever                               79              70              2.6 (1.9-3.6)   0.3 (0.2-0.5)
  Upper respiratory tract infection   96              29              1.4 (1.2-1.6)   0.3 (0.2-0.5)
Kontiokari et al, 41 1998
  Parental suspicion of AOM           70              80              3.4 (2.8-4.2)   0.4 (0.3-0.5)

Abbreviations: AOM, acute otitis media; CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; NA, not applicable. 

identified in this analysis, patients were often treated by 
specialists and had a higher incidence of recurrent AOM or 
chronic OME. These patients may differ from those in primary 
care clinics, and this can potentially affect the generalizability 
of the results. 

Another significant design limitation in all 4 included 
studies is their use of a clinical diagnosis of AOM, rather 
than tympanocentesis, as the criterion standard. Because 
the diagnosis of AOM potentially requires the presence of 
the symptoms that are being examined, an “incorporation 
bias” can occur when tympanocentesis is not performed as 
a confirmatory test. Incorporation bias 46 typically overesti¬ 
mates sensitivity and specificity. This bias may be further 
exaggerated because examiners who make the diagnosis of 
AOM also elicit the history in a nonblinded fashion. The 
bias created by using a clinical diagnosis as the criterion 
standard should improve the LRs for the symptoms; if that 
is the case, then it is possible that few symptoms would 
prove themselves independently useful in methodologically 
stronger studies. 

Signs 

Table 37-3 presents the results from the only study that has 
examined signs in the diagnosis of AOM. 23 The selective per¬ 
formance of tympanocentesis in this study created verifica¬ 
tion bias, which overestimates sensitivity and underestimates 
specificity and LR+. 45,48 Fortunately, the investigators pro¬ 
vided clinical examination findings for all patients, allowing 
us to correct for verification bias. This study suggests that a 
tympanic membrane that is cloudy (adjusted LR+, 34), bulg¬ 
ing (adjusted LR+, 51), or distinctly immobile (adjusted 
LR+, 31) is highly suggestive of AOM. In contradiction to 


Table 37-3 Accuracy of Signs 23 

Signs                   Unadjusted LR+  Adjusted LR+ (95% CI) a
Color
  Cloudy                11              34 (28-42)
  Distinctly red b      2.6             8.4 (6.7-11)
  Slightly red          0.4             1.4 (1.1-1.8)
  Normal                0.1             0.2 (0.19-0.21)
Position
  Bulging               20              51 (36-73)
  Retracted             1.3             3.5 (2.9-4.2)
  Normal                0.4             0.5 (0.49-0.51)
Mobility
  Distinctly impaired   8.4             31 (26-37)
  Slightly impaired     1.1             4.0 (3.4-4.7)
  Normal                0.04            0.2 (0.19-0.21)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 

a Results reported by Karma et al 23 were calculated by combining data reported from 2 
groups. Results are rounded so that precision is not overstated and results remain 
clinically meaningful. 

b “Distinctly red” was described qualitatively as “hemorrhagic, strongly red, or moderately 
red.” 


what is often taught to physicians in training, a tympanic 
membrane that is distinctly red, defined as “hemorrhagic, 
strongly red, or moderately red,” also suggests otitis media 
(adjusted LR+, 8.4), whereas a drum that is only slightly red 
(adjusted LR+, 1.4) is not helpful. These data suggest that 
color of the tympanic membrane can be treated as an ordinal 
variable, ranging from normal through redness to cloudy 
(Table 37-3), with the likelihood of AOM increasing with the 
intensity of redness (the area under the receiver operating 
characteristic curve as a measure of accuracy of tympanic 
membrane color is 0.88 [SE, 0.003]). 
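
To show what an area under the receiver operating characteristic curve means for an ordinal finding such as color, the sketch below computes it directly from category counts. The area equals the probability that a randomly chosen ear with AOM falls in a more abnormal category than a randomly chosen ear without AOM, with ties counted as one-half. The function name and all counts are hypothetical; they are not the Karma et al data and will not reproduce the reported value of 0.88.

```python
def ordinal_auc(diseased_counts, healthy_counts):
    """Area under the ROC curve for an ordinal finding.

    Both lists give the number of ears per category, ordered from the
    least to the most abnormal category (eg, normal, slightly red,
    distinctly red, cloudy).
    """
    n_diseased = sum(diseased_counts)
    n_healthy = sum(healthy_counts)
    concordant = 0.0
    healthy_below = 0  # healthy ears in strictly less abnormal categories
    for diseased, healthy in zip(diseased_counts, healthy_counts):
        # Each diseased ear outranks every healthy ear in a lower category
        # and ties (worth one-half) with healthy ears in the same category.
        concordant += diseased * (healthy_below + 0.5 * healthy)
        healthy_below += healthy
    return concordant / (n_diseased * n_healthy)


# Hypothetical counts: normal, slightly red, distinctly red, cloudy.
print(ordinal_auc([5, 10, 25, 60], [70, 20, 8, 2]))
```

An area of 0.5 indicates no discrimination and 1.0 indicates perfect separation of the 2 groups.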

After correction for verification bias, normal color or nor¬ 
mal mobility make otitis media much less likely (LR = 0.2 for 
both). Given a baseline prevalence of 20% among children at 
an acute office visit, the probability of AOM decreases to less 
than 5% when the tympanic membrane is normal in either 
color or mobility. The independence of the findings of color, 
position, and mobility has not been assessed. Although it 
would seem that abnormalities in 2 or all 3 of these compo¬ 
nents would be more important than the finding of just 1 
abnormality, we cannot quantify the effect of increasing 
numbers of abnormal findings. 

Means of Improvement 

Because AOM is so prevalent in the pediatric population and 
more accurate diagnosis of AOM can potentially lead to a 
decrease in antibiotic use and other costs, the improvement 
of diagnostic skills for AOM is clearly important. This 
improvement can be achieved by using more standardized 
diagnostic criteria and by improving diagnostic skills. A survey 
by Rosenfeld 8 suggested that application of the AHRQ-recommended 
criteria for AOM could reduce the rate of 
diagnosis of AOM by more than 20% by excluding cases that 
do not have evidence of a middle ear effusion. 

Tools to improve diagnostic skills include teaching oto¬ 
scopes that have 2 viewing areas, 49 videotapes, mannequin 
models, computer- and Web-based applications, and the use 
of more controlled settings, such as children undergoing 
myringotomy procedures. The American Academy of Pediat¬ 
rics, for example, supports a multimedia “virtual classroom” 
Web site designed to help clinicians improve their skills in the 
diagnosis and treatment of otitis media. 50 

Several studies have documented that clinicians can 
improve their diagnostic accuracy by practicing pneumatic 
otoscopy in children who are scheduled to undergo myringo¬ 
tomy. 37,51 In this setting, clinicians perform ear examinations 
before anesthetization and in the operating room and com¬ 
pare their findings with the results of myringotomy. In addi¬ 
tion, clinicians receive feedback from skilled, previously 
validated otoscopists. Pichichero and Poole 52 have demon¬ 
strated that videotaped pneumatoscopic examinations and 
infant mannequin models may be used to assess and poten¬ 
tially improve accuracy in the diagnosis of AOM and the per¬ 
formance of tympanocentesis. 

Despite studies suggesting that diagnostic accuracy in 
AOM can be improved, current training remains poor. A 
recent survey by Steinbach and Sectish 3 revealed that only 
59% of pediatric residency programs currently provide a formal 
curriculum (defined as a “structured and consistent part 
of the residency program, not an occasional occurrence”) for 
training residents in the diagnosis and treatment of AOM. 
The formal curriculum that is provided usually consists of 
fewer than 3 didactic lectures per year, with limited assess¬ 
ment of resident performance. 


CLINICAL SCENARIO—RESOLUTION 


This child is certainly at risk of AOM because he is in the age 
group in which AOM is common, he has a history of AOM, 
and he has had a preceding upper respiratory tract infection. 
None of his presenting symptoms are predictive of AOM. On 
examination, the left ear appears normal. Cerumen is correctly 
removed from the right ear to better visualize the 
drum. This drum was described by the physician as “slightly” 
red but with normal mobility. The precision of diagnosing an 
ear as slightly red as opposed to “distinctly” red is not known; 
however, according to the work of Karma et al, 23 a slightly red 
tympanic membrane does not have a high enough LR to 
independently confirm AOM, nor does it meet standardized 
criteria for AOM (such as from the AHRQ). The slightly red 
drum may be related to irritation from the cerumen removal 
and is not suggestive of AOM (probability is only 26%, given 
a baseline prevalence of 20% and an LR of 1.4 for a slightly 
erythematous membrane). The normal mobility more definitively 
suggests that AOM is not likely in this scenario (probability 
would be only 5%, with a baseline prevalence of 20% 
and an LR of 0.2). 


THE BOTTOM LINE 

The diagnosis of AOM can be difficult, and studies examin¬ 
ing this condition are somewhat limited. The studies we 
reviewed suggest that ear pain may be an important symp¬ 
tom but that other symptoms are not reliable. Although 
physical examination results are limited by the existence of 
only 1 well-performed study, a tympanic membrane that is 
cloudy, bulging, or distinctly immobile is highly suggestive of 
AOM. The presence of a distinctly red tympanic membrane 
also appears useful, although not as important as cloudiness 
of the tympanic membrane. Children with normal color and 
mobility of their tympanic membranes are much less likely to 
have otitis media than those with abnormalities. The discov¬ 
ery that erythema may be useful contradicts the instruction 
many clinicians receive and therefore deserves further study. 

Many of the studies on the accurate diagnosis of AOM are 
limited by spectrum bias that affects generalizability and by 
lack of an acceptable criterion standard. These limitations are 
difficult to overcome. For example, it would be difficult to 
design a study in which tympanocentesis can be performed 
in children with a low suspicion for AOM. On the other 
hand, including data on all patients, as in the study by Karma 
et al 23 (Table 37-1), allows investigators to conduct practical 
studies with correction for verification bias that improves 
their validity. Future studies can be improved by using a general 
population of at-risk children, more standardized diagnostic 
criteria, and independent examinations by blinded 
examiners. Studies also need to assess the precision and accu¬ 
racy of characterizing physical findings, as Karma et al 23 have 
done, in an ordinal rather than dichotomous manner (eg, 
describing color as normal, slightly red, or distinctly red 
rather than just normal vs red). Because we do not know the 
relative importance of multiple abnormal findings vs 1 
abnormal finding, an assessment of the independent impor¬ 
tance of color, position, and mobility would allow clinicians 
to properly weigh the relative importance of these findings 
and, perhaps, lead to the development of a grading scheme 
that permits more accurate estimates of the likelihood of oti¬ 
tis media. 

Despite the limitations of the current studies, we recom¬ 
mend that pneumatic otoscopy be performed when otitis 
media is considered to assess not just drum color and 
appearance but also mobility. Clinicians need to appreciate 
the amount of uncertainty in the diagnosis of AOM and how 
this may contribute to their decision to treat or not treat with 
antibiotics. Standard criteria for AOM, such as the AHRQ 
guidelines, which include the detection of a middle ear effu¬ 
sion, should also be considered because these can result in 
more uniform diagnoses and, it is hoped, decrease the rate of 
overdiagnosis. The use of training videos and other tech¬ 
niques may improve physical examination performance, but 
this will be more helpful after more studies have established 
the relationship between signs and the diagnosis of AOM. 
Otitis Media, Child 
Reviewed by Sheri Keitz, MD 


CLINICAL SCENARIO 


A 12-month-old boy is brought to the clinic by his par¬ 
ents for evaluation of a possible ear infection. The par¬ 
ents state that the child has had fever, with temperature 
to 38.5°C, moderate irritability, and decreased appetite. 
They also note that the child has had 2 previous ear 
infections and that he is now acting as he did when he 
had the previous infections, raising their concern about 
another possible ear infection. On examination, the child 
is moderately irritable and crying. His left tympanic 
membrane appears to be grey, with good mobility on 
pneumatic otoscopy. His right tympanic membrane dis¬ 
plays both distinctly red and distinctly impaired mobility 
on pneumatic otoscopy. 

UPDATED SUMMARY ON ACUTE OTITIS MEDIA 

Original Review 

Rothman R, Owens T, Simel DL. Does this child have acute 
otitis media? JAMA. 2003;290(12):1633-1640. 

UPDATED LITERATURE SEARCH 

We repeated the literature search through April 2006 on acute 
otitis media and reviewed 66 new potential titles, of which 14 
were promising enough for further evaluation. We attempted 
to identify prospective studies of children evaluated for possi¬ 
ble acute otitis media; of the 14 titles, 8 full articles were 
retrieved and 1 was retained. 1 Because tympanocentesis is the 
best reference standard, we cross-checked the original 66 arti¬ 
cles with the text word “tympanocentesis” and found 2 arti¬ 
cles that described a clinical score using a combination of 
findings. 2,3 We retained only the earlier of these 2 articles 
because it presented data in a fashion that allowed a calcula¬ 
tion of likelihood ratios (LRs). 

NEW FINDINGS 

Healthy children who cry are unlikely to have red tympanic 
membranes (<3%). 


Details of the Update 

In a study of healthy infants and toddlers receiving injections for
vaccinations, the color of the tympanic membrane was compared
before and after the injection (which, together with the
original examination, often induced crying). 1 On examination,
clinicians rated the degree of crying and the color of the tympanic
membrane on an 8-point scale from shades of clear or grey
to pink and red. Although the clinicians undoubtedly knew the
study hypothesis and could have been biased, the color was compared
with a standardized chart developed from a digital computer
program. Crying healthy children almost never had
“lightly red” ears (2/242 tympanic membranes; 95% confidence
interval [CI], 0%-3%) and none had frankly red ears. On the
other hand, the tympanic membranes of children who cried
more during the second examination were more likely to shift
2 or more color categories toward faint pink/pink than
those of children who did not have increased crying (19%
vs 5%). 
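
The interval quoted for 2 red membranes among 242 can be approximated with an exact binomial calculation. The sketch below is illustrative only: it assumes an exact (Clopper-Pearson) interval, which the source does not name as the method used, and the function names are hypothetical. It also treats each ear as an independent observation, which two ears from the same child are not.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(g, lo=0.0, hi=1.0, iters=100):
    """Bisection for the root of a function g that increases on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(x, n, alpha=0.05):
    """Exact two-sided interval for a proportion x/n (assumed method)."""
    # Lower limit: the p at which P(X >= x) equals alpha/2.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2
    )
    # Upper limit: the p at which P(X <= x) equals alpha/2.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p)
    )
    return lower, upper


low, high = exact_binomial_ci(2, 242)
print(f"{2 / 242:.1%} (95% CI, {low:.1%} to {high:.1%})")
```

Rounded to whole percentages, the limits agree with the 0%-3% reported above.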

A large group of children at an Israeli medical center underwent
tympanocentesis as part of an antibiotic clinical trial. 2 The
investigators created a clinical score (Table 37-4) with face
validity, though it has not been validated. The score combined the
child’s temperature, the parental assessment of irritability
and tugging of the ears, and the clinician’s assessment
of redness and bulging of the tympanic membrane. Each item was graded
on a 0 to 3 ordinal scale and summed across domains to create
the clinical score (range, 0-15). The reference standard was the
result of middle ear fluid culture, which was positive for 75% of
children (Haemophilus influenzae or Streptococcus pneumoniae),
but 37% of the patients had received antibiotics before the
tympanocentesis. This bias could misclassify affected patients as not
having otitis media, but it is impossible to know whether sensitivity
or specificity is more affected because antibiotics could
also affect the clinical score. 

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

None. 

CHANGES IN THE REFERENCE STANDARD 

None. 










CHAPTER 37 Update 


Table 37-4 Unvalidated Clinical Score for Acute Otitis Media a

Score | Temperature (°C) | Irritability | Tugging | Redness | Bulging
0 | <38 | Absent | Absent | Absent | Absent
1 | 38-38.5 | Mild | Mild | Mild | Mild
2 | 38.6-39 | Moderate | Moderate | Moderate | Moderate
3 | >39 | Severe | Severe | Severe | Severe

a The score is obtained by summing the individual scores for each finding. The maximum
score is 15. When both ears are involved, use the score from the worst ear. 
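
The scoring rule in Table 37-4 can be sketched in a few lines of code. This is an illustration only, not a validated tool. The function and variable names are hypothetical, and the handling of temperatures between the printed bands (for example, 38.55°C) is an assumption, because the table lists values to one decimal place.

```python
# Points for the four findings graded Absent/Mild/Moderate/Severe in Table 37-4.
GRADE_POINTS = {"absent": 0, "mild": 1, "moderate": 2, "severe": 3}


def temperature_points(temp_c):
    """Map temperature (degrees C) to 0-3 points using the table's bands."""
    if temp_c < 38.0:
        return 0
    if temp_c <= 38.5:
        return 1
    if temp_c <= 39.0:  # assumption: values between 38.5 and 38.6 fall here
        return 2
    return 3


def clinical_score(temp_c, irritability, tugging, ears):
    """Sum the 5 findings; `ears` holds one (redness, bulging) pair per involved ear.

    When both ears are involved, the worst ear is used, as the table footnote states.
    """
    child_points = (
        temperature_points(temp_c)
        + GRADE_POINTS[irritability]
        + GRADE_POINTS[tugging]
    )
    worst_ear = max(GRADE_POINTS[red] + GRADE_POINTS[bulge] for red, bulge in ears)
    return child_points + worst_ear


# Hypothetical child: 38.5 C, moderate irritability, no tugging, two ears examined.
score = clinical_score(38.5, "moderate", "absent",
                       [("absent", "absent"), ("severe", "mild")])
print(score)  # 1 + 2 + 0 + (3 + 1) = 7
```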


RESULTS OF LITERATURE REVIEW 

Univariate Findings for Acute Otitis Media 

We found no new valid data on the LR of individual symp¬ 
toms and signs for acute otitis media. 

Multivariate Findings for Acute Otitis Media 

Overall, clinical scores (see Table 37-4) greater than or equal
to 9 had an LR of 1.3 (95% CI, 1.0-1.7), 6 to 8 had an LR of
0.94 (95% CI, 0.70-1.3), and scores less than or equal to 5
had an LR of 0.32 (95% CI, 0.16-0.62), which suggests that
the score is not particularly useful in diagnosing acute otitis. 
The utility of the clinical score may have been diluted by the 
inclusion of clinical signs in the score. Only tympanic red¬ 
ness and bulging had individual scores that were statistically 
greater among those with positive culture results vs negative 
culture results. The importance of tympanic redness or bulg¬ 
ing fits with the criteria recommended by the Agency for 
Healthcare Research and Quality as reported in the original 
Rational Clinical Examination article on acute otitis media. 4 
In the study, more than 70% of patients had acute otitis, 
which is much greater than expected in the normal popula¬ 
tion. This spectrum of patients, with high rates of otitis, 
could overestimate the sensitivity and underestimate the 
specificity of the scale. The data suggest that the clinical score 
from Table 37-4 may have too many variables because tem¬ 
perature, irritability, and tugging did not differ between 
patients with culture-positive and culture-negative results. 


EVIDENCE FROM GUIDELINES 

The American Academy of Pediatrics and the American Acad¬ 
emy of Family Practice released new guidelines on the diagno¬ 
sis and treatment of acute otitis media. 5 These guidelines for 
diagnosing acute otitis are similar to the guidelines sponsored 
by the Agency for Healthcare Research and Quality. 4 


CLINICAL SCENARIO—RESOLUTION 


The prevalence of acute otitis media in children aged 0 
to 2 years at ambulatory visits is about 20%. The paren¬ 
tal suspicion of acute otitis media may be modestly help¬ 
ful; with an LR of 3.4, this would make the posttest 
probability of acute otitis 46%. The distinctly red tym¬ 
panic membrane is likely not just related to crying and 
suggests an acute infection; with an LR of 8.4, this sign 
increases the posttest probability to 68%. A distinctly 
impaired tympanic membrane mobility on pneumatic 
otoscopy is the most helpful finding (LR, 31) and raises 
the posttest probability to 89%. The combination of 
these symptoms and signs may make acute otitis media 
even more likely. 
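
The posttest probabilities in this resolution follow from converting the pretest probability to odds, multiplying by the likelihood ratio, and converting back. A minimal sketch of that arithmetic is shown here; the function name is hypothetical, and each finding is applied on its own to the 20% pretest probability, as in the text.

```python
def posttest_probability(pretest, lr):
    """Apply a likelihood ratio to a pretest probability (both as plain numbers)."""
    pretest_odds = pretest / (1 - pretest)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


findings = {
    "Parental suspicion (LR 3.4)": 3.4,
    "Distinctly red membrane (LR 8.4)": 8.4,
    "Distinctly impaired mobility (LR 31)": 31,
}
for name, lr in findings.items():
    print(f"{name}: {posttest_probability(0.20, lr):.0%}")
```

Multiplying several LRs together would assume that the findings are conditionally independent, which the data reviewed here do not establish, so the sketch does not attempt a combined estimate.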


REFERENCES FOR THE UPDATE 
1. Yamamoto LG, Sumida RN, Yano SS, Derauf DC, Martin PE, Eakin PJ. Does
crying turn tympanic membranes red? Clin Pediatr. 2005;44(8):693-697. a 

2. Leibovitz E, Satran R, Piglansky L, et al. Can acute otitis media caused by
Haemophilus influenzae be distinguished from that caused by Streptococcus
pneumoniae? Pediatr Infect Dis J. 2003;22(6):509-514. 

3. Polachek A, Greenberg D, Lavi-Givon N, et al. Relationship among
peripheral leukocyte counts, etiologic agents and clinical manifestations
of acute otitis media. Pediatr Infect Dis J. 2004;23(5):406-413. 

4. Marcy M. Management of Acute Otitis Media. Rockville, MD: Agency for
Healthcare Research and Quality; 2001:1-159. 

5. American Academy of Pediatrics Subcommittee on Management of

a For the Evidence to Support the Update for this topic,
see http://www.JAMAevidence.com. 

ACUTE OTITIS MEDIA—MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

The prevalence of clinically diagnosed otitis media is high, 
with a rate of 17.4% for all visits to US pediatricians for 
0- to 24-month-old children, 18% for 2- to 5-year-old 
children, 10% for 6- to 10-year-old children, and 5.2% for 
11- to 15-year-old children. A baseline prevalence of 20% 
is a reasonable anchor for child visits to the emergency 
department. 

POPULATION FOR WHOM ACUTE 
OTITIS MEDIA SHOULD BE CONSIDERED 

The diagnosis should be considered for a child complain¬ 
ing of ear symptoms. Among infants, the rapid onset of 
ear pulling, ear drainage, irritability, or fever should 
prompt an otoscopic evaluation for otitis media. 

DETECTING THE LIKELIHOOD 
OF ACUTE OTITIS MEDIA 

Healthy children who cry before and during the examina¬ 
tion are unlikely to have distinctly red tympanic mem¬ 
branes. Therefore, discovering red tympanic membranes 
should not be attributed solely to crying. The most useful 
findings are tympanic membrane color, mobility, and 
position (Table 37-5). 

REFERENCE STANDARD TESTS 

Tympanocentesis is the reference standard, but most stud¬ 
ies use a standardized clinical definition. 


Table 37-5 Detecting the Likelihood of Acute Otitis Media

Signs a | LR+ (95% CI)
Tympanic membrane color |
  Cloudy | 34 (28-42)
  Distinctly red | 8.4 (6.7-11)
  Slightly red | 1.4 (1.1-1.8)
  Normal | 0.20 (0.19-0.21)
Tympanic membrane mobility |
  Distinctly impaired | 31 (26-37)
  Slightly impaired | 4.0 (3.4-4.7)
  Normal | 0.20 (0.19-0.21)
Tympanic membrane position |
  Bulging | 51 (36-73)
  Retracted | 3.5 (2.9-4.2)
  Normal | 0.50 (0.49-0.51)

Symptoms | LR+ (95% CI) or Range | LR- (95% CI) or Range
Parental suspicion of otitis media | 3.4 (2.8-4.2) | 0.4 (0.3-0.5)
Ear rubbing | 3.3 (2.1-5.1) | 0.7 (0.6-0.8)
Ear pain | 3.0-7.3 | 0.4-0.6

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio. 

a LRs adjusted for verification bias. 

b Assessed with pneumatic otoscopy. 
TITLE Can Acute Otitis Media Caused by Haemophilus
influenzae Be Distinguished From That Caused by Streptococcus
pneumoniae? 

AUTHORS Leibovitz E, Satran R, Piglansky L, et al. 

CITATION Pediatr Infect Dis J. 2003;22(6):509-514. 

QUESTION Does a previously proposed clinical score 
based on symptoms and signs accurately identify infants 
and young children with acute otitis media? 

DESIGN Prospective, nonconsecutive enrollment. 

SETTING The study took place in an Israeli pediatric 
emergency department. All reported study subjects were 
enrolled in various antibiotic efficacy trials. 

PATIENTS Infants and children (aged 3 to 36 months) 
treated in the emergency department during a 5-year 
period who had (1) symptoms of acute otitis (parental 
report of fever, irritability, and ear tugging) and signs 
(redness or bulging of the tympanic membrane); (2) ill¬ 
ness duration less than or equal to 7 days; and (3) no tym¬ 
panostomy tubes or spontaneous perforation of at least 24 
hours. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

All children underwent an examination by an otolaryngologist
who also performed a tympanocentesis. The score was
based on a previously reported score, summed across 5 findings,
with a maximum score of 15. 1 When both ears were
involved, the score for the worse ear was used (Table 37-6). 


Table 37-6 Clinical Score for Acute Otitis Media

Score | Temperature (°C) | Irritability | Tugging | Redness | Bulging
0 | <38 | Absent | Absent | Absent | Absent
1 | 38-38.5 | Mild | Mild | Mild | Mild
2 | 38.6-39 | Moderate | Moderate | Moderate | Moderate
3 | >39 | Severe | Severe | Severe | Severe


MAIN OUTCOME MEASURES 

The means of the clinical score were calculated for patients
according to infection with Haemophilus influenzae,
Streptococcus pneumoniae, or mixed infections. Viral cultures
were not performed. 

MAIN RESULTS 

Of 372 enrolled study subjects with complete data (Table 37-7),
96% were younger than 2 years. The culture results were negative
in 94 (25%) patients, but only 63% of patients had not
received antibiotics in the preceding 72 hours. 

Culture-positive patients had a higher clinical score than 
culture-negative patients (9.3 vs 8.4; P = .01). There was no 
difference in score between culture-positive and culture-neg¬ 
ative children treated with antibiotics, whereas those who 
were antibiotic naive showed a statistical difference (culture¬ 
positive score 9.11 vs culture negative score 8.1; P = .02 [con¬ 
fidence intervals not provided]). The authors evaluated age as 
a second confounder. The differences in culture-positive 
patients’ score vs culture-negative patients only barely 
reached statistical significance in infants aged 3 to 6 months 
(8.9 ± 2.7 vs 7.4 ± 2.0; P = .05). Among older children, the 
scores were not statistically different between culture-positive 
and culture-negative results for patients. The scores did not 


Table 37-7 Clinical Data of the Patients

Columns: Score | No. | All Patients, LR+ (95% CI) | Received Previous Antibiotics a, Prevalence, % | Received Previous Antibiotics a, LR+ (95% CI) | Received No Previous Antibiotics a, Prevalence, % | Received No Previous Antibiotics a, LR+ (95% CI)

≥9 | 201 | 1.3 (1.0-1.7) | 65 | 1.10 (0.83-1.50) | 49 | 1.5 (1.0-2.3)
6-8 | 140 | 0.94 (0.70-1.30) | 26 | 1.40 (0.70-2.80) | 43 | 0.84 (0.61-1.20)
≤5 | 31 | 0.32 (0.16-0.62) | 9 | 0.26 (0.09-0.75) | 8 | 0.34 (0.14-0.82)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio. 

a Eugene Leibovitz, MD, kindly provided the data that allowed calculation of the likelihood
ratios for those who received antibiotics and those who were antibiotic naive. 
distinguish between bacterial etiologies, which was an outcome
unaffected by antibiotic exposure. When the individual
components of the score were analyzed, the scores for redness
and bulging were the only ones statistically higher for
culture-positive than for culture-negative patients. Tympanic
membranes were redder in patients with H influenzae (P =
.001) or S pneumoniae (P = .05) compared with those with
negative cultures. Similarly, tympanic membranes bulged
more in children who were culture positive (H influenzae,
P < .001; S pneumoniae, P = .04). 

Because antibiotics could have affected the clinical findings, 
we calculated the likelihood ratios from data provided by the 
author. Among children who had not received antibiotics, 
78% had a culture-positive result. There was no clinical differ¬ 
ence in the likelihood ratios at any of the 3 specified thresh¬ 
olds, whether or not the child received antibiotics (Table 37-7). 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS All these children received tympanocentesis 
and culture. 

LIMITATIONS All the children were referred for enrollment 
in antibiotic treatment trials of otitis media. 

With a 78% probability of otitis among antibiotic-naive 
patients, the clinicians were effectively identifying the chil¬ 
dren most likely to have acute otitis media. The dispropor¬ 
tionately high prevalence of children with scores greater than 
or equal to 9 is appropriate for a randomized clinical trial 
enrolling children according to their clinical diagnosis. How¬ 
ever, it creates verification bias when using the data to deter¬ 
mine the accuracy of the clinical diagnosis because children 
would have been referred for tympanocentesis only when 
they were highly likely to have otitis media. Typically, sensi¬ 
tivity is overestimated and specificity is underestimated with 
verification bias. In addition to verification bias, higher clini¬ 
cal scores may have led to earlier initiation of antibiotics. 
Among children who did not receive previous antibiotics, the 
clinical score provides little information. With a prevalence 
of 78% culture positive, a clinical score greater than or equal 
to 9 increases the probability of acute otitis media to 84%, 
whereas a clinical score less than or equal to 5 decreases the 
probability to 55%. 
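
The two probabilities quoted above can be reproduced from the antibiotic-naive column of Table 37-7. The sketch below is illustrative; the helper names are hypothetical, and the middle stratum (scores of 6 to 8) is included only to show how little the probability moves across the whole range of scores.

```python
def to_odds(probability):
    return probability / (1 - probability)


def to_probability(odds):
    return odds / (1 + odds)


PREVALENCE = 0.78  # culture positive among antibiotic-naive children

# LR for each score stratum, antibiotic-naive children (Table 37-7).
STRATUM_LR = {"score >= 9": 1.5, "score 6-8": 0.84, "score <= 5": 0.34}

posttest = {
    stratum: to_probability(to_odds(PREVALENCE) * lr)
    for stratum, lr in STRATUM_LR.items()
}
for stratum, probability in posttest.items():
    print(f"{stratum}: {probability:.0%}")
```

Even the lowest stratum leaves the probability of a positive culture above 50%, which is the sense in which the score provides little information in this population.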

In other research, only redness and bulging have been 
shown as useful for diagnosing otitis media. The scoring sys¬ 
tem includes 3 other measures, none of which were statisti¬ 
cally significant as independent items. By including those 
measures in the score, the clinical efficiency of the score sys¬ 
tem might have been decreased. 

Reviewed by David L. Simel, MD, MHS 


REFERENCE FOR THE EVIDENCE 

1. Dagan R, Leibovitz E, Greenberg D, Yagupsky P, Fliss DM, Leiberman A. 
Early eradication of pathogens from middle ear fluid is associated with 
improved clinical outcome. Pediatr Infect Dis J. 1998;17(9):776-782. 


TITLE Does Crying Turn Tympanic Membranes Red? 

AUTHORS Yamamoto LG, Sumida RN, Yano SS, Derauf
DC, Martin PE, Eakin PJ. 

CITATION Clin Pediatr. 2005;44(8):693-697. 

QUESTION Among healthy infants and toddlers subjected
to immunizations, does crying affect the color of
the tympanic membrane? 

DESIGN Prospective, convenience sample. 

SETTING Pediatrics office. 

PATIENTS Infants and toddlers (age < 30 months) at
routine healthy child checks during which immunizations
were given. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

A physician examined the child’s ears before immunization and
rated the child’s degree of crying and color of each tympanic
membrane. Crying was assessed on an ordinal 0 to 4+ scale (0 =
“no crying at all”; 4 = loudest and most intense crying ever
heard). Color was rated on a 1 to 9 scale according to a sheet
with color images produced by Adobe Photoshop (Adobe Systems,
Inc, San Jose, California) with standardized red-blue-green
values. The colors were described as no color, light gray,
gray, faint pink, pink, darker pink, light red, red, and “can’t see
the tympanic membrane.” A second independent examiner
repeated the examination after the immunization. 

MAIN OUTCOME MEASURES 

Change in crying was compared to the color of the tympanic
membrane perceived by the physician. The color had to be
rated as pink or more red (≥5 on color scale) to be considered
an “increase in redness.” 

MAIN RESULTS 

Of the 121 children, 53 were not crying during the first examination
compared with only 17 who were not crying during the
second examination. At the second examination, 64 subjects
were rated as crying the same or less compared with the rating
by the first examiner. Only 2 tympanic membranes of the 242
examined ears exhibited light redness and none were frankly
red. When children were crying more during the second examination,
19% had increased redness by 2 categories or more vs
5% of those crying the same or less (P < .001). 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 4. 

STRENGTHS Independent examiners. 

LIMITATIONS The validity of the study depends on the colors
chosen. However, standardizing the colors from a computer
program is likely better than a subjective assessment of
redness. The examining physicians were not blinded, so they
likely knew the study hypothesis. The analysis was based on
the number of ears examined rather than children examined,
which increased the apparent sample size and might have
inflated the statistical significance. Approximately 18 physicians
were used in 5 clinic sites, and a different physician performed
the initial and second examinations. For better
assessment of the validity of the color scale, it would have
been helpful if the authors had assessed the level of physician
agreement with a κ statistic. 

The authors include a good discussion of some of the limita¬ 
tions to generalizability of their results. However, the near 
absence of red tympanic membranes strongly suggests that 
healthy children who cry do not develop frankly red tympanic 
membranes, though they can display pinkish hues. The 
authors appropriately noted that sick, febrile children might 
have higher rates of greater-intensity crying and more flushing 
even though fever is not a strong predictor of otitis media. 

Reviewed by David L. Simel, MD, MHS 
CLINICAL SCENARIO 


A 68-year-old man presents with a 3-month history of 
right arm tremor at rest. His movements have been slower 
and he has difficulty getting out of a chair. Physical exam¬ 
ination reveals rigidity in the upper limbs. He walks with 
small steps and has limited ability to swing his arms. His 
facial expressions are limited. 


WHY ANSWER THIS QUESTION WITH 
A CLINICAL EXAMINATION? 


With a prevalence estimated between 150 and 200 per 
100000, Parkinson disease (PD) is one of the most common 
neurologic disorders. 1 It is more prevalent in older persons, 
affecting 1% of those older than 65 years and 2% of those 
older than 85 years. 2 

Although common, the diagnosis of PD is challenging. 
Laboratory tests are not available and conventional imaging 
studies are not helpful. The best reference standard is, unfor¬ 
tunately, neuropathologic (depletion of brain stem pig¬ 
mented neurons and proliferation of Lewy bodies). 3 Serial 
neurologic evaluation with or without concomitant treat¬ 
ment can also be used. 4 The response to an acute levodopa 
challenge has been used as a diagnostic tool. This test is prob¬ 
lematic for a number of reasons: its sensitivity and specificity 
are low, acute levodopa administration is associated with sig¬ 
nificant adverse effects, there is lack of agreement on what 
constitutes a threshold response, and the test is expensive and 
inconvenient. 5 The clinical examination, therefore, is the 
basis for initial diagnosis. Classic clinical features of PD 
include tremor at rest, bradykinesia, rigidity, and postural 
instability. 

There is evidence that the accuracy of diagnosis in some 
settings is improving. In a 1991 study by Rajput et al 6 among 
41 patients diagnosed clinically with PD by neurologists, the 
disease was confirmed neuropathologically at autopsy in 31 
(positive predictive value [PPV] of 76%). Hughes et al 7 eval¬ 
uated the accuracy of clinical diagnosis among 100 patients 
with PD, 86 of whom were followed up by neurologists, 7 by 
geriatricians, and 7 by internists. The diagnosis was con¬ 
firmed at autopsy in 90 persons (PPV = 90%). Another study 
confirmed PD at autopsy among 72 of 73 patients (PPV = 
99%) followed up by neurologists affiliated with a highly spe¬ 
cialized movement disorders center. 8 Despite these improve¬ 
ments and impressive results, it is important to keep in mind 
that the clinical diagnoses in these studies were often made 
during a long period and by physicians with a great deal of 
expertise and experience. The accuracy of clinical diagnosis 
in other settings is unclear. PD is still mistaken for other neurologic
disorders. The most frequent misdiagnoses include
progressive supranuclear palsy, multisystem atrophy (MSA)
(which encompasses the diagnoses Shy-Drager syndrome,
olivopontocerebellar atrophy, and striatonigral degeneration),
and dementia with Lewy bodies. 6 The differential diagnosis
also includes essential tremor and vascular pseudoparkinsonism.
Mistaking PD for other conditions can lead to inappropriate
and ineffective treatment. Although a patient with
essential tremor, for example, may benefit from a β-blocker,
this treatment would have no effect on the tremor of PD. 
Inappropriate treatment based on misdiagnosis also delays
the use of dopaminergic medications, which can decrease the
severity of symptoms and disability. 9 

Copyright © 2009 by the American Medical Association. 

Mistaking other disorders for PD is also harmful. Dyskine¬ 
sias, for example, appear in 15% to 85% of persons within 5 
years of treatment with levodopa, and hallucinations occur 
in 20% of patients. 10 There is also evidence that levodopa 
causes damage to dopamine neurons, leading to accelerated 
dopamine degeneration. 5 Whether the initial diagnosis is 
correct or not, the disease has serious social and psychologi¬ 
cal consequences. 11,12 In summary, the clinical examination is 


Box 38-1 Typologic Classification of Tremors 

REST TREMOR 

Tremor occurring in a body part that is not voluntarily acti¬ 
vated and when it is supported completely against gravity. 

ACTION TREMORS 

Postural 

Tremor that occurs while voluntarily maintaining a posi¬ 
tion against gravity. 

Kinetic 

Tremor occurring during any voluntary movement. 

1. Simple. Tremor occurring during voluntary movements 
that are not target-directed. 

2. Intention. Tremor whose amplitude increases during 
visually guided movements (eg, finger-to-nose test). 

3. Task-Specific. Tremor that appears or is exacerbated by 
specific tasks (eg, writing). 

4. Isometric. Tremor that occurs during voluntary muscle 
contraction against a rigid stationary object (eg, 
squeezing examiner’s hand). 


Box 38-2 Three Common Tremor Syndromes 
TREMOR OF PARKINSON DISEASE 

Slow frequency (4-6/s) tremor at rest. Tremor inhibited 
during movement and sleep. Aggravated by emotional 
distress. “Pill-rolling quality.” 

CLASSIC ESSENTIAL TREMOR 

Bilateral, usually symmetric postural or kinetic tremor. Fam¬ 
ily history of tremor is common. Attenuated by alcohol. 

PHYSIOLOGIC TREMOR 

Present to differing degrees in all subjects. Enhanced form 
is easily visible, mainly postural, and has a high frequency 
(8-12/s). No evidence of underlying neurologic disease. 
Cause is usually reversible (eg, caffeine). 


important in suspected PD because no laboratory or radiologic
tests are helpful diagnostically. Misdiagnosis of PD is
associated with adverse effects. 

Pathophysiologic Characteristics 

It is important to distinguish between PD and parkinsonism. 
Parkinsonism refers to any clinical syndrome in which 2 or
more features such as tremor, rigidity, and bradykinesia are
present. Parkinson disease is a form of primary or idiopathic
parkinsonism. Viral infections, environmental toxins, oxidative
stress, and heredity have all been suspected as causes. 13 
Secondary or acquired parkinsonism has a variety of causes,
including head trauma, cerebrovascular disease, and hydrocephalus. 14,15
When drugs are the cause, secondary parkinsonism may persist
for months after the drugs are discontinued. A thorough
inquiry into past and current medication use, therefore, is
essential when questioning patients presenting with parkinsonism.
Parkinson disease begins as neurons and dopamine
are lost from the substantia nigra and intracytoplasmic inclusions
(Lewy bodies) appear. Symptoms appear when 70% to
80% of dopamine is lost. 16 

Symptoms and Signs 

Nonspecific insidious symptoms, including generalized mal¬ 
aise, easy fatigability, and subtle personality changes, mark 
the onset of PD. These may occur years before the appear¬ 
ance of tremor, limb rigidity, bradykinesia, and postural 
instability. 16 Numerous secondary manifestations appear 
unpredictably and are as varied as disordered sleep (42% of 
patients), 17 constipation (50%), pain (50%), depression 
(40%), and dementia (20%). 16,18 Signs typically begin unilat¬ 
erally and then progress asymmetrically. 

James Parkinson described the combination of tremor 
and bradykinesia as a shaking palsy. 18 Seventy-five percent 
of patients complain initially of a tremor that usually 
occurs at rest in an upper extremity and is characterized 
by visible oscillations with a frequency of 4 to 6 per sec¬ 
ond. Tremor appears intermittently, disappearing during 
sleep and increasing in severity during times of emotional 
distress or anxiety. It is often described as pill-rolling, 
because a rhythmic movement is observed in the hand as 
the index finger flexes and extends against the thumb 
repetitively. 19 

Some basic features distinguish the tremor of PD from
physiologic and essential tremors (Boxes 38-1 and 38-2). 20,21 

Rigidity refers to an involuntary stiffness of the skeletal
muscles and is a common sign. Electromyogram assessment
of parkinsonian patients reveals an alternating discharge pattern
in opposing muscle groups, even at rest (eg, triceps and
biceps). Resistance to movement of limbs may be smooth or
interrupted. Cog wheeling refers to the jerky motion of limbs
as constant force is applied across a joint, which is similar to
the ratcheting of the cogs of gears as they click. 22 Unlike rigidity,
spasticity refers to a selective increase of tone of flexor
muscles in the arms and extensor muscles in the legs 23 and
suggests a diagnosis other than PD. 
Bradykinesia refers to the overall slowing of active movement
or slowness in initiating movement. The initial surge of
motor activity is inadequate and movements are fragmented
into a series of incremental steps. Postural instability in
patients with PD presents as changes in gait and balance. 
Short and shuffling steps are often accompanied by festination.
Loss of arm movements commonly appears. The
patient may walk with the arms straight down, rather than
swinging them back and forth. Gait disturbance is the major
cause of disability in many patients. As postural reflex mechanisms
are lost, patients become stooped and have a tendency
to fall. Those with severe deficits are sometimes
confined to a wheelchair or bed. 

How to Elicit Signs 

Tremor 

Tremor can be defined as any rhythmic, involuntary oscilla¬ 
tory movement of a body part. The tremor classification is 
complex and has overlapping features in different disease syn¬ 
dromes. Nevertheless, the Movement Disorder Society has 
developed a classification system to help clinicians distinguish 
tremor types. 24 Tremors can be classified as rest or action. 

The classification system divides tremors into 11 syn¬ 
dromes. Patients with PD typically have a slow (frequency of 
about 4-6/s) tremor at rest. It is easily observed by having 
patients position their hands on their lap. Physicians should 
be able to identify the key features of PD and essential and 
physiologic tremors (Box 38-2). 


Precise measurement of tremor frequency and amplitude is 
sometimes used in diagnostic evaluation. This requires special 
devices and is beyond the scope of the clinical examination. 

Rigidity 

Involuntary muscle stiffness or rigidity can be shown if 
resistance to passive movement of the limbs is detected. 
With the patient relaxed, the examiner places his or her 
thumb across the antecubital fossa with one hand while 
passively flexing and extending the elbow several times with 
the other hand. Rigidity often increases with repeated flex¬ 
ion and extension movements. With cog wheeling, the 
examiner feels alternate periods of resistance and relax¬ 
ation. With lead-pipe rigidity, the examiner feels smooth 
but increased muscle tone throughout passive flexion and 
extension. 25 Rigidity and cog wheeling may be felt in other 
large joints, but if detected in the arms, there is no need to 
confirm their presence elsewhere. Many patients with 
essential tremor manifest a rhythmic resistance to passive 
movements of a limb while there is voluntary action of 
another body part. This is not true cog wheeling but is 
known as Froment sign, which also appears in PD patients. 26 

Bradykinesia 

Bradykinesia refers to a decrease in the speed and amplitude
of complex movements. Jobbagy et al 27 described 4
maneuvers designed to detect bradykinesia: tapping the fingers,
twiddling, pinching and circling, and tapping with the
heel (Figure 38-1). Twiddling refers to repeated rotation of
the hands in front of the body. The pinching and circling
test is a sequence of 6 movements: pinching (opposing
thumb and index finger) with the right hand and then with
the left hand; circling (rotating the hand in a circle) with
the right hand and then with the left hand; pinching with
the right hand while simultaneously circling with the left;
and pinching with the left hand while simultaneously circling
with the right (Figure 38-1). Jobbagy et al 27 were able to
quantify the performance of patients on these tasks by
using a motion analyzer, although a specific threshold
“score” to define bradykinesia was not determined. However,
poor performance of these maneuvers is easily detectable
and clinicians can use them to confirm the presence of
bradykinesia subjectively. 

Figure 38-1 Maneuvers to Detect Bradykinesia (A, tapping the fingers; B, twiddling; C, pinching and circling; D, tapping with the heel)

Figure 38-2 Glabella Tap Test 

Glabella Tap Reflex 

This reflex is tested by percussion of the forehead with the 
examiner’s index finger or by pulling a fold of skin 
between the thumb and index finger on the temple lateral 
to the external canthus and tapping with the thumb. The 
orbicularis oculi muscle reflexively contracts, causing 
both eyes to blink. The reflex blinking normally stops after 
tapping is repeated 5 to 10 times. Persistent blinking is a 
positive response sometimes referred to as Myerson sign. 28 
Care should be taken to keep the examiner’s finger above 
the patient’s eyes to avoid blinking in response to visual 
threat (Figure 38-2). 

Are These Features Found in Other Diseases? 

The symptoms and signs of idiopathic PD overlap with those 
of other neurologic diseases, including MSA and progressive 
supranuclear palsy. 

Like PD, MSA often presents with asymmetric rigidity and 
akinesia, but only a minority of patients have a resting 
tremor. 29 Half of patients with MSA present with autonomic 
dysfunction and cerebellar symptoms, and one-quarter dem¬ 
onstrate a transient response to levodopa. 25,29 Similarly, 


Table 38-1 Grade C Studies Included for Review a

Columns: Source | No. of Subjects | Age, y, Mean (Range) | Patient Population | Reference Standard for Diagnosis of PD | Reason Study Not Grade A

Hughes et al 34 | 100 | 64 (31-85) | Diagnosed clinically as having PD | Autopsy findings of depletion of nigral pigmented neurons and proliferation of Lewy bodies | Significant selection bias because patients studied were clinically diagnosed as having PD

Wenning et al 35 | 138 | 61 (NA) | Diagnosed clinically as having PD or MSA | Autopsy findings consistent with PD or MSA | Significant selection bias because patients studied were clinically diagnosed as having PD or MSA

Pearce et al 36 | 100 | 48 (NA) | Unselected inpatients and outpatients diagnosed as having PD and controls without known neurologic disease | Detailed neurologic examination | Samples of patients who obviously have the condition; comparisons nonindependent; small sample size

Duarte et al 37 | 128 | 66 (30-89) b | Patients attending a movement disorders polyclinic for the first time | Detailed neurologic evaluation | Convenience sample including many individuals likely to have PD; small sample size

Mutch et al 38 | 123 | | | | Nonindependent comparisons with unclear standard; samples of patients who obviously have the condition; small sample size
  Cases | | 75 (57-89) | 35 Diagnosed as having PD | Unclear standard used |
  Controls | | 73 (71-76) | 88 From general practices | Neurologic evaluation |

Meneghini et al 39 | 108 | NA | 87 Inpatients with neurologic disorders and 21 patients without known neurologic disease | Detailed neurologic evaluation | Samples of patients who obviously have the condition (including many individuals likely to have PD and controls); small sample size; prone to observer bias

Abbreviations: MSA, multisystem atrophy; NA, not available; PD, Parkinson disease. 

a See Table 1-7 for a summary of Evidence Grades and Levels. 

b For 37 patients diagnosed as having PD only. 
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patients with progressive supranuclear palsy seldom present 
with tremor. Rigidity and postural instability, however, are 
common. 30 

Parkinsonism is sometimes also a feature of Alzheimer dis¬ 
ease. 14 However, Alzheimer disease is easy to distinguish from 
PD because other features are much more prominent. Fur¬ 
thermore, unlike in PD, cognitive impairment is present at 
the onset of Alzheimer disease. 

METHODS 

Four of the authors (G.R., L.F., T.O., and C.E.) performed
independent searches of the MEDLINE database (1966-2001),
using a number of Medical Subject Headings (“exp tremor,”
“exp PD,” “essential tremor”) combined with the search
terms and strategy used for The Rational Clinical Examination
series. 31

All relevant articles were retrieved. The resulting set of
articles was divided into 3 parts, each of which was reviewed
by a pair of authors. The reference lists of all articles were
also carefully searched for additional articles. Articles were
included for study if they met the following criteria: dealt
primarily with the diagnosis of PD; included patients
presenting with 1 or more typical parkinsonian symptoms or
signs (eg, tremor, rigidity); final diagnosis confirmed by a
suitable criterion standard, such as serial or detailed
neurologic evaluation or pathologic confirmation at autopsy; and
contained original data from which 2 × 2 tables could be
extracted to calculate the sensitivity, specificity, positive
likelihood ratio (LR+), and negative likelihood ratio (LR-) for
different signs and symptoms. Because the number of suitable
articles was small, additional inclusion criteria such as a
minimum sample size or publication after a certain year
were not used. However, the quality of articles included was
assessed according to criteria previously developed for this
series. 31
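
The quantities extracted from each 2 × 2 table follow from 4 cell counts. The sketch below illustrates the standard definitions only; the class name and the counts are hypothetical and are not taken from any of the reviewed articles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts of a diagnostic 2 x 2 table (finding vs criterion standard)."""
    tp: int  # finding present, disease present
    fp: int  # finding present, disease absent
    fn: int  # finding absent, disease present
    tn: int  # finding absent, disease absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def lr_positive(self) -> float:
        # LR+ = sensitivity / (1 - specificity)
        return self.sensitivity() / (1.0 - self.specificity())

    def lr_negative(self) -> float:
        # LR- = (1 - sensitivity) / specificity
        return (1.0 - self.sensitivity()) / self.specificity()


# Hypothetical counts, chosen only to show the arithmetic.
table = TwoByTwo(tp=40, fp=10, fn=10, tn=40)
print(f"sensitivity={table.sensitivity():.2f}, specificity={table.specificity():.2f}")
print(f"LR+={table.lr_positive():.2f}, LR-={table.lr_negative():.2f}")
```

An LR+ above 1 makes the diagnosis more likely when the finding is present; an LR- below 1 makes it less likely when the finding is absent.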

The likelihood ratios (LRs) for different diagnostic features
were calculated when not available in the original articles.
Corresponding 95% confidence intervals (CIs) were
determined by the method of Greenland and Robins. 32 All values
were rounded to 2 significant digits. When identical or similar
diagnostic features appeared in more than 1 article and
the patients were similar across studies in terms of
demographics and illness characteristics, weighted summary LRs
(pooled LRs) and the corresponding 95% CIs were calculated
with the DerSimonian-Laird random-effects method. 33 We
used MetaWin statistical software (version 2; Sinauer
Associates, Sunderland, Massachusetts).
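
For readers who want to see the shape of these calculations, the sketch below computes a 95% CI for a single LR and a random-effects pooled LR. It rests on assumptions: the CI uses the common log-scale approximation for a ratio of 2 proportions, which may differ in detail from the Greenland and Robins method cited above, and the pooling is a plain DerSimonian-Laird estimate on the log scale, not a reproduction of the MetaWin output. The function names and all counts are hypothetical.

```python
import math


def lr_with_ci(a, n1, b, n2, z=1.96):
    """Ratio of proportions (a/n1)/(b/n2) with a log-scale CI.

    For LR+: a = finding present among n1 diseased patients,
    b = finding present among n2 nondiseased patients.
    Returns (lr, lower, upper, standard error of log lr).
    """
    lr = (a / n1) / (b / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    return lr, lr * math.exp(-z * se), lr * math.exp(z * se), se


def pool_dersimonian_laird(log_effects, std_errors, z=1.96):
    """DerSimonian-Laird random-effects pooling of log-scale effects.

    Returns (pooled effect, lower, upper, tau^2) on the ratio scale.
    """
    w = [1 / se ** 2 for se in std_errors]
    fixed = sum(wi * y for wi, y in zip(w, log_effects)) / sum(w)
    # Cochran Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    k = len(log_effects)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * y for wi, y in zip(w_star, log_effects)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled),
            tau2)


# Two hypothetical studies of the same finding (counts are illustrative only).
lr_a, lo_a, hi_a, se_a = lr_with_ci(a=40, n1=50, b=10, n2=50)
lr_b, lo_b, hi_b, se_b = lr_with_ci(a=30, n1=40, b=15, n2=60)
pooled, lo, hi, tau2 = pool_dersimonian_laird(
    [math.log(lr_a), math.log(lr_b)], [se_a, se_b])
print(f"study A LR+ {lr_a:.2f} ({lo_a:.2f}-{hi_a:.2f})")
print(f"study B LR+ {lr_b:.2f} ({lo_b:.2f}-{hi_b:.2f})")
print(f"pooled LR+ {pooled:.2f} ({lo:.2f}-{hi:.2f}), tau^2={tau2:.3f}")
```

When the between-study variance estimate is 0, the random-effects result coincides with the fixed-effect (inverse-variance) result.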

RESULTS 

Quality of the Evidence 

A total of 185 articles were reviewed. All authors agreed
about which articles met our selection criteria. We chose 6
articles. Two articles 34,35 included independent blind
comparisons of symptoms and signs of a small number of
patients who had been diagnosed as having PD or MSA
according to comparison of clinical records to pathologic
results at autopsy. Because the patients studied had
already been diagnosed clinically as having PD or MSA,
selection bias was a serious problem. These 2 articles
provided level 3 evidence, leading to grade C recommendations
(see Table 1-7 for summary of Evidence Grades and
Levels).

The remaining 4 articles 36-39 had numerous methodologic
biases. Although of lower methodologic quality,
they can still be classified as containing level 3 evidence
and providing grade C recommendations (Table 38-1).


Table 38-2 Symptoms Evaluated in Patients With Possible Parkinson Disease

| Symptom | LR+ (95% CI) | LR- (95% CI) |
| --- | --- | --- |
| Tremor | | |
| Arms or legs shake: Duarte et al 37 | 1.4 (1.2-1.6) | 0.25 (0.08-0.78) |
| Arms or legs shake: Pearce et al 36 | 17 (6.3-44) | 0.24 (0.13-0.44) |
| Tremor as initial symptom 34 | 1.3 (0.90-2.0) | 0.60 (0.34-1.1) |
| Tremor of head or limbs 39 | 11 (4.8-24) | 0.26 (0.12-0.55) |
| Rigidity | | |
| Muscle stiffness 38 | 2.3 (1.3-4.3) | 0.73 (0.54-0.97) |
| Paralysis or weakness 39 | 1.3 (0.60-2.8) | 0.93 (0.75-1.2) |
| Rigidity and bradykinesia 39 | 4.5 (2.9-7.1) | 0.12 (0.03-0.45) |
| Facies and General Symptoms or Historical Findings | | |
| Face less expressive 37 | 2.1 (1.4-3.2) | 0.54 (0.35-0.84) |
| Feet freeze 37 | 3.7 (2.1-6.7) | 0.55 (0.39-0.79) |
| Impaired consciousness 39 | 0.31 (0.08-11) | 1.3 (1.1-1.6) |
| Bradykinesia | | |
| Difficulty rising from chair: Duarte et al 37 | 1.9 (1.3-2.7) | 0.58 (0.38-0.90) |
| Difficulty rising from chair: Mutch et al 38 | 5.2 (2.9-9.5) | 0.39 (0.25-0.63) |
| Posture and Motor Tasks | | |
| Loss of balance: Duarte et al 37 | 1.6 (1.3-2.2) | 0.29 (0.13-0.68) |
| Loss of balance: Mutch et al 38 | 6.6 (3.4-13) | 0.35 (0.21-0.57) |
| Shuffling gait: Duarte et al 37 | 3.3 (2.1-5.0) | 0.32 (0.18-0.58) |
| Shuffling gait: Mutch et al 38 | 15 (4.7-47) | 0.50 (0.36-0.71) |
| Trouble turning in bed 38 | 13 (4.1-43) | 0.56 (0.41-0.76) |
| Trouble buttoning 37 | 3.0 (2.0-4.4) | 0.33 (0.19-0.60) |
| Trouble opening jars 38 | 6.1 (3.4-11) | 0.26 (0.14-0.48) |
| Uncontrolled limbs 39 | 1.3 (0.53-3.1) | 0.93 (0.72-1.2) |
| Fine Motor | | |
| Micrographia: Duarte et al 37 | 2.8 (1.8-4.2) | 0.44 (0.27-0.71) |
| Micrographia: Mutch et al 38 | 5.9 (3.1-9.4) | 0.30 (0.17-0.53) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

Table 38-3 Signs Evaluated in Patients With Possible Parkinson Disease

| Sign | LR+ (95% CI) | LR- (95% CI) |
| --- | --- | --- |
| Tremor | | |
| Tremor 34 | 1.5 (1.0-2.3) | 0.47 (0.27-0.84) |
| Tremor with rigidity and bradykinesia 34 | 2.2 (1.2-4.2) | 0.50 (0.34-0.75) |
| Rigidity | | |
| Rigidity 39 | 2.8 (1.8-4.4) | 0.38 (0.19-0.76) |
| Rigidity with bradykinesia 39 | 4.5 (2.9-7.1) | 0.12 (0.03-0.45) |
| General Findings | | |
| Glabella tap 36 | 4.5 (2.8-7.4) | 0.13 (0.03-0.47) |
| Voice softer 37 | 3.7 (2.4-5.6) | 0.25 (0.13-0.49) |
| Change in speech 39 | 2.6 (1.2-5.3) | 0.73 (0.53-1.0) |
| Asymmetric disease 34 | 1.8 (0.98-3.2) | 0.61 (0.41-0.91) |
| Levodopa response 34 | 1.2 (0.87-1.6) | 0.63 (0.31-1.3) |
| Bradykinesia | | |
| Akinetic/rigid disease 34 | 0.44 (0.25-0.75) | 1.7 (1.1-2.6) |
| Posture and Motor Tasks | | |
| Difficulty or inability to walk heel to toe 39 | 2.9 (1.9-4.5) | 0.32 (0.15-0.70) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.


Selection bias was a major problem in all 4 articles
because many of the patients evaluated had either been
diagnosed as having PD on initial clinical examination or
had obvious parkinsonian features. In one study, 39 a
screening instrument was administered to patients with
an initial diagnosis of PD, peripheral neuropathy, stroke,
or epilepsy. The instrument included both a self-administered
questionnaire and a set of physical tasks, performance
of which was graded subjectively. Neurologists
confirming the presence of PD were aware of each
patient’s initial diagnosis and responses to the screening
instrument. This obviously makes the study prone to
observer bias. Like most studies with low methodologic
quality, these 4 articles 36-39 reported optimistic LRs.

Precision 

Interclinician and intraclinician reliability of symptoms and
signs was documented only for the glabella tap sign. 36
Precision could not be quantified in the clinicopathologic studies
because symptom histories were obtained retrospectively from
medical records.

Interclinician reliability in eliciting the glabella tap sign
was found to be 88% among patients with intracranial
disease and 100% in controls. 36 A κ coefficient for
interclinician agreement could not be calculated because data about
how each clinician scored each patient were not included.
No causes for imprecision in assessing symptoms or signs
were documented in the selected articles.
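
To show why paired ratings are needed, the sketch below computes a κ coefficient (Cohen's κ) from a cross-tabulation of 2 clinicians' findings. The counts are hypothetical; they are chosen to give 88% raw agreement and are not data from the cited study, and the function name is illustrative.

```python
def cohen_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for 2 raters scoring a finding as present or absent.

    both_pos: both raters score the finding present
    a_only:   only rater A scores it present
    b_only:   only rater B scores it present
    both_neg: both raters score it absent
    """
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = (both_pos + a_only) / n
    b_pos = (both_pos + b_only) / n
    # Agreement expected by chance, given each rater's marginal rates.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)


# Hypothetical paired ratings for 100 patients (88% raw agreement).
kappa = cohen_kappa(both_pos=40, a_only=6, b_only=6, both_neg=48)
print(f"kappa = {kappa:.2f}")
```

The same raw agreement can correspond to very different κ values depending on how often each clinician scores the sign as present, which is why a percentage alone is not enough.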


Accuracy 

Several symptoms, collected by patient self-report in a
questionnaire, 38 significantly increase the likelihood of PD when
present and decrease it when absent. The symptoms are trouble
turning in bed, shuffling while walking, micrographia,
difficulty rising from a chair, loss of balance, and trouble
opening jars. The diagnostic value of tremor as a symptom
varied widely among the selected articles, with a range in
LR+ of 1.3 to 11 (Table 38-2).

The lack of tremor as a symptom makes PD less likely
(range of LR-, 0.24-0.60). However, the usefulness of the
lack of tremor as a symptom is limited by verification bias in
the corresponding studies. Verification bias occurs when
confirmatory or criterion standards are selectively applied to
patients, depending on the results of their preliminary
screening test. 40 Considered independently, tremor detected on
neurologic examination has an LR+ of only 1.5 (95% CI,
1.0-2.3), while the absence of a tremor detected on examination
about halves the likelihood of PD (LR-, 0.47; 95% CI,
0.27-0.84) (Table 38-3). 34

Rigidity as a symptom has an LR+ range of 1.3 to 4.5 and
makes PD more likely. The LR- for the absence of rigidity varies
broadly, making it less useful (LR- range, 0.12-0.73) (Table 38-2).
As a sign detected on neurologic examination (Table 38-3),
rigidity considered independently is more useful (LR+, 2.8;
95% CI, 1.8-4.4; LR-, 0.38; 95% CI, 0.19-0.76). 39 When both
rigidity and bradykinesia are present, the LR+ for the combination
of findings improves to 4.5 (95% CI, 2.9-7.1), while
the absence of both findings makes PD much less likely, with
an LR- of 0.12 (95% CI, 0.03-0.45). 39

The glabella tap sign is useful, with an LR+ of 4.5 (95% CI,
2.8-7.4) and an LR- of 0.13 (95% CI, 0.03-0.47). 36 Changes
in voice have an LR+ range of 2.6 to 3.7, while the lack of a
voice change makes PD somewhat less likely (LR- range,
0.25-0.73). 37,39 The results confirm the limited usefulness of
the response to levodopa because the LR+ is only 1.2 (95%
CI, 0.87-1.6), while the absence of a response has an LR- of
0.63 (95% CI, 0.31-1.3). 34

Physicians sometimes must consider whether patients
have PD or multiple systems atrophy (MSA), so a series of
findings have been compared between the 2 disorders. 35
Patients with rigidity as an initial presenting feature
are less likely to have PD (LR, 0.53; 95% CI, 0.35-0.80) and
more likely to have MSA. The presence of dementia
favors PD over MSA (LR+, 3.2; 95% CI, 1.5-6.8). Not
surprisingly, central or autonomic nervous system findings
are much less likely with PD (LR range when central or
autonomic findings present, 0.03-0.31), so their presence
favors MSA. Bradykinesia and symptoms of depression do
not help distinguish between the diseases (95% CI for the LR
includes 1).

THE BOTTOM LINE 

Few studies address the clinical diagnosis of PD rigorously.
Nearly 200 years after it was first described, the accurate
clinical diagnosis of PD remains a significant challenge. There is
a great need for diagnostic studies involving larger numbers
of patients in which presenting symptoms and signs are
prospectively compared with the final diagnosis, established
through a suitable criterion standard, such as autopsy or
serial neurologic evaluation.

A number of classic features of PD, when present, do help
establish the diagnosis. These include the symptoms of
tremor, the combination of rigidity and bradykinesia, loss of
balance, micrographia, and shuffling gait. Difficulty with the
tasks of turning in bed, opening jars, and rising from a chair
should also raise the suspicion of PD. It is difficult to gauge
the usefulness of the absence of tremor as a symptom in ruling
out PD because of verification bias in the studies in which
it was evaluated.

The diagnostic value of the classic combination of tremor, 
rigidity, and bradykinesia on examination is modest at best. 
Useful signs include the glabella tap, difficulty walking heel 
to toe, and the presence of rigidity on examination. 


CLINICAL SCENARIO—RESOLUTION 


This patient presents with many common features of PD.
You can question him about the tasks of turning in bed
and opening jars. A sample of his writing may reveal
micrographia. A glabella tap test should be performed.
Additional positive symptoms or signs would justify
empiric treatment with dopaminergic medication, with
careful follow-up by a physician experienced in the
treatment of this condition.

UPDATE: Parkinsonism



Prepared by Goutham Rao, MD 
Reviewed by Richard Bedlack, MD 


CLINICAL SCENARIO 


A 70-year-old man presents with a 3-month history of
progressive tremor and difficulty in writing letters. His
wife states that he has trouble eating with a spoon. Physical
examination reveals tremor in both hands. Otherwise,
he feels well and continues to play golf.

UPDATED SUMMARY ON PARKINSON DISEASE 

Original Review 

Rao G, Fisch L, Srinivasan S, et al. Does this patient have
Parkinson disease? JAMA. 2003;289(3):347-353.

UPDATED LITERATURE SEARCH 

Our literature search used the parent search strategy for The
Rational Clinical Examination series, combined with the
subject headings “exp tremor,” “exp Parkinson disease,” and
“essential tremor,” published in English from 2002 to July
2004. No new original articles address the sensitivity and
specificity of clinical findings for Parkinson disease (PD). We
found 1 new systematic review that addresses key questions
related to the diagnosis and management of PD. Much of the
recent research in the diagnosis of PD has focused on
functional neuroimaging with positron emission tomography and
single-photon-emission computed tomography.


CHANGES IN THE REFERENCE STANDARD 

There are no laboratory or radiographic reference standards
for PD. The best reference standard is serial clinical
assessments performed by a specialist. Newer imaging methods
may eventually be routinely helpful in identifying PD at an
early stage or in patients with atypical signs and symptoms
and in following the progression of disease. However, they
have not yet been evaluated with enough methodologic rigor
to allow clinicians to be confident in the results.


NEW FINDINGS 

One systematic review confirms that PD remains a clinical 
diagnosis and that neuroimaging, the levodopa challenge, 
and other diagnostic tests are not useful. 1 

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

Tables 38-2 and 38-3 were revised because they originally 
included data for helping to distinguish PD from multiple 
systems atrophy. These tables now show only the likelihood 
ratios for making the diagnosis of PD. 

RESULTS OF LITERATURE REVIEW 

No data suggest new useful findings for PD. Symptoms
common in PD include tremor, rigidity, bradykinesia, and
micrographia (unusually small handwriting). Difficulty with tasks
such as turning in bed, opening jars, or rising from a chair is
also common. The most useful clinical findings are the
glabella tap (see Figure 38-2) and heel-to-toe tests.

EVIDENCE FROM GUIDELINES 

No guidelines explicitly address the diagnosis of PD. 


CLINICAL SCENARIO—RESOLUTION 


The patient should be asked about key symptoms of PD,
including loss of balance (positive likelihood ratio [LR+],
1.6-6.6), shuffling gait (LR+, 3.3-15), and rigidity or
stiffness (LR+, 1.3-4.5). It is not clear why he has trouble
eating with a spoon. This could be a result of his tremor or
because of slowness of movements (bradykinesia). The
patient should be asked about difficulty rising from a
chair (LR+, 1.9-5.2), a specific manifestation of
bradykinesia. His difficulty in writing letters could also be
attributed to tremor, bradykinesia, or micrographia.

Further evaluation reveals that the patient’s tremor is
of slow frequency (4-6/s) and occurs at rest, which is
typical of the tremor of PD. By contrast, classic essential
tremor is usually postural (occurs while maintaining a
position against gravity) or kinetic (occurs during
voluntary movements). Physiologic tremor is usually
postural and of higher frequency (8-12/s) than the tremor
of PD.

The glabella test should be performed (LR+, 4.5), and
whether the patient is able to walk heel to toe (LR+, 2.9)
should also be assessed. Asking the patient to rise from a
chair and checking to see whether his upper limbs are
rigid during passive movement is also useful in establishing
the diagnosis (combination of bradykinesia and
rigidity; LR+, 4.5).

Parkinson disease is a progressive disease, and patients
present with different combinations of clinical features of
various severity at different stages. It is therefore
important to evaluate the patient periodically before
establishing the diagnosis and initiating treatment.


PARKINSON DISEASE—MAKE THE DIAGNOSIS


Parkinson disease remains a clinical diagnosis. No accurate 
laboratory or radiologic test is currently available. An acute 
challenge with levodopa, followed by monitoring for 
improvement of symptoms, is not useful diagnostically. 

PRIOR PROBABILITY 

Parkinson disease affects 1% of people older than 65 years
and 2% of those older than 85 years. 2 The prevalence among
older patients presenting with general neurologic
complaints is almost certainly much higher, but precise figures
are unavailable.
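
A prior probability is combined with a likelihood ratio by converting it to odds, multiplying, and converting back. The sketch below illustrates this with the 1% population prevalence quoted above and the LRs reported for the glabella tap (LR+, 4.5; LR-, 0.13). The 30% figure is purely an assumed pretest probability for a patient with suggestive complaints, because precise figures are unavailable; the function name is illustrative.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Population prevalence in people older than 65 years (from the text): 1%.
general_positive = post_test_probability(0.01, 4.5)   # glabella tap present
general_negative = post_test_probability(0.01, 0.13)  # glabella tap absent

# Assumed pretest probability of 30% (hypothetical, for illustration only).
referred_positive = post_test_probability(0.30, 4.5)

print(f"prior 1%, sign present:  {general_positive:.1%}")
print(f"prior 1%, sign absent:   {general_negative:.2%}")
print(f"prior 30%, sign present: {referred_positive:.1%}")
```

The same finding shifts the probability far more in absolute terms when the pretest probability is moderate than when it is very low, which is why the clinical setting matters when interpreting an LR.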

POPULATION FOR WHOM PARKINSON DISEASE 
SHOULD BE CONSIDERED 

Adults with a tremor or other symptoms noted in Tables 38-4
and 38-5 (see Figure 38-1 for assessment of bradykinesia).


DETECTING THE LIKELIHOOD OF PARKINSON DISEASE 


Table 38-4 Useful Symptoms for Detecting Parkinson Disease

| Symptom (No. of Studies) | LR+ a | LR- a |
| --- | --- | --- |
| Shuffling gait (2) | 3.3-15 | 0.32-0.50 |
| Bradykinesia (difficulty rising from a chair) (2) | 1.9-5.2 | 0.39-0.58 |
| Loss of balance (2) | 1.6-6.6 | 0.29-0.35 |
| Tremor (4) | 1.4-11 | 0.24-0.60 |
| Rigidity (3) | 1.3-4.5 | 0.12-0.93 |

Abbreviations: LR+, positive likelihood ratio; LR-, negative likelihood ratio.
a Values represent the range across the studies. Different definitions of symptoms
precluded meta-analysis.


Table 38-5 Useful Signs for Detecting Parkinson Disease

| Sign a | LR+ (95% CI) | LR- (95% CI) |
| --- | --- | --- |
| Rigidity and bradykinesia | 4.5 (2.9-7.1) | 0.12 (0.03-0.45) |
| Glabella tap | 4.5 (2.8-7.4) | 0.13 (0.03-0.47) |
| Difficulty walking heel to toe | 2.9 (1.9-4.5) | 0.32 (0.15-0.70) |
| Rigidity | 2.8 (1.8-4.4) | 0.38 (0.19-0.76) |
| Asymmetry of disease | 1.8 (0.98-3.2) | 0.61 (0.41-0.91) |
| Tremor | 1.5 (1.0-2.3) | 0.47 (0.27-0.84) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.
a All likelihood ratios are from single studies.

REFERENCE STANDARD TESTS

Serial clinical examinations performed by a specialist.

EVIDENCE TO SUPPORT THE UPDATE


Parkinsonism 



TITLE Diagnosis and Treatment of Parkinson’s Disease: A 
Systematic Review of the Literature. 

AUTHORS Levine CB, Fahrbach KR, Siderowf AD, 
Estok RP, Ludensky VM, Ross SD. 

CITATION Agency for Healthcare Research and Quality.
Evidence Report/Technology Assessment No. 57. Rockville, MD:
Agency for Healthcare Research and Quality; 2003. Prepared
by Metaworks, Inc., under Contract No. 290-97-0016. AHRQ
Publication No. 03-E040.
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=hstat1.chapter.123680.
Accessed June 7, 2008.

QUESTIONS Though they did assess a number of diagnostic
techniques, the authors began by addressing 2 specific
questions related to the diagnosis of Parkinson disease
(PD):

1. What are the results of neuroimaging studies (computed
tomography [CT], magnetic resonance imaging [MRI],
positron emission tomography [PET], single photon
emission computed tomography [SPECT]) or other
diagnostic tests in determining the diagnosis of PD?

2. What are the accuracy, sensitivity, and specificity of an
L-dopa challenge in diagnosing PD?

DESIGN Qualitative systematic review (ie, formal
systematic review without meta-analysis) of studies about
diagnosis. Meta-analytic techniques were used for
pharmacologic studies about treatment.

DATA SOURCES MEDLINE, Current Contents, and
Cochrane Library databases and bibliographies of all
publications included for review and recent review articles.

STUDY SELECTION AND ASSESSMENT To identify
studies about diagnosis, the sources above were
searched for articles published between January 1, 1990,
and December 31, 2000. Inclusion criteria for diagnostic
studies included (1) adult patients with potential diagnosis
of PD; (2) addressing of any diagnostic test to establish
or support a diagnosis of PD; and (3) observational
(prospective, retrospective, and cross-sectional) or
interventional designs (randomized controlled trials, nonrandomized
controlled trials, and uncontrolled case series).


The review excluded letters, case reports, editorials,
commentaries, unpublished study reports, animal or in vitro
studies, studies written in languages other than English,
studies published before 1990, studies with fewer than 10
patients, crossover studies, studies in which results for PD
population could not be separated from results from other
populations, or studies not pertaining to diagnosis or
treatment of PD. All eligible studies were rated for both
quality and level of evidence.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Studies included for review included the following categories
of diagnostic testing: apomorphine or L-dopa challenge tests,
autopsy studies, clinical or laboratory tests, color vision
testing, MRI, olfactory testing, PD Test Battery (tests of motor
function, olfaction, and depression quantified with a score
between 0 and 1.0), PET scans, SPECT scans, and other
scans. The authors found that clinical diagnosis is the
reference diagnostic standard in many studies, but they point out
that this is problematic because the clinical diagnosis may be
wrong in up to 25% of cases. Long-term response to L-dopa
was also used but is not a valid reference standard.

MAIN OUTCOME MEASURES 

Sensitivity, specificity, positive and negative predictive value 
when available. 

MAIN RESULTS 

There is insufficient evidence to determine the diagnostic
accuracy and therefore to recommend the use of the following
to diagnose PD: apomorphine and L-dopa challenge tests;
SPECT, PET, or other scans; olfactory tests; color vision tests;
or blood and cerebrospinal fluid tests.

Sensitivities for the PD Test Battery ranged from 69% to
95%. Specificities ranged from 64% to 95%. Methodologic
problems with studies of the PD Test Battery, however, limit
its usefulness. The authors do not recommend the battery for
diagnosing PD.

The sensitivity and specificity of clinical diagnosis of PD
at the first patient visit ranged from 53% to 90% and 74%
to 94%, respectively (compared with subsequent
neuropathology studies at autopsy). At the last visit, these values
increased to a sensitivity of 60% to 87% and specificity of
82% to 97%. The positive predictive value increased from
34% to 61% at the first visit to 43% to 75% at the last visit.
The median negative predictive value was more than 95%
at both visits. The authors conclude that the clinical
diagnosis of PD is modestly accurate and improves over time
and that autopsy is the only acceptable reference standard.
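
Predictive values depend on how common the disease is in the population studied, which is one reason they vary more than sensitivity and specificity. The sketch below shows the arithmetic; the sensitivity and specificity are the upper ends of the first-visit ranges quoted above, whereas both prevalence values are assumptions chosen for illustration and do not come from the review.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity 90% and specificity 94%; prevalences are hypothetical.
ppv_10, npv_10 = predictive_values(0.90, 0.94, prevalence=0.10)
ppv_02, npv_02 = predictive_values(0.90, 0.94, prevalence=0.02)
print(f"prevalence 10%: PPV {ppv_10:.1%}, NPV {npv_10:.1%}")
print(f"prevalence  2%: PPV {ppv_02:.1%}, NPV {npv_02:.1%}")
```

With the test characteristics held fixed, the positive predictive value falls sharply as prevalence falls, while the negative predictive value stays high.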

CONCLUSIONS 

LEVEL OF EVIDENCE Systematic review. 


STRENGTHS This methodologically sound systematic 
review addresses specific diagnostic questions, establishes 
explicit search and inclusion criteria for diagnostic studies, 
and uses widely accepted methods to assess the value of those 
studies. 

LIMITATIONS The last articles included for review were 
published in 2000. 

The review substantiates the role of the clinical examination
for establishing the diagnosis, rather than additional
tests. The diagnostic value of specific clinical features (eg,
bradykinesia) was not explicitly evaluated. The role of
repeated examinations is important for physicians because
the diagnostic accuracy improves with following the patient’s
serial symptoms and signs.

Reviewed by Goutham Rao, MD

Is This Patient Allergic to Penicillin?

Alan R. Salkind, MD 
Paul G. Cuddy, PharmD 
John W. Foxworth, PharmD 


CLINICAL SCENARIOS 


CASE 1 An 18-year-old male college student presents 
with group A streptococcal pharyngitis, and you prescribe 
penicillin. 1 The patient informs you that he developed a 
rash after taking about half a penicillin prescription for a 
respiratory tract infection 3 years ago. The rash was bright 
red, was restricted to the extremities and trunk, and 
resolved several days after penicillin was discontinued. 

CASE 2 A 26-year-old pregnant woman has syphilis. She
recalls an “itchy rash” and trouble breathing after taking
penicillin 4 years ago; she thinks the rash appeared about
3 days into the course of penicillin. Penicillin is the
recommended antibiotic for syphilis in pregnancy, even for
patients with a true penicillin allergy. 2


WHY IS IT IMPORTANT TO DETERMINE WHETHER 
PATIENTS HAVE TRUE PENICILLIN ALLERGY? 


Penicillin, a β-lactam antibiotic, and its semisynthetic
chemical derivatives (such as ampicillin and amoxicillin)
and other β-lactam antibiotics (including cephalosporins,
carbapenems, and monobactams) remain first-line or
acceptable alternative treatments for many infections. 3
However, the use of drugs containing penicillin is often
limited by an unconfirmed or questionable history of
penicillin hypersensitivity provided by the patient. Because fear
of penicillin anaphylaxis is common among clinicians
encountering a patient with a self-reported history of
penicillin allergy, many clinicians overdiagnose penicillin allergy
in patients who have not had a true allergic reaction to
penicillin. Some clinicians may simply accept a diagnosis of
penicillin allergy from a patient without obtaining a
detailed history of the reaction. 4,5 Some patients, when
asked, have no firsthand recall of an allergic response to
penicillin, perhaps having been informed of
the allergy by a parent. 4,5 For example, patients reporting a
penicillin allergy have described an “allergic reaction”
consisting of fever and yellow spots on the tonsils, which
actually related to the illness they were being treated for, rather
than penicillin itself. 4 Unless a detailed medical history and
a critical evaluation of the reaction are sought, such
patients may incorrectly be labeled as penicillin allergic. In
fact, 80% to 90% of patients who report a penicillin allergy
are not truly allergic to the drug when assessed by skin
testing. 6-9 Consequently, penicillin is withheld from many
patients who could safely receive the drug or its derivatives,
perhaps affecting outcomes. 10 Two studies have shown that
incorrectly labeling patients as being allergic to penicillin
was associated with increased health care costs. 11,12

METHODS

We searched MEDLINE for English-language literature dated
from 1966 to October 2000 by using the following Medical
Subject Headings and search strategy: (1) “medical history
taking” or “physical examination” and “penicillin” or
“β-lactam hypersensitivity” and (2) “reproducibility of results”
or “observer variation” and “penicillin” or “β-lactam
hypersensitivity.” A textword search was also performed with
“interobserver,” “intraobserver,” “accuracy,” “precision,” “reliability,”
“sensitivity,” “specificity,” “skin testing,” and “penicillin” or
“β-lactam hypersensitivity” or “allergy.” The bibliographies of
pertinent articles were searched to identify additional references.
Included articles were original studies conducted on ambulatory
or hospitalized children or adults describing the accuracy
or precision of skin testing in the diagnosis of an
immunoglobulin E (IgE)-mediated penicillin allergy. Excluded studies
investigated allergy to aminopenicillins (amoxicillin and
ampicillin) or cephalosporins, did not use both major and minor
determinants in the skin testing procedure, or did not provide
an explicit definition of penicillin allergy or of a positive skin
test result. Data from patients who were reported to have had
an uninterpretable or equivocal skin test result were not
included in our analysis. Quality measures were applied, as
used in The Rational Clinical Examination series (see Table 1-7
for a summary of Evidence Grades and Levels). 13 Study
quality was not used as a measure of the relative weight that a
single study should receive in our analysis, because other
authors have highlighted the pitfalls of this practice. 14,15 Of the
14 studies 16-29 meeting our inclusion criteria, 4 studies 16-19
compared the clinical history with the skin test result for penicillin
allergy among a group of patients with and without a positive
history of penicillin allergy (Table 39-1). Confidence intervals
(CIs) for the likelihood ratios (LRs) from individual studies
were computed with a previously described method. 30

Classification of Penicillin Hypersensitivity Reactions 

The frequency of all adverse reactions to penicillin in the general
population ranges from 0.7% to 10%. 31 This wide variation in
the frequency of adverse reactions to penicillin exists because of
a number of variables, including exposure history, route of
administration, duration of treatment, elapsed time between the
reaction and diagnostic skin testing or reexposure, and nature of
the initial reaction. Understanding the different classifications of
penicillin hypersensitivity reactions aids evaluation of each
patient’s risk for an allergic reaction that would preclude
administration of a drug that contains penicillin.

Gell and Coombs 32 categorized allergic reactions to
penicillins by the type of reaction, immune mechanism, and clinical
syndrome, whereas Levine 33 classified untoward reactions to
penicillin by their time of onset (Table 39-2). Classification of
penicillin allergy has been reviewed by several authors 6,34,35 and
is summarized briefly below. We refer the reader to the original
works for a more detailed discussion. 32,33

Immediate Reactions 

Type I, or immediate, reactions are often associated with the 
systemic manifestations of anaphylaxis, such as diffuse ery¬ 
thema, pruritus, urticaria, angioedema, bronchospasm, 
laryngeal edema, hyperperistalsis, hypotension, or cardiac 
arrhythmias, either alone or in combination (Table 39-2). 
Anaphylactic reactions occur in about 0.004% to 0.015% of 
penicillin courses and are most commonly observed in adults 
between the ages of 20 and 49 years. 31 A history of atopy does 
not generally place an individual at increased risk for a type I 
penicillin reaction. 36 However, atopic patients may have a 
higher frequency of severe anaphylactic reactions. 36 

Type I reactions result when penicillin or its reactive 
metabolites covalently bind to serum proteins and then 
crosslink with preformed penicillin-specific IgE antibodies 
bound to tissue mast cells, circulating basophils, or both. 
When the bound IgE antibodies are crosslinked by allergen, 
mast cells are activated to release their mediators. A patient 
using β-adrenergic antagonists may be at increased risk of 
death if anaphylaxis occurs. 37 

Some reactions to penicillin occurring from 1 to 72 hours 
after administration may also be IgE mediated. These reactions, 
termed accelerated reactions, can be manifested by urticaria, 
angioedema, laryngeal edema, and wheezing. However, 
urticaria and angioedema can occur at any time after adminis- 


Table 39-1 Studies Assessing the Skin Test for Penicillin Allergy Among Patients With and Without a History of Penicillin Allergy a 

Source, y               | Quality of Methods b | Setting (Sample Size, % Penicillin Allergic)       | Sensitivity | Specificity | LR+ (95% CI)  | LR- (95% CI) 
Adkinson et al, 16 1971 | C                    | Inpatient, nonconsecutive (n = 218, 11.9)          | 0.61        | 0.74        | 2.4 (1.6-3.5) | 0.5 (0.3-0.85) 
Green et al, 17 1977    | C                    | Multicenter study (n = 2947, 8.1)                  | 0.79        | 0.45        | 1.4 (1.4-1.5) | 0.5 (0.39-0.57) 
Sogn et al, 18 1992     | C                    | Multicenter study, chronically ill (n = 1298, 12.6) | 0.85       | 0.50        | 1.7 (1.6-1.9) | 0.3 (0.21-0.44) 
Gadde et al, 19 1993    | C                    | Sexually transmitted disease clinic (n = 5063, 2.5) | 0.43       | 0.85        | 2.9 (2.4-3.7) | 0.7 (0.57-0.77) 
Summary                 |                      |                                                    |             |             | 1.9 (1.5-2.5) | 0.5 (0.4-0.6) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio. 

a An LR+ indicates the likelihood that a patient with a history of penicillin allergy will have a positive penicillin skin test result; an LR- indicates the likelihood that a patient without 
a history of penicillin allergy will have a positive penicillin skin test result. 

b Quality of methods was based on published criteria. Grade C: independent, blind comparison of sign or symptom, with a gold standard of diagnosis among nonconsecutive 
patients suspected of having the target condition plus, perhaps, individuals without the target condition; or nonindependent comparison of sign or symptom with a standard of 
uncertain validity. 13 Of the included studies, not all patients received penicillin challenge. (See Table 1-7 for a summary of Evidence Grades and Levels.) 
Table 39-2 Classification of Penicillin Reactions 

Classification | Time of Onset, h | Mediator(s) | Clinical Signs | Skin Testing Useful | Comments 
Immediate (type I reaction) | <1 | Penicillin-specific IgE antibodies | Anaphylaxis or hypotension, laryngeal edema, wheezing, angioedema, urticaria | Yes | Much more likely with parenteral administration than oral administration; fatal outcome in 1 per 50000 to 1 per 100000 treatment courses; some reactions occurring between 1 and 72 h of exposure may be IgE mediated (see text for details) 
Late reactions | >72 h after exposure | | | | 
  Type II | | IgG, complement | Increased clearance of red blood cells, platelets by lymphoreticular system | No | IgE not involved 
  Type III | | IgG, IgM immune complexes | Serum sickness, tissue injury | No | Tissue lodging of immune complexes; drug fever 
  Type IV | | | Contact dermatitis | No | 
Other (idiopathic) | Usually >72 h after exposure | | Maculopapular or morbilliform rashes | No | 1% to 4% of all patients receiving penicillin 

Abbreviations: IgE, immunoglobulin E; IgG, immunoglobulin G; IgM, immunoglobulin M. 


tration of penicillin. Life-threatening reactions occurring 
beyond 1 hour of penicillin administration are rare. The 
patient described in case 1 had none of the features of a serious 
IgE-mediated penicillin allergy. In contrast, the patient 
described in case 2 had features that suggest an IgE-mediated 
accelerated reaction. 

Late Reactions 

Late penicillin hypersensitivity reactions are those that occur 
after 72 hours of drug administration. These responses have 
been classified as types II, III, or IV, depending on the immune 
mechanism underlying the response (Table 39-2). Because 
none of these reactions are IgE dependent, skin testing has no 
role in the evaluation of a patient with type II, III, IV, or idio¬ 
pathic responses to penicillin. 

Some reactions to penicillin are not included in the Gell and 
Coombs 32 classification and have been termed idiopathic. 
Although various immune-mediated responses have been pos¬ 
tulated, the exact immunologic mechanisms underlying these 
responses are not known. The most common idiopathic reac¬ 
tion to drugs containing penicillin is a maculopapular or mor¬ 
billiform rash. The combined frequency of all rashes occurring 
in patients taking penicillin is estimated at 1% to 4%. 38,39 These 
eruptions are usually symmetric, often confluent erythematous 
macules and papules that generally spare the palms and soles. 
They may originate on the extremities of ambulatory patients or 
overlie pressure areas of bedridden patients. 9 Rashes associated 
with ampicillin administration occur in 5.2% to 9.5% of treat¬ 
ment courses. 38 ' 40 Patients with Epstein-Barr virus or cytomeg¬ 
alovirus infections, or with acute or chronic lymphocytic 
leukemia, are reported to have a higher incidence of ampicillin- 
associated rash. 6 The reason for the increased incidence of rash 
caused by ampicillin remains unknown. 

In experimental settings, individuals with histories of type I 
hypersensitivity reactions to aminopenicillins (ampicillin, 
amoxicillin, bacampicillin) demonstrate cross-reactivity to 
penicillin when assessed by skin testing. 41 Some of 
these individuals fail to react to penicillin skin testing and react 
only to skin testing with aminopenicillins; such occurrences 
are less common but well documented. 42,43 In contrast, 
individuals reporting a history of a nonimmediate reaction 
are less likely to react to penicillin skin test determinants. 42 
In light of the above, it is prudent to perform a skin test for 
penicillin in those individuals with a history of an urticarial 
reaction to aminopenicillin derivatives and administer a drug 
containing penicillin only in patients with negative skin test 
results. 44 Patients without urticarial rashes to aminopenicillins 
are unlikely to manifest a serious reaction and can generally 
receive a drug containing penicillin, without further testing. 44 

Drug-independent rashes are common in patients with viral 
infections, especially those caused by the human immunodefi¬ 
ciency virus, hepatitis B, mumps, echovirus, 11 and Coxsackie 
virus. 45 Infections with numerous bacteria can also be associated 
with a rash. 45 Therefore, patients with some infections who 
develop a rash while taking penicillin derivatives or penicillin 
itself should not be automatically labeled as penicillin allergic. 
Moreover, many patients taking penicillin may also be taking 
other medications, including other antibiotics, that can cause 
rashes that are independent of β-lactam compounds. 9 Maculopapular 
eruptions caused by drugs containing penicillin may 
subside spontaneously despite continued use of the drug and 
may not recur on reexposure. 9,40 The frequency of a penicillin-associated 
maculopapular eruption on reexposure to the drug is 
not known because many clinicians withhold drugs that con¬ 
tain penicillin in this patient population. Green et al 17 reported 
that 3 (3.5%) of 85 patients with a maculopapular rash associ¬ 
ated with penicillin administration had adverse reactions to oral 
challenge with penicillin. The nature of the oral challenge reac¬ 
tion was not specified, but none were classified as type I reac¬ 
tions. Six (4.5%) of 134 patients with negative penicillin skin 
test results and a history of a penicillin-associated cutaneous 
reaction had an adverse response to penicillin readministration. 
The nature of the response was not described. 19 Another 3 
patients with negative penicillin skin test results and a history of 
rash caused by penicillin developed a type I reaction to penicillin 
administration, 19 likely indicating the inaccuracy of the historical 
information. If a detailed history of a patient’s reaction to 
penicillin indicates that the rash was strictly maculopapular, 
with no signs of a type I reaction, then it appears to be safe to 
readminister an antibiotic that contains penicillin. 20,35 

Penicillin (or any medication) that is clearly associated 
with the development of exfoliative dermatitis or the 
Stevens-Johnson syndrome should be discontinued immedi¬ 
ately and not readministered to the patient. 9 Patients with a 
history of Stevens-Johnson syndrome or exfoliative dermatitis 
attributable to β-lactam drugs should not undergo a skin 
test 9 and should wear a Medic Alert bracelet indicating a 
severe reaction to the drug. 

Cross-Reactivity With Other β-Lactam Antibiotics 

Cephalosporins (like penicillins) contain a β-lactam ring. 3 The 
frequency of allergic reactions within 24 hours of cepha¬ 
losporin administration to patients with a history of penicillin 
allergy and positive skin test results was 5.6% vs 1.7% for 
patients with a history of penicillin allergy and negative skin 
test results. 35 Earlier reports suggested that the cross-reaction 
rate may be higher for first-generation cephalosporins than for 
subsequent cephalosporins. 46 Complicating interpretation of 
these data was the finding that some early first-generation 
cephalosporins contained trace amounts of penicillin. 46 

One group of investigators challenged 19 patients with 
well-documented histories of a type I allergy to penicillin with 
cephalosporins containing side-chain structures expected to 
lead to cross-reaction. 47 Seventeen patients tolerated the chal¬ 
lenge doses and subsequent courses of the cephalosporin. 
Both of the patients who had allergic reactions had positive 
penicillin skin test results to benzylpenicillin only; however, 
another patient with the same skin test pattern tolerated 
cephalosporin challenge without incident. Because this study 
did not contain a control group without penicillin allergy, the 
relative significance of the penicillin allergy cannot be deter¬ 
mined. 47 In another study, 1 (1.6%) of 62 patients with posi¬ 
tive skin test results to penicillin who were challenged with a 
cephalosporin on the same day as the skin testing developed 
mild urticaria plus bronchospasm within 24 hours. 7 Solley et 
al 22 described 27 patients with positive penicillin skin test 
results, all of whom were treated with cephalosporins without 
a reaction, whereas 2 (1.5%) of 151 patients with a positive 
history of penicillin allergy and negative penicillin skin test 
results had an allergic reaction to cephalosporins. Forty-three 
treatment courses with cephalosporins were administered to 
children who had positive skin test results or positive oral 
challenge to penicillin. Forty-one (95%) of the cephalosporin 
courses were well tolerated. Two children experienced a mild 
IgE type-mediated reaction. 26 

In summary, neither the history nor the penicillin skin test 
result reliably predicts the probability of allergic reactions to 
cephalosporins in patients with positive histories of penicillin 
allergy. Available data suggest that the majority of patients who 
are allergic to penicillin tolerate cephalosporins without significant 
reaction. Our approach to a patient with a history of penicillin 
allergy who requires a cephalosporin is to first determine the 
likelihood that the patient had a type 
I allergic reaction to penicillin (Box 39-1). If a detailed medical 
history does not suggest a true penicillin allergy, we administer 
the cephalosporin. When the history is concerning for penicillin 
allergy, we recommend penicillin skin testing. For patients with 
negative skin test results, the cephalosporin can be administered. 
When the penicillin skin test result is positive and an alternate 
drug cannot be used, cephalosporin desensitization by an experienced 
practitioner should be considered. 44 


Box 39-1 Taking a History of Penicillin Allergy: What to Ask 

• What was the patient’s age at the time of the reaction? 

• Does the patient recall the reaction? If not, who 
informed the patient of it? 

• How long after beginning penicillin did the reaction 
begin? 

• What were the characteristics of the reaction? 

• What was the route of administration? 

• Why was the patient taking penicillin? 

• What other medications was the patient taking? Why 
and when were they prescribed? 

• What happened when the penicillin was discontinued? 

• Has the patient taken antibiotics similar to penicillin 
(eg, amoxicillin, ampicillin, cephalosporins) before or 
after the reaction? If yes, what was the result? 


Some investigators have called for broader use of cepha¬ 
losporin skin testing in patients who are allergic to penicillin 
and require a cephalosporin. 26,47 However, protocols for skin 
testing with cephalosporin compounds are not well stan¬ 
dardized, and the negative predictive value of cephalosporin 
skin testing is not known. 7,44,46 

Carbapenems and monobactams are β-lactam antibiotics, 
of which imipenem and aztreonam are respective prototypes. 
Patients who have positive skin test results to penicillin have 
also shown a high degree of reactivity to imipenem determi¬ 
nants. 7 Therefore, carbapenems should not be administered 
to patients with positive penicillin skin test results or a con¬ 
cerning history of a type I allergic response to penicillin. 7 
Available information indicates that aztreonam may be safely 
administered to most, if not all, patients with a type I allergic 
response to penicillin. 7 

RESULTS 

WHY IS TAKING A DETAILED CLINICAL HISTORY 
FOR PENICILLIN ALLERGY IMPORTANT? 

The majority of patients with a history of penicillin allergy 
have no concurrent physical examination findings related to 
the adverse response to penicillin. Thus, initial determination 
of the probability of a true penicillin allergy relies almost 
solely on a detailed medical history (Box 39-1). For example, 
a patient receiving penicillin who developed a rash on day 5 
of treatment for an upper respiratory tract infection who has 
since taken multiple courses of drugs containing penicillin 
without an untoward reaction does not have a true penicillin 
allergy. In contrast, if a patient described new-onset wheez¬ 
ing 1 hour after a penicillin injection, it is highly probable 
that this patient had an immediate hypersensitivity reaction 
to the drug. 

When assessing a patient for penicillin allergy, all medica¬ 
tions that the patient is (or was) taking should be evaluated 
for their propensity to cause a reaction similar to the one 
being attributed to penicillin. For example, a patient receiv¬ 
ing penicillin for 4 days without untoward effects who then 
begins taking an angiotensin-converting enzyme inhibitor 
and develops angioedema on the third day of administration 
(day 7 of penicillin therapy) should not be automatically 
labeled as penicillin allergic. 9 

Serious allergic and fatal reactions to antibiotics that con¬ 
tain penicillin can occur in individuals who have never had a 
previous allergic reaction to penicillin or who deny any med¬ 
ical exposure to drugs that contain penicillin. 6 The clinical 
history, no matter how carefully considered, cannot prevent 
these rare reactions. 


ACCURACY OF THE CLINICAL HISTORY 
FOR PENICILLIN ALLERGY 

Four studies 16-19 compared the clinical history of penicillin 
allergy to the skin test result and included patients who had 
positive histories of penicillin allergy and those who did not. 
We pooled the results of these studies (Table 39-1). The presence 
of a clinical history suggesting penicillin allergy 
increases the likelihood that the patient will be allergic to 
penicillin as assessed by skin testing (summary positive LR, 
1.9; 95% CI, 1.5-2.5). The absence of a clinical history suggesting 
penicillin allergy decreases the likelihood of a positive 
skin test result by slightly more than half (summary negative 
LR, 0.5; 95% CI, 0.4-0.6). 
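
The LRs for the individual studies can be approximately recovered from the sensitivity and specificity columns of Table 39-1 using the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The following is a minimal sketch in Python; the function and variable names are illustrative assumptions, and because the inputs are the rounded table values, the results should agree with the published LRs only to within about 0.1. The summary LRs were pooled by the authors and are not reproduced here.

```python
def lr_positive(sensitivity, specificity):
    """Positive likelihood ratio from sensitivity and specificity."""
    return sensitivity / (1.0 - specificity)


def lr_negative(sensitivity, specificity):
    """Negative likelihood ratio from sensitivity and specificity."""
    return (1.0 - sensitivity) / specificity


# Rounded (sensitivity, specificity) pairs as listed in Table 39-1.
studies = {
    "Adkinson et al, 1971": (0.61, 0.74),
    "Green et al, 1977": (0.79, 0.45),
    "Sogn et al, 1992": (0.85, 0.50),
    "Gadde et al, 1993": (0.43, 0.85),
}

# Map each study to its (LR+, LR-) computed from the rounded inputs.
computed = {
    name: (lr_positive(se, sp), lr_negative(se, sp))
    for name, (se, sp) in studies.items()
}

for name, (lr_pos, lr_neg) in computed.items():
    print(f"{name}: LR+ ~ {lr_pos:.1f}, LR- ~ {lr_neg:.1f}")
```
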

The percentages of positive skin test results for patients 
with a history of anaphylaxis, urticaria, or a maculopapular 
rash ranged from 17% to 46%, 12% to 16%, and 4% to 7%, 
respectively, in 2 studies. 17,19 One study 17 also reported that 
18% of patients with a history of angioedema had a positive 
penicillin skin test result. Limited data are available about the 
rate of skin test reactivity when the patient’s allergic status to 
penicillin is unknown. Sogn et al 18 found that the proportion 
of positive skin test results among patients with an unknown 
history of penicillin allergy was 3% (3/96). In another study 
of 57 patients with an uncertain allergy to penicillin, 1.7% 
had a positive skin test reaction. 19 Although the clinical his¬ 
tory does help separate those more likely from those less 
likely to have a penicillin allergy, as demonstrated by skin 
testing, the history is not precise. The studies 16-19 evaluating the 
skin test in patients with and without a history of penicillin 
allergy had higher positive predictive values for the clinical 
history than all but 1 of the studies that included only 
patients with positive histories of penicillin allergy (summary 
positive predictive value, 19%; 95% CI, 18%-21%). 
After exclusion of the outlier study, 21 the positive predictive 
value for the clinical history of penicillin allergy is 14% (95% 
CI, 12%-18%). Thus, a clinician would need to perform skin 
tests on 7 patients with a history suggesting penicillin allergy 
to find 1 positive reaction. 
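
The figure of 7 patients is the reciprocal of the positive predictive value (1 / 0.14). The short Python sketch below makes this arithmetic explicit; the function name is a hypothetical label chosen for illustration, not a term used in the source.

```python
def patients_tested_per_positive(positive_predictive_value):
    """Expected number of history-positive patients skin tested
    to find one positive skin test result (1 / PPV)."""
    if not 0.0 < positive_predictive_value <= 1.0:
        raise ValueError("PPV must be in the interval (0, 1]")
    return 1.0 / positive_predictive_value


# PPV of 14% after exclusion of the outlier study, as stated in the text.
n_tested = patients_tested_per_positive(0.14)
print(f"About {n_tested:.1f} patients tested per positive skin test")
```
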

PENICILLIN SKIN TESTING 

Blackley introduced the skin test in 1865 when he scarified a 
portion of his forearm, sprinkled it with pollen, and noted 
the development of itching and swelling surrounded by ery¬ 
thema. It is now known that IgE antibodies mediate such 
reactions. 48 

The penicillin skin test has no place in the treatment of 
patients without a clinical history of a type I penicillin 
allergy. It would also be unnecessary in the face of a bona fide 
history of a life-threatening type I reaction, when equally 
efficacious antibiotics are available, or if the clinician would 
still withhold penicillin therapy regardless of skin test results. 
Some, 11,20,26 but not all, 6,7 investigators have suggested elective 
skin testing for penicillin allergy. Elective skin testing for 
penicillin allergy may be useful in children because of the fre¬ 
quent outpatient need for antibiotics that contain penicillin. 
In addition, elective skin testing of adults with positive histo¬ 
ries of penicillin allergy might be considered in certain situa¬ 
tions. An example of this would be a cancer patient who has a 
positive history of penicillin allergy who is likely to develop 
chemotherapy-induced neutropenia and requires a drug 
containing penicillin promptly for an infection. 44 Recom¬ 
mendations regarding the general use of elective penicillin 
skin testing await further study. 

However, when the history of type I hypersensitivity is 
concerning and penicillin therapy is warranted, skin testing 
is helpful and should be considered. For example, a patient 
who has a positive history of penicillin allergy and has Staphylococcus 
aureus endocarditis susceptible to an antistaphylococcal 
penicillin (such as nafcillin or oxacillin) would be an 
appropriate candidate for skin testing 49 because vancomycin, 
an antibiotic often used in patients who are allergic to peni¬ 
cillin and have serious S aureus infections, is less effective and 
more expensive than nafcillin. 50 

Another factor influencing the decision to perform a skin 
test relates to the ability to do the test in an efficient manner 
with appropriate reagents and interpretation. A recent study 
of hospitalized patients showed that the time for skin testing 
averaged 40 minutes, and the cost for the skin test reagents 
and equipment was $17 per patient. 12 

The positive predictive value of skin testing to assess risk 
for an allergic reaction to penicillin is unclear because 
patients providing a convincing history of a type I reaction 
to penicillin who subsequently react to skin testing are 
unlikely to undergo oral penicillin challenge. However, a 
limited number of patients with positive skin test results 
have been treated with penicillin. The risk of a type I allergic 
reaction ranges from about 9% in subjects with negative 
histories of penicillin allergy to 50% to 70% in subjects 
with positive histories. 6 Despite the observation that some 
patients with positive skin test results are able to tolerate 
penicillin, it is inadvisable to administer penicillin to these 
patients because of an unfavorable risk-benefit ratio. 
Patients with positive skin test results who need penicillin 
should undergo desensitization. 6 

Many studies have used penicillin challenge in subjects 
with positive histories of penicillin allergy and negative skin 
test results, and the experiences have been consistent: the 
majority of subjects tolerated the challenge, and those who 
did not experienced only urticaria or other mild cutaneous 
reaction. When 6739 patients with positive histories of penicillin 
allergy and negative skin test results were given penicillin, 
only 101 (1.5%) developed an IgE-mediated reaction, 
whereas 43 (0.63%) developed a delayed reaction. 16-29 Penicillin 
anaphylaxis was not reported in subjects with negative 
skin test results who received a penicillin challenge. Patients 
with positive histories of penicillin allergy who have negative 
skin test results may receive a medically supervised oral peni¬ 
cillin challenge. If there is no reaction to the oral challenge, 
patients can then generally be treated with an oral or paren¬ 
teral penicillin. When the skin test is properly performed, 
almost all patients with negative penicillin skin test results 
can safely receive the drug. Thus, even when the history of a 
type I reaction is concerning and penicillin is the clear drug 
of choice, skin testing should be considered because the 
majority of those patients will have a negative skin test result, 
and 98% of patients with a negative result will tolerate penicillin 
without any serious sequelae. 6,7 

If skin testing seems appropriate after obtaining a detailed 
history of the patient’s reaction to penicillin, both the major 
determinant (benzyl penicilloyl; commercially available as 
PrePen; Kremers-Urban, Milwaukee, Wisconsin) and the 
minor determinant composed of freshly diluted aqueous 
penicillin G should be used. 44 A minor determinant mixture 
(MDM) is not commercially available in the United States. 
The use of the major determinant reagent alone would detect 
between 75% and 90% of all potential positive reactions. 
Including fresh penicillin G as the sole MDM reagent 
improves identification of patients who may have reactions 
to the skin test by 5% to 10%. 6 However, the addition of 
other minor determinants to the testing protocol may 
increase identification of patients allergic to penicillin by skin 
testing to about 99%. 16,23 The absence of a commercially 
available MDM solution has hampered the general use of the 
penicillin skin test. The steps for performing a penicillin skin 
test are described in detail elsewhere. 44,51 

Limitations of Skin Testing Compared 
With Other Diagnostic Techniques 

A review identified the essential criteria that any diagnostic 
test must satisfy, but studies evaluating penicillin skin testing 
fail to meet several of these criteria. 52 An independent, 
blind comparison with a reference standard (oral penicillin 
challenge) has never been uniformly applied to all patients 
who have undergone skin testing. Moreover, few studies have 
actually subjected all subjects with positive histories of peni¬ 
cillin allergy and negative skin test results to oral challenge. 
It is clear that in most studies the skin test results influenced 
the decision to perform the penicillin challenge, thus intro¬ 
ducing a built-in bias. These limitations undermine attempts 
to generate reliable estimates of sensitivity and specificity for 
penicillin skin testing compared with oral penicillin chal¬ 
lenge used as the gold standard. This problem, labeled 
reverse workup bias, can result in biased test estimates 
because it is likely that patients who do not undergo skin 
testing differ in important ways from patients for whom test¬ 
ing is undertaken. 53 

Redelmeier and Sox 53 used expert opinion to estimate the 
probability of severe allergic reactions in 100 patients with a 
convincing penicillin allergy history who were to receive the 
drug without previous skin testing. Respondents estimated 
that 5 to 90 (median, 50) patients would experience a severe 
reaction to penicillin. 53 Accordingly, these authors concluded 
that skin testing for patients with a “very strong” history of 
penicillin allergy is not recommended (ie, estimated 50% 
pretest probability for a severe allergic reaction to penicillin 
based on a history of penicillin allergy). They reasoned that 
clinicians would be unwilling to risk a potential serious reac¬ 
tion in these patients even if they had negative skin test 
results. 53 However, at least 50% of patients with a history of 
an IgE-mediated reaction will have a negative skin test 
result. 17,19 Because patients with negative skin test results tol¬ 
erate penicillin well, patients with histories of a type I reac¬ 
tion should undergo skin testing when they have a strong 
indication for penicillin therapy. At least 50% of these 
patients will be identified as candidates for penicillin therapy. 
Still, if the clinician’s treatment threshold is so high that he or 
she is unwilling to administer penicillin regardless of the 
clinical situation (given a history of a type I reaction), skin 
testing clearly has no value. 


CLINICAL SCENARIOS—RESOLUTIONS 


In case 1, the patient reported a maculopapular rash 
halfway through a course of penicillin. The pretest prob¬ 
ability that this represents a true reaction to penicillin 
would be 10%, using a conservative estimate for the fre¬ 
quency of any adverse reaction to penicillin. 31 After a 
careful medical history is taken from the patient, one 
might conclude that his experience is inconsistent with a 
type I reaction. Using a negative LR of 0.5 for a negative 
history of penicillin allergy, the probability that this 
patient will experience any adverse reaction to penicillin 
can be revised to 5.2%, a percentage that is similar to the 
frequency of any adverse reaction to penicillin in the 
general population. 31 In this patient, skin testing should 
not be performed, and the patient should receive peni¬ 
cillin. Careful history-taking should have increased con¬ 
fidence about the safety of administering penicillin to 
this patient. 
The patient described in case 2 reported, and a detailed 
history confirmed, an urticarial rash within 72 hours of taking 
penicillin. Again, using 10% as the pretest probability of 
any adverse reaction to penicillin, 31 a 17% posttest probability 
that this patient has a true penicillin allergy is arrived at by 
using the positive LR of 1.9. We would perform skin testing 
on this patient because a negative skin test result virtually 
excludes a significant reaction to penicillin, whereas a posi¬ 
tive skin test result in this patient with a strong indication for 
penicillin would mandate desensitization. 6 
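
The posttest probabilities in both scenarios follow from the standard conversion of probability to odds, multiplication by the LR, and conversion back to probability. The Python sketch below illustrates that calculation using the 10% pretest probability and the summary LRs quoted above; the function name is an assumption made for illustration. With these rounded inputs it yields roughly 5% for case 1 and 17% for case 2; the small difference from the 5.2% stated for case 1 may reflect rounding of the inputs.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability
    by applying a likelihood ratio on the odds scale."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Case 1: history not suggestive of a type I reaction (summary LR-, 0.5).
case_1 = posttest_probability(0.10, 0.5)

# Case 2: history suggestive of penicillin allergy (summary LR+, 1.9).
case_2 = posttest_probability(0.10, 1.9)

print(f"Case 1 posttest probability: {case_1:.1%}")
print(f"Case 2 posttest probability: {case_2:.1%}")
```
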


COMMENT 

We identified only 4 studies meeting our inclusion criteria 
that used penicillin skin testing in patients with and without 
positive histories of penicillin allergy (Table 39-1). Two of 
these studies provided no data on the frequency of positive 
skin test results in patients according to their previous reac¬ 
tion to penicillin. 16,17 Moreover, none of the studies included 
in our analysis were independent, blind comparisons of signs 
or symptoms of penicillin allergy compared with the gold 
standard, oral penicillin challenge. These methodologic flaws 
have tempered the quality of the published database for this 
common clinical problem, leaving us with a pervasive lack of 
guidelines for determining penicillin allergy. 

Nonetheless, encountering patients with a stated penicillin 
allergy remains an everyday problem for many clinicians, 
and some clinicians simply prescribe an alternate antibiotic 
for these patients. However, some alternative antibiotics are 
more expensive, less effective, or associated with more 
adverse effects than penicillin, and there is the risk of increas¬ 
ing antimicrobial resistance. Other clinicians turn to the lit¬ 
erature, hoping to find a rich evidence-based database to 
help guide their decision-making process. Regrettably, the 
methods of diagnosing true penicillin allergy have been inad¬ 
equately studied, leaving the busy clinician to make the most 
informed decision possible while recognizing the limitations 
in the available data. 

We provide an approach to the patient with a stated penicil¬ 
lin allergy based on a critical analysis of an admittedly limited 
database: by systematically documenting signs and symptoms 
associated with the patient’s adverse reaction to penicillin (Box 
39-1), the clinician should be able to determine with a higher 
degree of certainty whether the patient has a true penicillin 
allergy. Using a more structured approach should allow the 
clinician to assess the likelihood that the patient had a true 
penicillin allergy, thereby allowing a more rational decision¬ 
making process in consideration of penicillin usage, as illus¬ 
trated by the resolution of the clinical scenarios. 

THE BOTTOM LINE 

• Many patients recalling a reaction to penicillin are unsure 
of specific details and, even when evidence supporting true 
penicillin allergy is absent, are nevertheless labeled as peni¬ 
cillin allergic by many clinicians. 


• A detailed history of the patient’s drug reaction can help 
the clinician determine whether or not the patient’s self- 
reported history is compatible with a true penicillin 
allergy, permitting penicillin administration to those 
patients who are unlikely to have true penicillin allergy. 

• Eighty percent to 90% of all patients reporting a penicillin 
allergy have a negative result when 
assessed by skin testing, meaning that penicillin is withheld 
from many patients who could safely receive the drug. 

• Patients who develop a rash while taking penicillins 
should not be automatically labeled as penicillin allergic 
without considering other possibilities, such as a rash 
caused by the infection being treated or by other drugs 
the patient is taking. 

• For patients with a concerning history of penicillin allergy 
who have a compelling need for penicillin, skin testing 
should be performed. 

• At least 98% of patients with positive histories of penicillin 
allergy and negative skin test results can tolerate penicillin 
without any sequelae. 
CLINICAL SCENARIO


A 12-year-old boy with pharyngitis has a positive rapid 
streptococcal test result for strep throat. You would like 
to treat with penicillin V. He has no personal or family 
history of a penicillin reaction. In the past, he has 
received amoxicillin without an adverse reaction. Does 
the absence of a previous reaction guarantee that he is 
not allergic? 

UPDATED SUMMARY ON PENICILLIN ALLERGY 

Original Review 

Salkind AR, Cuddy PG, Foxworth JW, Simel DL, Rennie D, eds. 
Is this patient allergic to penicillin? JAMA. 2001;285(19):2498- 
2505. 

UPDATED LITERATURE SEARCH 

Our literature search combined the search terms “penicillins,” 
“lactams,” “drug hypersensitivity,” and “skin tests” with “sen¬ 
sitivity and specificity,” “reproducibility of results,” “medical 
history taking,” and “physical examination.” The search was 
limited to English-language publications that were in the 
MEDLINE database from 2000 to July 2004. The search strat¬ 
egy yielded 84 articles that were further limited to 52 articles 
by confirming that they were cross-referenced with “exp epi¬ 
demiologic methods” or “prospective studies” as search 
terms. We found 4 studies that provided the predictive value 
of a history of penicillin allergy for a subsequent positive 
penicillin skin test result or response to penicillin on rechal¬ 
lenge. One of these studies was done on a selected population 
of patients who had experienced an immediate penicillin 
response. 

NEW FINDINGS 

• A history of a reaction to penicillin increases the likelihood
of a reaction to future doses of penicillin (likelihood ratio
[LR], 11; 95% confidence interval [CI], 8.5-14), but most
patients with a previous reaction will not be allergic and
will not have future reactions.


• Patients with a history of penicillin reaction who have a
negative skin test result may still react to courses of penicillin
(about 10%), but the chance of a life-threatening reaction
is greatly diminished.

• Patients with a well-documented immediate reaction to penicillin
are about 90% likely to react with subsequent courses.

Details of the Update 

Two types of studies address the role of the patient history in 
predicting penicillin allergy. The question is important 
because only about 15% of patients claiming a penicillin 
allergy prove to have positive skin test results. Prospective 
studies either assess responses to skin testing for penicillin 
allergy or assess responses to subsequent penicillin adminis¬ 
trations after an initial penicillin course. About 10% of adult 
patients with an allergic-like event attributed to penicillin 
prove to have positive skin test results for penicillin allergy. 1 A 
negative skin test reaction suggests that 90% of such patients 
who receive another course of penicillin, according to a med¬ 
ical record review, will use penicillin safely, with no adverse 
outcome. Of those who do experience an allergic response, 
the likelihood of an anaphylactic reaction is low (<0.7%). 

A study of a large practice research database extracted 
patients who had 2 separate administrations of penicillin. 2 The 
authors looked for coded diagnoses for specific allergic reactions
within 30 days after a penicillin course. The LR for predicting
a second allergic reaction after the first allergic reaction
was 11 (95% CI, 8.5-14). The study has at least 2 important
weaknesses. First, although a large percentage of patients with 
an initial reaction subsequently received a second course of 
penicillin (48%), the clinicians may have used judgment to 
exclude the patients most likely to experience a second reac¬ 
tion. With the rates of reactions in the patients receiving a sec¬ 
ond course of penicillin, the LR for a positive history of allergy 
is likely greater than 16. Second, neither the individual patients 
nor the individual patient records were reviewed. The study 
relies on the accuracy of the physician’s code for each visit. 

A study of children referred for skin testing with possible
penicillin allergy addresses the problem of lack of confirmation
from retrospective studies. 3 Individuals who had a negative
skin test result received an oral dose of penicillin observed by
the physician (none had an adverse reaction). The overall true
rate of allergy was 6.7%. Using the commonly held notion that
10% of all patients experience a penicillin reaction, we can
deduce that the LR for a positive history of reaction in these
children was 10.6, which is almost identical to the point estimate
in the large database study.

A large study reported the 10-year results of patients (n = 
330) referred for allergy testing in a highly selected popula¬ 
tion of patients with a much higher prevalence of true allergy 
(88%). This population was unique in that all the patients 
had experienced immediate reactions to penicillin. For these 
patients, the predictive value of the history was much greater 
in that 61% had a positive skin test result, 11% had a negative 
skin test result but a positive radioimmunoassay test result, 
and 27% (n = 89) had a negative skin test result. When the 
patients with a negative skin test result were rechallenged 
with penicillin in a controlled setting, 49 (55%) had reac¬ 
tions within 1 hour of administration. 

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

No changes in the data presented are required. 

CHANGES IN THE REFERENCE STANDARD 

The true reference standard for penicillin allergy is a reac¬ 
tion to an oral penicillin challenge that is observed and well 
documented by a physician. Because a physician may not 
observe most reactions, there may be uncertainty about 
attributing the reaction to penicillin. The response to peni¬ 
cillin skin testing can be used as better clinical evidence for 
penicillin allergy. 


Table 39-3 Predictive Value of a Patient’s History of Penicillin Allergy for Either a Positive Skin Test Result or Actual Allergic Reaction on Rechallenge

Finding (No. of Studies) | Predictive Value (95% CI) to Identify Patients With a Positive Skin Test Result, % | Predictive Value (95% CI) for No Allergic Response to Penicillin Administration After a Negative Skin Test Result, %
History of penicillin allergy (2) (refs 1, 3) | 14 (12-16) |
History of penicillin allergy with negative skin test result in children (ref 3; note a) | | >95
History of penicillin allergy with negative skin test result in adults (ref 1; note b) | | 90 (86-91)

Finding | LR+ | LR- (95% CI)
History of penicillin allergy for predicting an allergic response (anaphylaxis, positive skin test result, or response to a second course of penicillin) (ref 2) | >11 | 0.98 (0.98-0.99)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a All children were given an oral challenge with penicillin after negative skin test results.
Zero of 69 children had an allergic response. The value is the lower limit of the 95% CI.
b According to chart review of second penicillin courses after a negative skin test result.


RESULTS OF LITERATURE REVIEW 

Univariate Findings for Penicillin Allergy 

Skin test results help identify patients who, despite a history
of penicillin allergy, have a low probability of a reaction to
penicillin when rechallenged (Table 39-3).

EVIDENCE FROM GUIDELINES 

The role of routinely taking a history of penicillin allergy for the 
general population is not addressed by any federal recommenda¬ 
tions. However, the Centers for Disease Control and Prevention 
addresses penicillin allergies in its treatment guidelines for sexu¬ 
ally transmitted diseases, primarily because of the central role that 
penicillin plays in the treatment of neurosyphilis, congenital 
syphilis, or syphilis in pregnancy. 4 The guidelines note that 3% to 
10% of US adults have experienced reactions after penicillin and 
that 10% of those remain penicillin allergic. They recommend 
skin testing with the major and minor penicillin determinants for 
patients who are at higher risk of a subsequent reaction. When a 
patient has a negative skin test result, the Centers for Disease 
Control and Prevention recommends penicillin for the treatment 
of the above syphilitic conditions, although some experts recom¬ 
mend desensitization for these patients. The recommendations 
include a protocol for penicillin allergy skin testing. 


CLINICAL SCENARIO—RESOLUTION 


The lack of a previous reaction does not guarantee that the 
child will have no future reaction to penicillin or a penicillin 
derivative, but the risk is low (about 1%). You can confi¬ 
dently prescribe penicillin as the preferred antibiotic. Had 
the child experienced a previous reaction, the clinician would 
have to make a decision. The risk that such a child is truly 
allergic to penicillin is greater, given a previous reaction, even 
though the absolute risk is low. The decision for how to pro¬ 
ceed may depend on your assessment of the previous reac¬ 
tion, the need for penicillin vs alternative antibiotics, and the 
availability of penicillin testing. If the previous reaction was 
well documented or an immediate reaction, erythromycin is 
an inexpensive treatment for streptococcal pharyngitis so 
that urgent skin testing is not necessary. 
PENICILLIN ALLERGY—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

About 10% of patients will have an adverse reaction to 
penicillin, but most will not be penicillin allergic. Less 
than 1% of all patients will have a true allergy to penicil¬ 
lin, as defined by an anaphylactic reaction, positive skin 
test result, or response to a second dose of penicillin. Phy¬ 
sicians should ascertain the nature of the reaction to help 
decide whether it might have represented a true penicillin 
allergy. An allergy to penicillin may be difficult to ascer¬ 
tain from the patient’s medical history, primarily because 
many penicillin reactions do not represent allergic reac¬ 
tions. The most important finding from a penicillin his¬ 
tory is also the least frequent—patients with severe 
reactions (eg, toxic epidermal necrolysis, life-threatening 
anaphylaxis, hemolytic anemia, liver damage) should not 
be skin tested for penicillin allergy and should not receive 
penicillin. 5 

POPULATION FOR WHOM PENICILLIN 
ALLERGY SHOULD BE CONSIDERED 

• Patients with a previous allergic response to penicillin- 
type antibiotics 

• Patients with a response to medications that cross-react 
with penicillin (eg, cephalosporins, carbapenems) 


DETECTING THE LIKELIHOOD OF PENICILLIN ALLERGY 

The history of a penicillin allergy or a negative skin test result 
affects the probability of a true penicillin allergy (Table 39-4). 


Table 39-4 Likelihood Ratios and Predictive Value for a History of Penicillin Allergy in Predicting True Allergy

Finding (No. of Studies) | LR+ (95% CI) | LR- (95% CI)
History of penicillin allergy for predicting an allergic response (anaphylaxis, positive skin test, or response to a second course of penicillin) | >11 | ≈1
History of penicillin allergy for predicting response to penicillin skin test result (4) | 1.9 (1.5-2.5) | 0.5 (0.4-0.6)

Finding (No. of Studies) | Predictive Value
Predictive value of a negative skin test result in a patient with a history of penicillin allergy (1) | >95% (note a)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a Probability of no allergic-like event with second administration of penicillin.


REFERENCE STANDARD TESTS 

Penicillin allergy is confirmed by a reliable history of an imme¬ 
diate anaphylactic reaction, positive skin test reactivity, or well- 
documented response to a second observed penicillin challenge. 
TITLE Represcription of Penicillin After Allergic-like Events.

AUTHORS Apter AJ, Kinman JL, Bilker WB, et al.

CITATION J Allergy Clin Immunol. 2004;113(4):764-770.

QUESTION How well does the history of an allergic-like 
event from penicillin predict subsequent responses after 
readministration? 

DESIGN Analysis of a large database. 

SETTING AND PATIENTS United Kingdom General 
Practice Research Database of 687 general practitioner 
practices, representative of England and Wales, and com¬ 
prising 6% of the population. The database contained 
records for 3375162 patients who received at least 1 pre¬ 
scription of penicillin. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The General Practice Research Database was assessed for 
patients who received at least 2 penicillin doses at least 60 days 
apart. The patients were sorted by those who had an allergic-like 
response to the first administration and then to subsequent 
administrations. Allergic reactions were identified by computer¬ 
ized codes within 30 days of the penicillin prescription for ana¬ 
phylaxis, urticaria, angioedema, erythema multiforme, laryngeal 
spasm, dermatitis attributed to a drug, toxic epidermal necroly¬ 
sis, or adverse drug reactions attributed to a medication. 

MAIN OUTCOME MEASURES 

Tables (2 x 2) were created for the documented history of 
penicillin reactions as a predictor for a subsequent reaction. 

MAIN RESULTS 

With the initial penicillin course, 0.18% of patients had an
allergic-like event. Almost 60% of the patients who received
at least 1 prescription for penicillin also received a second
prescription (n = 2017957). According to the history of the
initial response to penicillin, the likelihood ratio (LR) for
predicting a second reaction can be derived as shown in
Table 39-5.

Table 39-5 The Presence of a Previous Penicillin Allergy Predicts a Future Reaction

Test | LR+ (95% CI) | LR- (95% CI)
History of penicillin reaction | 11 (8.5-14) | 0.98 (0.98-0.99)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

For patients who had an initial reaction to penicillin, 1.9% 
had a reaction to the second course of penicillin. For patients 
who did not have an allergic-like event to the first prescription, 
0.17% had a reaction to the second prescription. 
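
The reported likelihood ratios can be roughly reproduced from these event rates. The sketch below is an illustration only: the study's actual 2 x 2 cell counts are not given here, so the number of patients with a prior reaction who received a second course is an assumption, back-calculated from two figures quoted in the commentary on this study (48% of initial reactors received a second course; 3198 did not). All variable names are hypothetical.

```python
# Approximate 2 x 2 reconstruction. Cell counts are ASSUMED from reported
# percentages; they are not the study's published counts.

def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR+, LR-) for a 2 x 2 table of test result vs outcome."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

n_second_course = 2_017_957                      # reported
# Assumption: 3198 non-recipients are 52% of initial reactors, so recipients are 48%.
n_prior_reaction = round(3198 * 0.48 / 0.52)
n_no_prior_reaction = n_second_course - n_prior_reaction

# "Test" = history of a reaction to the first course;
# "outcome" = allergic-like event after the second course.
tp = 0.019 * n_prior_reaction        # history +, second reaction (1.9%)
fp = n_prior_reaction - tp           # history +, no second reaction
fn = 0.0017 * n_no_prior_reaction    # history -, second reaction (0.17%)
tn = n_no_prior_reaction - fn        # history -, no second reaction

lr_pos, lr_neg = likelihood_ratios(tp, fp, fn, tn)
print(f"LR+ ~ {lr_pos:.1f}, LR- ~ {lr_neg:.3f}")
```

Under these assumptions the result lands close to the reported LR+ of 11 and LR- of 0.98; the small differences reflect rounding in the published percentages.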

The serious reactions to the first penicillin course (n = 3014)
included anaphylaxis (n = 16), angioedema (n = 106), laryngospasm
(n = 19), and toxic epidermal necrolysis (n = 6). Most
patients had urticaria (n = 2275) or erythema multiforme (n =
237). The pattern of reactions to the second penicillin course
was similar.

About 75% of the prescriptions were for amoxicillin. 

CONCLUSIONS 

LEVEL OF EVIDENCE Review of a large database with out¬ 
comes of uncertain reliability. 

STRENGTHS Large database that allows the detection of low- 
frequency events. The physicians had to concur with the diag¬ 
nosis, as evidenced by their reporting the diagnostic code. 

LIMITATIONS Lack of standardized case definitions. A 
“case” required a second visit by the patient and appropriate 
coding, both of which would bias the outcomes to underesti¬ 
mate all allergic-like reactions. Although the specificity of the 
diagnosis for an allergic-like event seems reasonable, there is 
an assumption that the event was attributable to the penicil¬ 
lin and not to another drug or to illness. The limitations of 
this study were addressed in an editorial. 1 
It is surprising that 48% of patients with an initial allergic-like
event received a second course of penicillin. This could have 
happened because patients forgot their previous reaction and 
the physician was therefore unaware or because the previous 
reaction was attributed to another cause. Few patients (1.89%) 
had a second reaction. When the authors expanded their case 
definition of reactions to include bronchospasm, asthma, and 
eczema, the allergic-like events increased to 9% for patients with 
a previous reaction. The event rate of 9% matches the event rate 
for patients with a history of penicillin allergy who have a nega¬ 
tive skin test result and then are treated again with penicillin. 2 

If the physicians were efficient in identifying the patients 
most likely to have a second reaction, then the positive LR of 
11 is underestimated (Table 39-5). To highlight this, we can 
project the low-event rate of second reactions (1.9%) onto the 
3198 initial reactors who did not receive a second course of a 
penicillin. Had those patients received a second course with an 
allergic-like event, the positive LR for a previous penicillin 
reaction would have been 16. The inclusion of those patients 
creates minimal change in the negative LR (0.95). Given the 
caveats about the data set, it is probably safe to say that the his¬ 
tory of a penicillin reaction documented by a physician confers 
a positive LR greater than 11 for a second reaction. This LR for 
penicillin allergy is much higher than the LR for the clinical 
history in predicting allergy as defined by the response to skin 
testing. 

Reviewed by David L. Simel, MD, MHS 

REFERENCES FOR THE EVIDENCE 

1. Josephson AS. Penicillin allergy: a public health perspective. J Allergy
Clin Immunol. 2004;113(4):605-606.

2. Macy E, Mangat R, Burchette RJ. Penicillin skin testing in advance of
need: multiyear follow-up in 568 test result-negative subjects exposed to
oral penicillins. J Allergy Clin Immunol. 2003;111(5):1111-1115.


TITLE History of Penicillin Allergy and Referral for Skin 
Testing: Evaluation of a Pediatric Penicillin Allergy Testing 
Program. 

AUTHORS Langley JM, Halperin SA, Bortolussi R. 

CITATION Clin Invest Med. 2002;25(5):181-184. 

QUESTION Does a history of penicillin allergy predict 
response to skin testing and oral challenge with penicillin? 

DESIGN Prospective, protocol assessment. 

SETTING Canadian ambulatory infectious disease clinic. 

PATIENTS Seventy-four children referred for possible
penicillin allergy. Ninety-six percent had generalized cutaneous
eruptions.


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Penicillin allergy was defined by a history of life-threatening
anaphylaxis, a positive skin-test result, or a reaction to an
observed oral challenge of penicillin.

MAIN OUTCOME MEASURE 

Positive predictive value of the history for penicillin allergy. 

MAIN RESULTS 

Two patients had “convincing” life-threatening anaphylaxis
and 3 had a positive intradermal skin test result. The remaining
69 patients had an oral challenge with penicillin; none
had an adverse reaction, so the negative predictive value is
100% (lower 95% confidence interval [CI], 96%). The positive
predictive value of the history of penicillin allergy was
6.7% (95% CI, 2.9%-15%).
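
As a rough check on these figures, the sketch below recomputes the positive predictive value from the counts given (2 + 3 allergic among 74 children) and the lower confidence limit for a negative predictive value of 69 of 69. The one-sided exact bound used here is an assumption about how the 96% limit could be obtained; the study's own method is not stated. Function and variable names are hypothetical.

```python
# Sketch: predictive values from the counts reported for the pediatric study.

def lower_bound_all_negative(n, alpha=0.05):
    """One-sided exact lower limit for a proportion when n of n were event-free.

    If the true event-free proportion is q, the chance of seeing n of n
    event-free is q**n; setting q**n = alpha gives q = alpha**(1/n).
    """
    return alpha ** (1 / n)

referred = 74
truly_allergic = 2 + 3            # anaphylaxis history + positive skin test
challenged_without_reaction = 69  # all 69 oral challenges were uneventful

ppv = truly_allergic / referred
npv_lower = lower_bound_all_negative(challenged_without_reaction)

print(f"PPV of history ~ {ppv:.1%}")
print(f"Lower limit for NPV ~ {npv_lower:.1%}")
```

The raw proportion 5/74 is about 6.8%, which the text reports as 6.7%, and the lower limit rounds to the 96% quoted above.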

CONCLUSIONS 

LEVEL OF EVIDENCE Positive predictive value study. 

STRENGTHS Patients with a negative skin test result for 
penicillin allergy were given an oral challenge with penicillin. 

LIMITATIONS Small population of patients with uncertainty 
about whether these were consecutive patients. As with all refer¬ 
ral studies of penicillin allergy, this likely does not capture the 
universe of patients with possible penicillin reactions. Nine per¬ 
cent of the patients were referred for cephalosporin reactions. 

Although the study was small, the information presented is 
enhanced by the oral penicillin challenge in patients who had 
a negative skin test result. Using the generally held notion
that about 10% of the population will have a reaction to penicillin,
the true allergy rate would be 0.067 × 0.10 = 0.67%. If
we take 0.67% as the prior probability and use 6.7% as the
posterior probability for a patient with a positive penicillin
allergy history, we can solve for the likelihood ratio:

Penicillin allergy likelihood ratio = posterior odds/prior odds
= (0.067/0.933)/(0.0067/0.9933) = 10.6.
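
The odds arithmetic above can be written out as a short sketch. It uses only the numbers in the text (a 6.7% posterior probability and an assumed 10% population reaction rate); the function names are hypothetical and introduced for illustration.

```python
# Sketch: solving for a likelihood ratio from prior and posterior probabilities.

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def likelihood_ratio(prior, posterior):
    """LR = posterior odds / prior odds."""
    return odds(posterior) / odds(prior)

def post_test_probability(prior, lr):
    """Inverse step: apply an LR to a prior probability."""
    post_odds = odds(prior) * lr
    return post_odds / (1 + post_odds)

posterior = 0.067            # PPV of a positive history in the study
prior = posterior * 0.10     # assumes 10% of the population ever reacts

lr = likelihood_ratio(prior, posterior)
print(f"LR ~ {lr:.1f}")
print(f"Round trip: {post_test_probability(prior, lr):.3f}")
```

Running the inverse step with the derived LR returns the starting 6.7%, which confirms the two directions of the calculation agree.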

Reviewed by David L. Simel, MD, MHS 
TITLE Penicillin Skin Testing in Advance of Need: Multiyear
Follow-up of 568 Test Result-Negative Subjects
Exposed to Oral Penicillins.

AUTHORS Macy E, Mangat R, Burchette RJ.

CITATION J Allergy Clin Immunol. 2003;111(5):1111-1115.

QUESTION Among patients with a history of penicillin 
allergy, does a negative skin test reaction confirm the lack 
of penicillin allergy? 

DESIGN Retrospective medical record review. 

SETTING Allergy clinic as part of a health care manage¬ 
ment organization. 

PATIENTS Patients were adults referred to an allergist 
for skin testing for suspected penicillin allergy. The symp¬ 
toms of their allergic reaction were not described. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The computerized pharmacy records of patients who had a neg¬ 
ative skin test result for penicillin were reviewed. Using the nar¬ 
rative of the patient’s clinical records, the investigators searched 
for documentation of an allergic response to penicillin. 


MAIN OUTCOME MEASURES 

Allergic responses were recorded as anaphylaxis, gastrointes¬ 
tinal reactions, hives, other rashes, or other reactions. 


MAIN RESULTS 

During 7 years, 1383 patients were skin tested for penicillin
allergy. Among this population of patients with a clinical suspicion
for penicillin allergy, 137 had positive skin test results
(9.9%) (Table 39-6). The charts of the remaining 1246
patients were studied for penicillin exposures; 568 patients
received subsequent penicillin challenges. Among the
patients with a history of penicillin allergy, with a negative
skin test result, and who were challenged with penicillin, 65
had a reaction. None of the reactions were documented as
truly anaphylactic (by chart review, upper 95% confidence
interval < 0.7%), with most being “hives” (72%) or other
rashes (12%).


Table 39-6 The Predictive Value of a History of a Penicillin Allergy Is Modified by Knowing the Response to Skin Testing

Test | Predictive Value (95% CI) for Positive Skin Test Result, % | Predictive Value (95% CI) for No Allergic Response to Penicillin Administration, % | Predictive Value (95% CI) for Positive Skin Test Result or Subsequent Allergic Reaction, %
History of penicillin allergy | 9.9 (8.4-12) | | 15 (13-17)
History of penicillin allergy with negative skin test result | | 90 (86-91) |

Abbreviation: CI, confidence interval.


CONCLUSIONS 

LEVEL OF EVIDENCE Level 4. 

STRENGTHS Large sample size of patients referred for a 
possible penicillin reaction. 

LIMITATIONS Retrospective chart review, relying on non- 
standardized clinical documentation of reactions. Patients may 
have received care outside of the health care management 
organization, so their results would not have been captured. 

If true penicillin allergy is defined as a positive skin test
response or an allergic reaction to a second course of penicillin,
the positive predictive value of a patient’s history of
a penicillin allergy is 15%.
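
The 15% figure can be traced back to the counts in the main results. The sketch below is a plain recalculation from those counts, with hypothetical variable names; note that the denominator for the combined figure is all 1383 tested patients, even though only 568 of the skin test-negative patients were later re-exposed.

```python
# Sketch: predictive values recomputed from the reported counts.

skin_tested = 1383
skin_test_positive = 137
rechallenged_after_negative_test = 568
reactions_after_negative_test = 65

ppv_skin_test = skin_test_positive / skin_tested
ppv_combined = (skin_test_positive + reactions_after_negative_test) / skin_tested
no_reaction_rate = 1 - reactions_after_negative_test / rechallenged_after_negative_test

print(f"Positive skin test: {ppv_skin_test:.1%}")
print(f"Positive skin test or later reaction: {ppv_combined:.1%}")
print(f"No reaction after negative skin test: {no_reaction_rate:.1%}")
```

The first two values round to the 9.9% and 15% in Table 39-6. The raw no-reaction proportion comes out near 89%, slightly below the tabulated 90% but within its reported interval; the source does not explain the difference.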

With the skin test result as a reference standard, about 10% 
of patients with a reported reaction to penicillin will have a 
positive reaction. A negative skin test result among these 
patients makes a subsequent anaphylactic reaction unlikely 
(<1%). However, patients with a negative skin test result 
have at least a 10% rate of subsequent reactions (most are 
skin reactions). The retrospective nature of this study design 
probably means that the rate is higher because patients with 
less severe skin reactions might not have sought care and 
some may have received care from other providers. 

Reviewed by David L. Simel, MD, MHS 
Does This Patient Have Pneumonia?

A 53-year-old woman comes to your office with a cough of 
more than 1 week’s duration. She was in excellent health 
until 7 days ago, when she developed a nonproductive 
cough, mild sore throat, and myalgia. She recalls no history 
of asthma or chronic obstructive pulmonary disease, and 
she does not smoke. Despite staying home from work for 
the last 2 days, she has noted increasing sputum production 
with her cough and worsening fatigue. She has felt warm 
but has not documented any fever or night sweats. On 
physical examination, her oral temperature is 38.3°C 
(101°F), her heart rate is 110/min, and auscultation of her 
chest reveals inspiratory crackles on the left side. 


WHY IS THIS AN IMPORTANT QUESTION TO 
ANSWER WITH A CLINICAL EXAMINATION? 


Physicians commonly encounter patients with respiratory 
complaints similar to those in the clinical scenario. In 1994, 
there were more than 10 million visits to primary care physi¬ 
cians by adults with a chief complaint of cough, representing 
more than 4% of all visits to physicians that year. Pneumonia 
represented only 5% of all causes for these visits and was the 
fifth leading diagnosis, after bronchitis, upper respiratory tract 
infection, asthma, and sinusitis. 1 Though pneumonia may rep¬ 
resent a small proportion of all acute respiratory illnesses, the 
accurate identification of this subgroup is important because of 
the distinct therapeutic and prognostic features of this illness. 

In the preantibiotic era, mortality from pneumococcal 
pneumonia was consistently higher than 20% for all cases, 
increasing to more than 60% for bacteremic cases. 2 Since the 
introduction of antibiotics, no one has reported results from 
large-scale studies comparing antibiotic therapy to nonanti¬ 
biotic therapy for patients with pneumonia. As a result, such 
therapy is universally recommended and has become a stan¬ 
dard of care for all patients with pneumonia. No such stan¬ 
dard exists for alternative respiratory infections such as 
bronchitis 3 or the common cold. 4 Moreover, inappropriate 
use of antibiotics for these alternative respiratory infections 
may be an important determinant of the increase in antibi¬ 
otic resistance among common respiratory pathogens. 5,6 

In terms of prognosis, patients with pneumonia continue 
to have an overall high mortality from this illness, ranging 
from as low as 5% in studies of hospitalized and ambulatory 
patients to as high as 37% in studies of patients requiring 
admission to intensive care units. 7 This persistently high 
mortality underscores the need for physicians to choose care¬ 
fully between home or hospital therapy for all patients with 
pneumonia. 8 For these reasons, physicians need to know how 
to use their clinical examination optimally to identify
patients at suitable risk for pneumonia to require further, 
definitive diagnostic testing. 

Chest radiography is the reference standard for diagnosing 
community-acquired pneumonia and provides additional 
information on the prognosis of patients with this illness, 9 as 
well as the presence of coexisting conditions such as bron¬ 
chial obstruction or pleural effusions. 10 Moreover, chest radi¬ 
ography is highly reliable, 11 safe, generally available, and 
relatively inexpensive, so that it is a standard part of the eval¬ 
uation of any patient with suspected pneumonia. It is possi¬ 
ble that some physicians continue to diagnose and treat 
patients with pneumonia without the aid of chest radiogra¬ 
phy, whereas other physicians routinely obtain chest radio¬ 
graphs for all patients suspected of having pneumonia. We 
do not know the proportion of physicians who choose these 
alternative strategies. Therefore, the aims of this article are 
both to assess the validity of the former approach (diagnos¬ 
ing pneumonia without chest radiography, using medical 
history and physical examination alone) and to identify ele¬ 
ments of the clinical examination that might improve the 
efficiency of the latter approach (ordering chest radiographs 
for all patients with suspected pneumonia). 

PATHOPHYSIOLOGY OF 
COMMUNITY-ACQUIRED PNEUMONIA 

In patients with community-acquired pneumonia, the site of 
infection can involve the pulmonary interstitium, alveoli, or 
both. This provides the physiologic basis for the principal 
chest examination findings in pneumonia, which include 
dullness to percussion, changes in the intensity of tactile 
fremitus and breath sounds, and inspiratory crackles. Dull¬ 
ness to percussion and local changes in the intensity of tactile 
fremitus and breath sounds are the result of diffuse replace¬ 
ment of the pulmonary parenchyma with inflammatory tis¬ 
sue, leading to pulmonary consolidation or the presence of 
pleural effusions. 12 In a patient with pneumonia, crackles 
(formerly called “rales”) are caused by the delayed opening of 
alveoli in deflated regions of pathologically inflamed lung. 13 
Crackles refer to any discontinuous adventitious lung sounds 
and can therefore be heard in a variety of pulmonary diseases 
that cause lung stiffening, including congestive heart failure, 
pulmonary fibrosis, and obstructive lung disease. 12 

HOW TO ELICIT THESE SYMPTOMS AND SIGNS 

Patients with community-acquired pneumonia present with a 
large number of possible symptoms. In a study of more than 
1800 patients with community-acquired pneumonia, these pre¬ 
senting symptoms ranged from typical respiratory complaints, 
including productive cough, dyspnea, and pleuritic chest pain, 
to predominantly systemic complaints of fatigue, anorexia, and 
myalgias. Moreover, the pattern of presenting symptoms varied 
considerably among patients, particularly among elderly 
patients with pneumonia, who less frequently reported a wide 
range of symptoms. 14 As a result, careful history-taking in a 


patient suspected of having community-acquired pneumonia 
should consider a broad range of possible symptoms, including 
respiratory and nonrespiratory symptoms. 

In contrast, the examination of the chest in patients with
suspected pneumonia is traditionally carried out in a structured
manner, proceeding through the 4 steps of inspection,
palpation, percussion, and auscultation. The chest is inspected
for signs of asymmetric chest expansion, defined as a visible
difference in excursion between the 2 sides of the chest. The
chest wall is palpated while the patient speaks to assess the
transmission of sound, or tactile fremitus. Percussion over
symmetric areas of the anterior and posterior chest wall
detects diminution in the resonance of the percussion note, or
dullness to percussion. Finally, auscultation of the lung
assesses the intensity of normal breath sounds, the transmission
of spoken words, and the presence of adventitious sounds.
Auscultation in the peripheral lung fields may detect the
replacement of the normal vesicular breath sounds with tubular
or bronchial breath sounds, which are normally heard only
over the trachea. Increased transmission of speech may be
detected as the increased clarity of whispered phrases, known
as whispered pectoriloquy, or as the change in timbre of vowel
sounds in the form of “e” to “a,” known as egophony. 12 The
principal abnormal sounds in community-acquired pneumonia
are known as crackles, which are nonmusical, discontinuous
sounds and should be detected with the patient in the
upright position. It has been suggested that auscultation of
each lung in the lateral dependent position is a more sensitive
technique for crackles, but this has not been independently
validated. 15 Auscultation should occur with the patient breathing
at normal tidal volumes because inspiration from lower
lung volumes (ie, residual volume) can yield abnormal auscultatory
findings in as many as 50% of normal subjects. 16 Finally,
both percussion and auscultation of the chest should proceed
in a systematic fashion, with an examination of symmetric
areas on the anterior and posterior chest wall.

METHODS 

Literature Search 

We searched English-language medical literature to determine
the precision of the clinical examination in patients
with community-acquired pneumonia and the accuracy of
the examination in diagnosing patients suspected of having
this illness. We searched MEDLINE from 1966 through
October 1995 according to an initial search strategy similar
to that used by other authors in this series. The initial
retrieval of titles (n = 7 for precision; n = 140 for diagnostic
accuracy) was reviewed by 2 of us (J.P.M. and M.J.F.). Articles
that focused on hospital-acquired pneumonia, pediatric
pneumonia, or AIDS-related pneumonia were excluded.
The remaining articles were retrieved, as well as any potentially
eligible articles identified through review of the article
reference lists (n = 7 for precision; n = 52 for diagnostic
accuracy).

A set of explicit inclusion and exclusion criteria was
applied to each retrieved article. Inclusion criteria required
that the study be an original study of the accuracy or precision
of the medical history or physical examination in determining
the diagnosis of community-acquired pneumonia.
Exclusion criteria consisted of studies of patients younger
than 16 years, patients with known immunosuppression, or
patients with nosocomial infections. In addition, case series
(<10 observations) or review articles without original data
were excluded.

Quality Review of Articles 

The remaining eligible articles were each evaluated by one of
us (J.P.M.) according to a methodologic quality filter that
assigned a level of evidence from 1 to 5 according to the
internal validity of the study. Level 1 evidence refers to a primary,
prospective study of the accuracy or precision of the
clinical examination in community-acquired pneumonia.
For studies dealing with accuracy, this requires independent,
blind comparisons of clinical findings with a criterion standard
(or gold standard) of diagnosis or etiology among a
large number (>50) of consecutive patients suspected of having
community-acquired pneumonia. For studies dealing
with precision, this requires 2 or more independent blinded
raters of symptoms or signs in a large number of patients
suspected of having community-acquired pneumonia. Level
2 studies were analogous to level 1 studies but with smaller
numbers of patients (10-50), widening the confidence limits
of the resulting calculations. Level 3 studies were based on a
retrospective design (ie, clinical findings determined by chart
review). Level 4 studies included nonconsecutive patients,
generally selected because of their definitive results for the
findings under study, or a nonblinded comparison of clinical
findings with a gold standard. Level 5 studies included studies
with an uncertain gold standard or a poorly defined study
population (ie, may not even have community-acquired
pneumonia). For the purposes of this study, only studies of
level 1 quality, also called grade A evidence, were considered
for the main analyses and tables. Summaries of relevant level
2 through 5 studies are provided in the text.

Data Analysis 

Likelihood ratios (LRs) were calculated for the presence (positive
LR [LR+]) or absence (negative LR [LR-]) of individual
clinical findings. 17,18 Only those findings significantly associated
with the presence or absence of pneumonia in at least 1 study,
based on a 2-tailed χ² or Fisher exact test with P less than .05,
were included in the results. However, the actual diagnostic
value of statistically significant findings still depends on both
the prior probability of pneumonia and how much the LR
moves the posterior probability from the prior probability. 19
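
As an illustration of the calculation described above, the following is a minimal Python sketch of how LR+ and LR- follow from the 4 cells of a 2 × 2 table, using the definitions LR+ = sensitivity/(1 - specificity) and LR- = (1 - sensitivity)/specificity. The function name and the patient counts are hypothetical and are not taken from any of the reviewed studies.

```python
import math


def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR+, LR-) for a finding from 2 x 2 table counts.

    tp: finding present, pneumonia present
    fp: finding present, pneumonia absent
    fn: finding absent,  pneumonia present
    tn: finding absent,  pneumonia absent
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # A finding never seen in patients without disease has an infinite LR+.
    lr_pos = math.inf if fp == 0 else sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Hypothetical counts, for illustration only.
lr_pos, lr_neg = likelihood_ratios(tp=30, fp=70, fn=20, tn=380)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

With these made-up counts the sensitivity is 60% and the specificity about 84%, so the finding has an LR+ near 3.9 and an LR- near 0.47.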

RESULTS 

Precision of Symptoms and Signs of 
Community-Acquired Pneumonia 

Interobserver variation in the recording of the presence of
symptoms in patients with community-acquired pneumonia
has not been directly examined. However, analogous work in
assessing symptom prevalence in large-scale epidemiologic
studies has revealed considerable interobserver variation in
the recording of symptoms. 20,21 This has led to the adoption
of standardized respiratory questionnaires in epidemiologic
studies of chronic respiratory illnesses. However, no such
standardized questionnaires exist for recording symptoms in
patients with acute respiratory infections. 22

It has also been appreciated for some time that the physical
examination of the chest is hampered by a high degree of
interobserver error. Although no study has specifically
addressed the reliability of the physical examination in patients
with community-acquired pneumonia, Spiteri et al 23 measured
reliability among 24 physicians in the examination of 24
patients with a variety of respiratory conditions, 4 of whom
had radiographic evidence of pneumonia. Table 40-1 presents
the calculated interobserver reliability among the physicians
for several chest signs. The results are presented in the form of
both mean pair observer agreement rates and κ values, which
account for rates of chance agreement ranging from 0, when
agreement is no better than chance, to 1, when there is perfect
agreement. In fact, 2 of the most reliable findings, dullness to
percussion and wheezes on auscultation, had only fair to good
κ values of 0.52 and 0.51, corresponding to agreement rates of
77% and 79%, respectively. Crackles had a κ value of 0.41
(agreement rate of 72%), and several findings such as whispered
pectoriloquy and increased tactile fremitus had κ values
indicating poor agreement (range, 0.01-0.11), in part explained
by the rarity of these findings overall.
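
To show why a rare finding can combine a high raw agreement rate with a κ value near 0, the following Python sketch computes Cohen κ for 2 observers. This is a simplified 2-observer illustration and an assumption on our part: Spiteri et al 23 reported mean pair agreement across many physicians, and the counts below are hypothetical.

```python
def cohen_kappa(both_present, only_a, only_b, both_absent):
    """Return (observed agreement, kappa) for 2 observers rating 1 sign."""
    n = both_present + only_a + only_b + both_absent
    observed = (both_present + both_absent) / n
    # Proportion of patients in whom each observer reports the sign.
    a_present = (both_present + only_a) / n
    b_present = (both_present + only_b) / n
    # Agreement expected by chance alone, given those proportions.
    expected = a_present * b_present + (1 - a_present) * (1 - b_present)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical rare sign: each observer reports it in fewer than 10 of 100 patients.
observed, kappa = cohen_kappa(both_present=1, only_a=7, only_b=8, both_absent=84)
print(f"agreement = {observed:.0%}, kappa = {kappa:.2f}")
```

In this made-up example the observers agree in 85% of patients, yet nearly all of that agreement is expected by chance because both usually report the sign as absent, so κ stays close to 0.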

Similarly poor interobserver reliability has been observed in
the chest examination of other respiratory diseases. For example,
Schilling et al 24 observed an agreement rate of 76% for
abnormal chest sounds in the examination of 187 men with
interstitial lung disease and 88 controls; this yields a κ value of
0.25. Smyllie et al 25 measured agreement rates among 9 physicians
who examined 20 patients with a variety of chronic lung
diseases. Agreement rates were generally midway between
chance and perfect agreement for a number of chest examination
findings, including diminished breath sounds, decreased
percussion note, and crackles. Though the basis for the relatively
low interobserver reliability in chest examination is unknown, at
least 1 group has suggested that deficiencies in the teaching of
the clinical examination are to blame. 23


Table 40-1 Precision of Physical Examination Findings in Examination
of the Chestᵃ

Physical Examination Finding    Agreement, %ᵇ    κ Value
Tachypnea                       63               0.25
Reduced chest movement          70               0.38
Increased tactile fremitus      85               0.01
Dullness to percussion          77               0.52
Decreased breath sounds         ...ᶜ             0.43
Wheezes                         79               0.51
Crackles                        72               0.41
Bronchial breath sounds         ...ᶜ             0.32
Whispered pectoriloquy          ...ᶜ             0.11

ᵃAdapted from Spiteri et al. 23
ᵇCalculated according to data provided in Table 1 by Spiteri et al. 23
ᶜEllipses indicate mean pair agreement rates were not calculated for the signs for which 2
or more physicians in a group failed to report the presence or absence of the sign.

Accuracy of the Clinical History in the Diagnosis 
of Community-Acquired Pneumonia 

For this review, 4 studies were judged to have level 1 evidence
on the test characteristics of individual items in the clinical
history in the diagnosis of community-acquired pneumonia. 26-29
In each of these studies, the reference standard for the
diagnosis of pneumonia was a new infiltrate on a chest radiograph.
Table 40-2 summarizes the value of findings from the
medical history, including respiratory symptoms, nonrespiratory
symptoms, and information on medical history.

Though all 4 studies were based in emergency departments,
variations in the patterns of the results reflect, in part, variation
in the selection criteria for each study. For example, in the study
by Diehr et al, 26 chest radiographs were obtained for all patients
presenting with acute cough, whereas the other studies obtained
chest radiographs only when the primary physician previously
determined a need for them, often to confirm or exclude a suspected
diagnosis of pneumonia. The latter approach provides a
more highly selected population of patients with acute respiratory
complaints that may alter the measured test characteristics
of individual clinical findings. This selection bias is reflected in
the fact that the prevalence of pneumonia in the study populations
ranged from as low as 2.6% 26 to as high as 38%. 27

Still, certain patterns emerge. For example, there are no individual
items from the clinical history whose presence or absence
would reduce the odds of disease sufficiently to exclude pneumonia
and eliminate the need to obtain a chest radiograph. The
one exception to this is the presence of a medical history of
asthma, which reduces the odds of pneumonia with an LR of
0.1, though this has been demonstrated in only 1 study. 29

Similarly, the presence of no single item in the clinical history
increases the odds of pneumonia high enough to confirm
the diagnosis without a chest radiograph. Though the
presence of findings with an LR+ ranging from 2 (fever or
immunosuppression) to 3 (history of dementia) may be
helpful, they are not confirmatory, particularly given the typically
low prevalence of pneumonia in the study populations.
For example, in the study by Diehr et al, 26 the presence of
subjective fever (LR+, 2.1; 95% confidence interval [CI], 1.4-2.9)
had a positive predictive value of only 5.5%, reflecting
the low prevalence of pneumonia in the population.

Accuracy of Physical Examination Findings in the 
Diagnosis of Community-Acquired Pneumonia 

Table 40-3 summarizes the accuracy of 10 potential findings 
(3 vital signs and 7 abnormal findings on chest examination) 


Table 40-2 Likelihood Ratios for Pneumonia, Given the Presence or Absence of Individual Medical History Findingsᵃ

                          LR+ᵇ                                           LR-ᶜ
                          Diehr      Gennis     Singal     Heckerling    Diehr      Gennis     Singal     Heckerling
                          et al, 26  et al, 27  et al, 28  et al, 29     et al, 26  et al, 27  et al, 28  et al, 29
Findingsᵃ                 1984       1989       1989       1990          1984       1989       1989       1990

Respiratory symptoms
  Cough                   ...ᵈ       NS         1.8        NS            ...        NS         0.31       NS
  Dyspnea                 ...        1.4        NS         NS            ...        0.67       NS         NS
  Sputum production       1.3        NS         ...        NS            0.55       NS         ...        NS
Nonrespiratory symptoms
  Fever                   2.1        NS         ...        1.7           0.71       NS         ...        0.59
  Chills                  1.6        1.3        ...        1.7           0.85       0.72       ...        0.70
  Night sweats            1.7        ...        ...        ...           0.83       ...        ...        ...
  Myalgias                1.3        NS         ...        ...           0.58       NS         ...        ...
  Sore throat             0.78       NS         ...        ...           1.6        NS         ...        ...
  Rhinorrhea              0.78       NS         ...        ...           2.4        NS         ...        ...
Medical history
  Asthma                  ...        ...        ...        0.10          ...        ...        ...        3.8
  Immunosuppression       ...        ...        ...        2.2           ...        ...        ...        0.85
  Dementia                ...        ...        ...        3.4           ...        ...        ...        0.94

Abbreviations: LR+, positive likelihood ratio; LR-, negative likelihood ratio; NS, result not significant.
ᵃOnly those findings significantly associated with the presence or absence of pneumonia in at least 1 study are included (P < .05 in a 2-tailed χ² or Fisher exact test).
ᵇLR+ for pneumonia when symptom present, sensitivity/(1 - specificity).
ᶜLR- for pneumonia when symptom absent, (1 - sensitivity)/specificity.
ᵈEllipses indicate result is not available.
Table 40-3 Likelihood Ratios for Pneumonia, Given the Presence or Absence of Physical Examination Findingsᵃ

                                LR+ᵇ                                           LR-ᶜ
                                Diehr      Gennis     Singal     Heckerling    Diehr      Gennis     Singal     Heckerling
                                et al, 26  et al, 27  et al, 28  et al, 29     et al, 26  et al, 27  et al, 28  et al, 29
Findings                        1984       1989       1989       1990          1984       1989       1989       1990

Vital signs
  Respiratory rate, breaths/min
    >20                         ...ᵈ       1.2        ...        ...           ...        0.66       ...        ...
    >25                         3.4        ...        NSᵉ        1.5           0.78       ...        NS         0.82
    >30                         ...        2.6        ...        ...           ...        0.80       ...        ...
  Heart rate, beats/min
    >100                        NS         1.6        NSᵉ        2.3           NS         0.73       NS         0.49
    >120                        ...        1.9        ...        ...           ...        0.89       ...        ...
  Temperature > 37.8°C (100°F)  4.4        1.4        2.4        2.4           0.78       0.63       0.68       0.58
  Any abnormal vital sign       ...        1.2        ...        ...           ...        0.18       ...        ...
Chest examination
  Asymmetric respiration        ∞          ...        ...        ...           0.96       ...        ...        ...
  Dullness to percussion        NS         2.2        ...        4.3           NS         0.93       ...        0.79
  Decreased breath sounds       ...        2.3        ...        2.5           ...        0.78       ...        0.64
  Crackles                      2.7        1.6        1.7        2.6           0.87       0.83       0.78       0.62
  Bronchial breath sounds       ...        ...        ...        3.5           ...        ...        ...        0.90
  Rhonchi                       NS         1.5        ...        1.4           NS         0.85       ...        0.76
  Egophony                      8.6        2.0        ...        5.3           0.96       0.96       ...        0.76
  Any chest finding             ...        1.3        ...        ...           ...        0.57       ...        ...

Abbreviations: LR+, positive likelihood ratio; LR-, negative likelihood ratio; NS, result not significant.
ᵃOnly those findings that were significantly associated with the presence or absence of pneumonia in at least 1 study are included (P < .05 in a 2-tailed χ² or Fisher exact test).
ᵇLR+ for pneumonia when finding present, sensitivity/(1 - specificity).
ᶜLR- for pneumonia when finding absent, (1 - sensitivity)/specificity.
ᵈEllipses indicate result is not available.
ᵉActual cut points not specified in this study.


from the physical examination in patients with suspected
pneumonia according to results from the 4 previously identified
studies. LRs for the presence of any individual vital sign
abnormality (LR+), including tachypnea, tachycardia, or
fever, ranged from 2 to 4. Moreover, various cut points for
these abnormalities did not have a substantial effect on the
calculated LRs. 27 Similarly, LRs for the absence of any individual
vital sign abnormality (LR-) ranged from 0.5 to 0.8.
However, Gennis et al 27 demonstrated an LR- of 0.18 (95%
CI, 0.07-0.46) for the diagnosis of pneumonia according to
the absence of all 3 vital sign abnormalities (ie, respiratory
rate < 30/min, heart rate < 100/min, and temperature < 37.8°C
[100°F]). According to this finding, if the baseline prevalence
of pneumonia among ambulatory patients with respiratory
illnesses is assumed to be 5%, a patient without any vital sign
abnormalities would have a predicted probability of pneumonia
of less than 1%.
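
The step from a baseline prevalence and an LR to a predicted probability can be sketched in Python as follows. The sketch assumes the standard odds form of Bayes theorem (posttest odds = pretest odds × LR); the function name is ours and is illustrative only.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability using an LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Baseline prevalence of 5% and the LR- of 0.18 reported for the
# absence of all 3 vital sign abnormalities.
p = posttest_probability(0.05, 0.18)
print(f"{p:.1%}")
```

Under these inputs the result falls just below 1%, in line with the figure quoted above.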

The presence of several findings on chest examination
significantly raised the likelihood of pneumonia. For example,
in one study the presence of asymmetric respirations
essentially guaranteed the diagnosis of pneumonia (LR+, ∞;
95% CI, 3.2-∞). 26 However, the usefulness of this finding
was limited because only 4% of patients with pneumonia
had asymmetric respirations. The presence of other findings,
including egophony and dullness to percussion, significantly
increased the likelihood of pneumonia. However,
given the low prevalence of pneumonia in the overall study
populations, the effect of observing these findings on estimating
the probability of pneumonia was only modest. For
example, the presence of egophony had a positive predictive
value ranging from as low as 20% 26 to no higher than
56%. 27

Finally, all 4 studies support the conclusion that the presence
or absence of crackles on examination would not be sufficient
to rule in or rule out the diagnosis. For example, with
a prevalence of pneumonia of 5%, the absence of crackles
reduces the probability to 3%, at the lowest, and the presence
of crackles increases the probability to 10%, at the highest.
Moreover, the absence of any abnormality on chest examination
yielded an LR- of 0.57 (95% CI, 0.39-0.83), 27 which is
too close to the indeterminate LR value of 1.0 to substantially
reduce the probability of pneumonia.

The low accuracy of individual findings on chest examination
for detecting pneumonia has also been supported by
studies that relied on retrospective data gathering 30,31 or
incomplete application of chest radiography to all study
patients. 32 In one study, the absence of crackles yielded an
LR- of only 0.71 (95% CI, 0.47-0.90), and the absence of any
abnormal auscultatory finding yielded an LR- of only 0.68
(95% CI, 0.44-0.89), both of which would translate into
small effects on the probability of pneumonia. 32 In contrast,
another study found that the absence of any abnormality on
chest auscultation resulted in an LR- of 0.13 (95% CI, 0.07-0.24), 31
which might substantially reduce the probability of
pneumonia. However, this result has not been replicated in
prospective studies, which would be subject to less bias in the
recording of physical examination findings.

Evaluating Algorithms to Predict Pneumonia 

Because the accuracy of individual symptoms or signs for
predicting pneumonia is low, several studies developed prediction
rules that incorporate the presence or absence of several
medical history or physical examination findings. Table
40-4 summarizes the features of 3 such rules. Though initially
designed as aids in the ordering of chest radiographs for
patients with suspected pneumonia, they are reasonably considered
as prediction rules for the diagnosis of pneumonia in
these patients and yield probabilities of pneumonia after


Table 40-4 Predictive Rules for Pneumonia Diagnosed by
Chest Radiographyᵃ

Diehr et al 26
Add points when presentᵇ
  Rhinorrhea                       -2 Points
  Sore throat                      -1 Point
  Night sweats                      1 Point
  Myalgias                          1 Point
  Sputum all day                    1 Point
  Respiratory rate > 25/min         2 Points
  Temperature > 37.8°C (100°F)      2 Points

Singal et al 28
Probabilityᶜ = 1/(1 + e^(-Y))
Y = -3.095 + 1.214 (Cough) + 1.007 (Fever) + 0.823 (Crackles)
Each variable = 1 if present

Heckerling et al 29
Determine the number of findings presentᵈ
  Absence of asthma
  Temperature > 37.8°C (100°F)
  Heart rate > 100/min
  Decreased breath sounds
  Crackles

Abbreviations: LR+, positive likelihood ratio; LR-, negative likelihood ratio.
ᵃAdapted from Emerman et al. 33
ᵇFor example, a threshold score of -1 (ie, all patients with scores > -1 are considered
to have pneumonia) yields an LR+ of 1.5 and an LR- of 0.22; a threshold score
of +1 yields an LR+ of 5.0 and an LR- of 0.47; and a threshold score of +3 yields
an LR+ of 14.0 and an LR- of 0.82, according to the original study data. 26
ᶜFirst calculate Y and then calculate the predicted probability of pneumonia.
ᵈFor example, according to a prevalence of pneumonia of 5%, the presence of 0, 1,
2, 3, 4, or 5 findings yields probabilities of pneumonia of <1%, 1%, 3%, 10%, 25%,
and 50%, respectively, according to a nomogram provided by Heckerling et al. 29


completion of the clinical examination. For the rule by Diehr
et al, 26 points are assigned for each clinical finding and
summed to yield a discriminant score. For example, a threshold
score of -1 (ie, all patients with scores > -1 are considered
to have pneumonia) yields an LR+ of 1.5 and an LR- of
0.22, a threshold score of +1 yields an LR+ of 5.0 and an LR-
of 0.47, and a threshold score of +3 yields an LR+ of 14 and
an LR- of 0.82, according to the original study data. 26 The
rule by Singal et al 28 is a logistic function that can yield probabilities
of pneumonia, ranging from 4% (no findings
present) to 49% (all 3 findings present). 28
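
The 2 rules just described can be sketched in Python as follows. The point values and logistic coefficients are those listed in Table 40-4; the dictionary keys and function names are hypothetical labels chosen for this illustration, and the sketch is not a validated clinical tool.

```python
import math

# Point values from the rule by Diehr et al (key names are illustrative).
DIEHR_POINTS = {
    "rhinorrhea": -2,
    "sore_throat": -1,
    "night_sweats": 1,
    "myalgias": 1,
    "sputum_all_day": 1,
    "respiratory_rate_over_25": 2,
    "temperature_over_37_8": 2,
}


def diehr_score(findings_present):
    """Sum the points for each finding that is present."""
    return sum(DIEHR_POINTS[name] for name in findings_present)


def singal_probability(cough, fever, crackles):
    """Logistic rule by Singal et al; each argument is 1 if present, else 0."""
    y = -3.095 + 1.214 * cough + 1.007 * fever + 0.823 * crackles
    return 1 / (1 + math.exp(-y))


# Sore throat, sputum all day, myalgias, and fever sum to the +3 threshold.
score = diehr_score(["sore_throat", "sputum_all_day", "myalgias",
                     "temperature_over_37_8"])
print(score)
print(f"{singal_probability(0, 0, 0):.0%}")  # no findings present
print(f"{singal_probability(1, 1, 1):.0%}")  # all 3 findings present
```

The 2 logistic results round to the 4% and 49% limits quoted above for the rule by Singal et al. 28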

The final prediction rule, by Heckerling et al, 29 is based on
the presence or absence of 5 clinical findings. The performance
of this prediction rule depends on the pretest probability
of pneumonia in the population. In most ambulatory
care settings, this probability will be relatively low. For example,
as observed earlier, in a national survey, only 5% of all
patients visiting primary care physicians for cough were
diagnosed as having pneumonia. 1 In this setting, the presence
of 2, 3, or 4 predictors would result in predicted probabilities
of pneumonia of 3%, 10%, or 25%, respectively, according to
a nomogram provided by Heckerling et al. 29 The rule would
yield a maximum probability of pneumonia of 50% if all 5 of
its clinical predictors were present. These findings emphasize
the inaccuracy in diagnosing pneumonia clinically, in the
absence of confirmatory chest radiography.
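
A minimal Python sketch of this counting rule follows. The lookup values are the probabilities quoted in this chapter from the nomogram of Heckerling et al 29 and apply only at a baseline prevalence of 5%; the key names and function name are hypothetical, and other prevalences would require the original nomogram.

```python
HECKERLING_FINDINGS = (
    "absence_of_asthma",
    "temperature_over_37_8",
    "heart_rate_over_100",
    "decreased_breath_sounds",
    "crackles",
)

# Probability of pneumonia by number of findings, at 5% prevalence only.
PROBABILITY_AT_5_PERCENT_PREVALENCE = {
    0: "<1%", 1: "1%", 2: "3%", 3: "10%", 4: "25%", 5: "50%",
}


def heckerling_count(patient):
    """Count how many of the 5 findings are present in a patient record."""
    return sum(1 for name in HECKERLING_FINDINGS if patient.get(name, False))


# Hypothetical patient: no asthma, fever, tachycardia, and crackles.
patient = {
    "absence_of_asthma": True,
    "temperature_over_37_8": True,
    "heart_rate_over_100": True,
    "decreased_breath_sounds": False,
    "crackles": True,
}
count = heckerling_count(patient)
print(count, PROBABILITY_AT_5_PERCENT_PREVALENCE[count])
```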

The 3 scores summarized in Table 40-4, along with the decision
rule suggested by Gennis et al 27 (ie, obtaining chest radiographs
only for patients suspected of having pneumonia, with
at least 1 vital sign abnormality), were compared for their ability
to predict correctly the results of chest radiography in an
independent study by Emerman et al. 33 Patients presenting to
an emergency department or outpatient medical clinic with a
complaint of cough were enrolled prospectively, and chest
radiographs were obtained for all patients regardless of the primary
physician’s clinical impression.

Overall, the prevalence of pneumonia among the study
patients was 7%. In the absence of an explicit guideline, physician
judgment that the patient did not need chest radiography
reduced the probability of pneumonia to just less than
2% (LR-, 0.25; 95% CI, 0.09-0.61), which exceeded all 4 prediction
rules. In contrast, physician judgment that the
patient needed chest radiography to diagnose pneumonia
increased the probability of pneumonia to only 13% (LR+,
2.0; 95% CI, 1.5-2.4), which meant that reliance on implicit
physician judgment alone would have led to many unnecessary
chest radiographs.

In comparison, the simple decision rule by Gennis et al, 27
ordering chest radiographs only for patients with abnormal
vital signs, yielded the highest overall LR+ for predicting
pneumonia, but the LR+ was a modest 2.6 (95% CI, 1.6-3.7).
With this rule, 40% fewer radiographs would have been
ordered compared with unaided physician judgment. However,
excluding pneumonia according to the absence of any
vital sign abnormalities would have missed 38% of patients
subsequently shown to have pneumonia on chest radiography
(LR-, 0.50 [95% CI, 0.27-0.78], compared with LR-,
0.18 in the original study by Gennis et al 27 ).
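
The figures in the preceding paragraph can be related to one another with a short Python sketch. It assumes that the 38% miss rate, the LR- of 0.50, and the LR+ of 2.6 all describe the same 2 × 2 table and that each is rounded; the variable names are ours.

```python
# Reported values for the vital sign rule in the validation study.
missed_fraction = 0.38  # patients with pneumonia who had normal vital signs
lr_neg = 0.50

# Sensitivity is the complement of the miss rate.
sensitivity = 1 - missed_fraction

# LR- = (1 - sensitivity) / specificity, so specificity follows directly.
specificity = (1 - sensitivity) / lr_neg

# LR+ = sensitivity / (1 - specificity).
lr_pos = sensitivity / (1 - specificity)

print(f"sensitivity = {sensitivity:.2f}")
print(f"specificity = {specificity:.2f}")
print(f"LR+ = {lr_pos:.1f}")
```

Under these assumptions the implied specificity is about 76%, and the derived LR+ agrees with the reported value of 2.6 after rounding.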
An algorithm that is less than perfect, that is, not all
ordered chest radiographs demonstrate a new infiltrate, will
still be acceptable, given the relatively low cost and risk
associated with this test. Ultimately, optimum yields for
chest radiography in the evaluation of patients with suspected
pneumonia will need to be determined, balancing
the costs of the test with the costs of missed diagnoses.
Additional factors, such as illness severity and patient preferences,
will also play a role in determining the appropriate
threshold for ordering chest radiographs for patients with
acute respiratory illnesses. For example, thresholds may be
lower for patients who appear severely ill or who express
strong desires to have a definitive diagnosis. We suggest that
an algorithm that yields less than a 100% negative predictive
value may still be acceptable, assuming that patients
with missed cases of pneumonia continue to have good
clinical outcomes. However, this hypothesis will need to be
tested.


CLINICAL SCENARIO—RESOLUTION 


The patient presents with typical symptoms of community- 
acquired pneumonia, including a productive cough and 
fever. Physical examination reveals fever and crackles on 
chest auscultation. In particular, the patient has 4 of the 5 
clinical pneumonia predictors identified by Heckerling et 
al 29 (absence of asthma, presence of fever, tachycardia, and 
crackles). With the nomogram by Heckerling et al, 29 a 5% 
prevalence of pneumonia among outpatients yields a 25% 
probability of pneumonia. Similarly, the patient is at the 
threshold score of +3 points on the prediction rule by Diehr 
et al 26 (presence of sore throat, sputum, myalgias, and 
fever), yielding an LR for pneumonia of 14 (according to 
the original study data) and a calculated probability of 
pneumonia of 42%. Finally, the patient has all 3 criteria of 
Singal et al, 28 yielding a probability of pneumonia of 49%, 
according to their logistic formula. We conclude that none 
of these combinations of findings can be said to “rule in” 
the diagnosis, yet the possibility of pneumonia remains 
high enough that further diagnostic testing, in particular 
chest radiography, is warranted. 


THE BOTTOM LINE 

Physicians frequently disagree about the presence or 
absence of individual findings on chest examinations of 
patients with respiratory illnesses, including community- 
acquired pneumonia. 

Individual symptoms and signs have inadequate test characteristics
to rule in or rule out the diagnosis of pneumonia.
Decision rules that use the presence or absence of several
symptoms and signs to modify the probability of pneumonia
are available, the simplest of which requires the absence of
any vital sign abnormalities to exclude the diagnosis. There
are no combinations of medical history and physical examination
findings that confirm the diagnosis of pneumonia. If
diagnostic certainty is required in the treatment of a patient
with suspected pneumonia, then chest radiography should
be performed.

Future research should examine ways to improve the precision
of the clinical examination in patients with suspected
pneumonia, as well as to determine the accuracy of the clinical
examination in these patients in settings outside the
emergency department. In addition, studies should address
appropriate thresholds for obtaining chest radiographs and
treating accordingly vs empirical antimicrobial therapy vs
clinical observation in the treatment of patients with suspected
community-acquired pneumonia.
CLINICAL SCENARIO


A 36-year-old man with no underlying medical illness 
developed a cough 3 days ago. In the past 24 hours, his 
cough became productive of darkened sputum and he 
observed some wheezing for the first time. He decided to 
try to go to work, but an episode of chills made him realize 
he needed to see an urgent care physician. On examination,
you find that his temperature is 38.2°C. He does not
have tachypnea or tachycardia, although he is wheezing. 
You do not hear any areas of decreased breath sounds or 
pulmonary rales. On hearing the wheezing, you inquire to 
find that he has no history of asthma and that he is not a 
smoker. 


UPDATED SUMMARY ON ADULT COMMUNITY- 
ACQUIRED PNEUMONIA 

Original Review 

Metlay JP, Kapoor WN, Fine MJ. Does this patient have community-
acquired pneumonia? diagnosing pneumonia by history and
physical examination. JAMA. 1997;278(17):1440-1445.

UPDATED LITERATURE SEARCH 

Our literature search combined the search terms “community-
acquired infections or pneumonia” with the parent search
strategy for The Rational Clinical Examination series, “meta-analysis,”
or “roc curve,” limited to English-language publications
in the MEDLINE database from 1995 to November
2004. The search strategy yielded 162 articles. We searched
the abstracts for articles that used prospective data collection
of clinical variables and compared them with a chest radiograph
reference standard. Two categories of articles were
identified: (1) those that evaluated the clinical examination
for its ability to identify patients with community-acquired
pneumonia, and (2) those that used the clinical examination
to establish a prognosis for individuals with community-
acquired pneumonia. To explore the possibility that we might
be missing articles, we entered the original Rational Clinical
Examination article into Citation Index (ISI Web of Knowledge,
Science Citation Index Expanded) to capture any articles
not identified in the MEDLINE search.

A systematic review of the diagnosis of community-
acquired pneumonia updated the previous Rational Clinical
Examination article for new information published up to
December 2000. 1 For clinical diagnosis, we used that systematic
review and articles identified in the literature search published
since 2000. We did not update the information on
pneumonia care guidelines and risk stratification. We identified
only 1 new article assessing the clinical examination for
clinical diagnosis that was not cited in the systematic review.

NEW FINDINGS 

• Approximately 1 of every 10 patients who are sick enough 
to be admitted with a clinical diagnosis of community- 
acquired pneumonia, but who have a normal initial chest 
radiograph result, will develop radiographic evidence of 
pneumonia by 72 hours. 

• Among patients admitted to the hospital, an acute onset of
illness is more likely observed in pneumonia caused by pyogenic
bacteria (streptococcal, staphylococcal, or Enterobacteriaceae),
but the clinical examination alone should not
be used to select a patient’s antibiotic coverage.

IMPROVEMENTS IN THE DATA PRESENTED IN THE 
ORIGINAL PUBLICATION 

No new data change the results for the 3 predictive models
for community-acquired pneumonia, displayed in Table 40-4
of the original publication. The predictive models are redisplayed
in Tables 40-5 and 40-6 to show the LR or probability
of pneumonia associated with each.

CHANGES IN THE REFERENCE STANDARD 

The pragmatic reference standard for usual clinical care is the
chest radiograph. However, some patients will initially have a
normal chest radiograph result in the early course of their illness.
One study compared patients admitted with a clinical
diagnosis of pneumonia and an abnormal radiograph result
vs those with the clinical diagnosis and no radiographic
pneumonia shown by the initial radiograph result. 2 By 72
hours, a random sample of those admitted with a normal
radiograph result showed that 7% (95% confidence interval
[CI], 3%-13%) had developed pneumonia. High-resolution
chest computed tomography (CT) picks up opacification or
consolidation not observed on conventional chest radiographs.
However, the chest CT has not been as extensively
validated with microbiologic results as chest radiographs. 1

RESULTS OF LITERATURE REVIEW 

A study with careful microbiologic characterization of
community-acquired pneumonia for patients admitted to the
hospital (75% of patients with pneumonia in the patient sample)
showed that the finding of “acute” onset was the only symptom
with a statistically significant diagnostic odds ratio (31) for


Table 40-5 Multivariate Findings for Adult Pneumonia

Diehr et al 4
Add points for the presence of findings as follows:
rhinorrhea = -2 points; sore throat = -1; night sweats = 1; myalgias = 1; sputum
all day = 1; respiratory rate > 25/min = 2; temperature > 37.8°C (100°F) = 2

Threshold Score    LR
≥3                 14
≥1                 5.0
≥-1                1.5
<-1                0.22

Singal et al 5
Score = -3.095 + 1.214 × (cough) + 1.007 × (fever) + 0.823 × (crackles)
Each variable is coded as 1 if present, 0 if absent
Probability of pneumonia = 1/(1 + e^(-score))

Abbreviation: LR, likelihood ratio.


Table 40-6 As the Number of Findings Increases, the Probability of
Pneumonia Increases

Heckerling et al 6
Count the number of findings present: absence of asthma; temperature
> 37.8°C (100°F); heart rate > 100/min; decreased breath sounds; crackles

Findings Present    Probability, % (Baseline Prevalence 5%)
5                   50
4                   25
3                   10
2                   3
1                   1
0                   <1


pneumonia caused by pyogenic organisms. 3 When the patient
has acute onset, the positive likelihood ratio (LR) is 3.6; when
the onset of the patient’s illness is not acute, the negative LR is
0.31 and makes infection caused by atypical bacteria or viral illness
more likely. However, because current guidelines for the
treatment of community-acquired pneumonia in adults always
include antibiotics for pyogenic bacteria, the results of the clinical
examination should not be used to select the antibiotic.

In the model of Diehr et al, 4 the score is calculated from
the clinical findings, and the LR depends on the threshold
that you choose to consider positive. At scores of 1 or higher,
the likelihood of pneumonia increases (Table 40-5). The Singal
et al 5 model is a logistic function (Table 40-5); once the
findings are recorded and the score calculated, the probability
of adult pneumonia can be derived. In the Heckerling et al 6
model, the probability of disease increases with the number of
findings (Table 40-6), but access to a nomogram is required,
which makes this model less practical to use. Nonetheless,
the findings in all the models overlap, and the physician
can appropriately deduce that increasing numbers of findings
in these models make pneumonia more likely.
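
The two scoring rules in Table 40-5 can be sketched in a few lines of Python. This is only an illustration: the function and key names are hypothetical, the point values and coefficients are transcribed from the table, and the exponent of the logistic function is assumed to be the negative of the score, an assumption that reproduces a probability of about 29% for a patient with cough and fever but no crackles.

```python
import math

# Point values transcribed from Table 40-5 (Diehr et al); key names are illustrative.
DIEHR_POINTS = {
    "rhinorrhea": -2,
    "sore_throat": -1,
    "night_sweats": 1,
    "myalgias": 1,
    "sputum_all_day": 1,
    "respiratory_rate_over_25": 2,
    "temperature_over_37_8": 2,
}


def diehr_score(findings):
    """Sum the Diehr points for the findings that are present."""
    return sum(DIEHR_POINTS[name] for name in findings)


def singal_probability(cough, fever, crackles):
    """Logistic model from Table 40-5; each argument is True (1) or False (0).

    Assumes probability = 1 / (1 + e^(-score)).
    """
    score = -3.095 + 1.214 * cough + 1.007 * fever + 0.823 * crackles
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    print(diehr_score(["night_sweats", "temperature_over_37_8"]))
    print(round(singal_probability(True, True, False), 2))
```

The Diehr score still has to be read against the thresholds in Table 40-5 to obtain an LR; the sketch stops at the raw score.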

EVIDENCE FROM GUIDELINES 

No new federal agency recommendations address the diagnosis
of community-acquired pneumonia. The role of the
Pneumonia Severity Index for prognostication has been
summarized and supported for treatment decisions. 7


CLINICAL SCENARIO—RESOLUTION 


The patient’s presenting symptoms and signs require clinical
judgment to decide whether to obtain a chest radiograph.
The new onset of wheezing might complicate your
decision making because wheezing does not enter any of
the predictive models. He has a cough with fever, so
the Singal et al 5 model gives him a 29% probability of
pneumonia. However, the Heckerling et al 6 model shows
only 2 findings (absence of asthma, and fever), which give
him a probability of only 3%. Your judgment
should consider the epidemiology of acute cough illness
in your community (eg, are you in the middle of influenza
season?), the clinical gestalt of how ill he appears, and
his ability to return to consult you should he suddenly
worsen. He does not appear ill enough for hospital
admission, and he should do well with outpatient management.
You think influenza is the most likely diagnosis,
but the episode of chills concerns you. If you wish to treat
this patient with antibiotics for pneumonia, a chest radiograph
is required.
COMMUNITY-ACQUIRED PNEUMONIA, ADULT: MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

Using cough as a requirement for considering pneumonia, the 
baseline probability of radiographic-proven pneumonia in 
patients with acute cough illness is about 5%. 
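
A baseline probability such as this one is combined with a likelihood ratio by converting to odds, multiplying, and converting back. The following is a minimal sketch of that arithmetic; the function name is hypothetical and the LR values in the usage lines are illustrative only, not results from any study cited here.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


if __name__ == "__main__":
    # Illustrative LRs applied to the 5% baseline probability.
    for lr in (0.5, 1.0, 5.0):
        print(lr, round(posttest_probability(0.05, lr), 3))
```

An LR of 1 leaves the 5% baseline unchanged, which is a quick sanity check on the conversion.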

POPULATION FOR WHOM COMMUNITY-ACQUIRED 
PNEUMONIA SHOULD BE CONSIDERED 

• Patients with symptoms of acute respiratory illness,
primarily cough.

• Patients with comorbid illnesses, older patients, and those 
with immunocompromised status have a much higher 
risk for community-acquired pneumonia. 

DETECTING THE LIKELIHOOD OF COMMUNITY- 
ACQUIRED PNEUMONIA 

Individual clinical symptoms or signs have low utility for
identifying patients with pneumonia. Combinations of
findings are required, including cough, fever, tachypnea,
and abnormalities on auscultation (decreased breath
sounds or crackles). The clinical decision that a patient has
a low enough likelihood of pneumonia that a chest radiograph
is not required lowers the probability of pneumonia
to less than 5%. We do not recommend one particular
prediction model over the others for selecting patients who
should have a chest radiograph; clinicians should use their
own clinical judgment and the presence of increasing numbers
of clinical signs and symptoms from the prediction
models. The detection of pneumonia requires a chest radiograph,
and the presence of appropriate findings on the chest
radiograph is part of the case definition for pneumonia.

REFERENCE STANDARD TESTS 

There is no practical reference standard test that correctly
distinguishes the patient who has a pulmonary
infection that will respond to antibiotics from the patient
who does not need antibiotics. The reference standard for
pneumonia is the identification of a microbiologic pathogen from
lung tissue. Because this is infrequently obtained, the pragmatic
reference standard is the combination of clinical
findings with appropriate abnormalities on a chest radiograph.
A follow-up chest radiograph is often required to
demonstrate improvement of the initial findings consistent
with pneumonia or to identify findings not present on the
first radiograph. The role of high-resolution CT for
patients with a nondiagnostic initial chest radiograph
result requires studies comparing the results to microbiologic
outcomes.
EVIDENCE TO SUPPORT THE UPDATE:
Community-Acquired Pneumonia, Adult 



TITLE Testing Strategies in the Initial Management of 
Patients With Community-Acquired Pneumonia. 

AUTHORS Metlay JP, Fine MJ. 

CITATION Ann Intern Med. 2003;138(2):109-118. 

QUESTIONS Do clinical findings allow the physician
to establish the diagnosis of community-acquired pneumonia?
Once a chest radiograph confirms community-acquired
pneumonia, do clinical and laboratory results
allow identification of individuals who can be safely
treated as outpatients?

DESIGN Formal systematic review. 

DATA SOURCES MEDLINE database. 


STUDY SELECTION AND ASSESSMENT 

The authors used the same selection criteria as those in the
original Rational Clinical Examination article on adult pneumonia:
published from January 1996 to December 2000, English
language, but excluding studies of children or inpatients. Studies
had to report the reference standard (chest radiography) for all
patients suspected of having community-acquired pneumonia.
The references from retrieved articles were reviewed.

For studies of short-term prognosis and the need for
hospitalization, the authors updated a previous meta-analysis. 1
The search strategy excluded nosocomial, nursing home-acquired,
noninfectious, pediatric, and human immunodeficiency
virus-associated pneumonia.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

A chest radiograph was required as the reference standard. 

OUTCOME MEASURES 

Sensitivity, specificity, and likelihood ratios (LR) of clinical 
findings for pneumonia. Ranges for the LR were reported 
rather than summary measures. 


MAIN RESULTS 

The authors found no additional studies of the clinical diagnosis
of pneumonia beyond those referenced in the original Rational
Clinical Examination article. The authors found 134 cohort studies
of the short-term prognosis of community-acquired pneumonia.

Once pneumonia is established (by clinical examination
and chest radiograph), the results of the medical history,
physical examination, and laboratory tests are used to
establish a prognosis. The Pneumonia Patient Outcomes
Research Team Severity Index (PSI) accurately identified
low-risk patients who could be treated as outpatients for
community-acquired pneumonia. 2 Although the PSI allows
risk stratification for all patients with pneumonia, its primary
purpose was the accurate identification of low-risk
patients. For those not in the lowest risk class, the clinical
history, physical examination, and laboratory findings
should be used to establish a risk class. The following clinical
variables predict a poorer prognosis, so they should be
the focus of the evaluation 2 :

Demographic variables:

1. Age

2. Male sex

3. Nursing home residence

Comorbid illness:

1. Neoplastic disease 

2. Liver disease 

3. Congestive heart failure 

4. Cerebrovascular disease 

5. Renal disease 

Physical examination findings: 

1. Altered mental status 

2. Respiratory rate greater than or equal to 30/min 

3. Systolic blood pressure less than 90 mm Hg 

4. Temperature less than 35°C or greater than or equal to 40°C 

5. Pulse greater than or equal to 125/min 

In addition to these variables, several common laboratory 
tests further modify the clinical variables. 

Laboratory findings: 

1. pH less than 7.35 

2. Blood urea nitrogen level greater than 64.5 mg/dL 
3. Sodium level less than 130 mEq/L

4. Glucose level greater than 250 mg/dL

5. Hematocrit level less than 30%

6. Po2 less than 60 mm Hg (or oxygen saturation < 90%)

7. Pleural effusion 
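
As a rough illustration, the physical examination thresholds listed above can be checked mechanically. The sketch below is hypothetical (the class and function names are not from the PSI publication), it covers only the five physical examination findings, and it does not assign PSI points or a risk class; demographic, comorbidity, and laboratory variables are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class ExamFindings:
    """Hypothetical record of bedside findings (names are illustrative)."""
    respiratory_rate: float   # breaths per minute
    systolic_bp: float        # mm Hg
    temperature_c: float      # degrees Celsius
    pulse: float              # beats per minute
    altered_mental_status: bool = False


def poor_prognosis_findings(exam):
    """Return the listed physical examination findings that are present."""
    present = []
    if exam.altered_mental_status:
        present.append("altered mental status")
    if exam.respiratory_rate >= 30:
        present.append("respiratory rate >= 30/min")
    if exam.systolic_bp < 90:
        present.append("systolic blood pressure < 90 mm Hg")
    if exam.temperature_c < 35 or exam.temperature_c >= 40:
        present.append("temperature < 35 C or >= 40 C")
    if exam.pulse >= 125:
        present.append("pulse >= 125/min")
    return present


if __name__ == "__main__":
    print(poor_prognosis_findings(ExamFindings(32, 118, 38.6, 110)))
```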

CONCLUSIONS 

LEVEL OF EVIDENCE Systematic review. 

STRENGTHS Formal systematic review that used the same 
methods as the original article in The Rational Clinical 
Examination series on adult pneumonia. 

LIMITATIONS None. 

As of 2001, no additional data on the clinical findings for 
the diagnosis of adult pneumonia were identified. 

The PSI was selected as a validated prognostic model with 
the highest methodologic criteria. 3 

Reviewed by David L. Simel, MD, MHS 

REFERENCES FOR THE EVIDENCE 

1. Fine MJ, Smith MA, Carson CA, et al. Prognosis and outcomes of
patients with community-acquired pneumonia: a meta-analysis. JAMA.
1996;275(2):134-141.

2. Fine MJ, Auble TE, Yealy DM, et al. A prediction rule to identify low-risk
patients with community-acquired pneumonia. N Engl J Med.
1997;336(4):243-250.

3. Auble TE, Yealy DM, Fine MJ. Assessing prognosis and selecting an
initial site of care for adults with community-acquired pneumonia. Infect
Dis Clin North Am. 1998;12(3):741-759.


TITLE Community-Acquired Pneumonia: Development
of a Bedside Predictive Model and Scoring System to
Identify the Aetiology.

AUTHORS Ruiz-Gonzalez A, Falguera M, Vives M, 
Nogues A, Porcel JM, Rubio-Caballero M. 

CITATION Resp Med. 2000;94(5):505-510. 

QUESTION Among patients admitted to the hospital
with a clinical diagnosis of community-acquired pneumonia,
do clinical features identify those likely to have
bacterial pneumonia from pyogenic organisms?

DESIGN Prospective, consecutive enrollment with clinical
data recorded before microbiologic results were obtained.

SETTING University hospital in Spain.

PATIENTS Patients older than 14 years, admitted during
a 15-month period, with a principal diagnosis of pneumonia.
Patients admitted within the previous 7 days or transferred
to the hospital were excluded. Of the potentially
eligible patients with pneumonia treated in the emergency
department, 75% were admitted to the hospital.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Clinical data were recorded prospectively by the examining
clinician. The criteria for inpatient vs outpatient treatment
were not specified. The microbiologic reference standard was
used to classify patients with bacterial pneumonia caused by
Streptococcus pneumoniae and other pyogenic bacteria
(streptococci, Haemophilus influenzae, Staphylococcus aureus,
Enterobacteriaceae) vs atypical pneumonia (Mycoplasma
pneumoniae, Chlamydia pneumoniae, Chlamydophila psittaci,
Coxiella burnetii, Legionella pneumophila, or virus) vs
pneumonia of unknown etiology. The microbiologic diagnosis was
based on blood culture results, microbiologic results, or
polymerase chain reaction test results on samples from
transthoracic needle aspiration of the lung in patients without
contraindications to the procedure, sputum culture for legionella,
or 4-fold titer increase for atypical organism or virus.


MAIN OUTCOME MEASURES 

A total of 85 patients (82% of the sample) had an etiologic
diagnosis by the microbiologic standards. A logistic model
was created to see which clinical variables predicted pyogenic
bacterial pneumonia (see Tables 40-7 and 40-8).
MAIN RESULTS

Acute onset (diagnostic odds ratio [OR], 31; 95% confidence
interval [CI], 6-150) and age greater than 65 years
or comorbidity (OR, 6.9; 95% CI, 2-23) were the only findings
for which the CI of the OR did not include 1.

Table 40-7 Likelihood Ratios for Pyogenic Bacterial Pneumonia

Test                                       LR+ (95% CI)        LR- (95% CI)
Age > 65 y or comorbidity a                2.7 (1.6-4.6)       0.43 (0.26-0.71)
Acute onset                                3.6 (2.0-6.5)       0.31 (0.17-0.55)
Chills                                     1.60 (0.73-3.40)    0.86 (0.67-1.10)
Pleuritic chest pain                       1.40 (0.97-2.00)    0.62 (0.36-1.10)
Purulent sputum                            1.20 (0.56-2.50)    0.95 (0.74-1.20)
Signs of consolidation on auscultation a   1.10 (0.79-1.50)    0.86 (0.48-1.60)
Leukocytosis or leukopenia b               2.0 (1.3-2.8)       0.32 (0.16-0.66)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a The specific comorbidities or signs of consolidation were not described.
b Leukocytosis defined as white blood cell count > 11000/µL and leukopenia defined as
white blood cell count < 4000/µL.


CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Microbiologic proof of infection in most 
patients, including results from lung parenchyma samples. 

LIMITATIONS Study includes only patients admitted to the
hospital. Radiographic results not provided. Patient population
not well described in terms of comorbid illness.


Table 40-8 Likelihood Ratio of Bacterial Pneumonia From a
Scoring System

Test                               Sensitivity (95% CI)   Specificity (95% CI)   LR+ a   LR- a
Bacterial pneumonia score > 5 b    0.89 (0.78-0.96)       0.63 (0.54-0.81)       2.4     0.17

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a The LRs are estimated from the sensitivity and specificity. CIs cannot be calculated
without the raw values.
b Age > 65 years or comorbidity = 3 points; acute onset = 5 points; leukocytosis or
leukopenia = 2 points.
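
The footnotes to Table 40-8 describe two calculations: the point score and the LRs estimated from sensitivity and specificity. A minimal sketch of both follows; the function names are hypothetical, the point values are those in the footnote, and the LR formulas are the standard definitions (LR+ = sensitivity/(1 - specificity), LR- = (1 - sensitivity)/specificity).

```python
def bacterial_pneumonia_score(age_over_65_or_comorbidity, acute_onset, abnormal_wbc):
    """Point score from the Table 40-8 footnote; arguments are True/False.

    abnormal_wbc means leukocytosis or leukopenia.
    """
    return 3 * age_over_65_or_comorbidity + 5 * acute_onset + 2 * abnormal_wbc


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) estimated from sensitivity and specificity."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


if __name__ == "__main__":
    print(bacterial_pneumonia_score(True, True, False))
    lr_pos, lr_neg = likelihood_ratios(0.89, 0.63)
    print(round(lr_pos, 1), round(lr_neg, 2))
```

With the tabulated sensitivity of 0.89 and specificity of 0.63, these formulas give the LR+ of 2.4 and LR- of 0.17 shown in the table. The sketch returns the raw score only; the positivity threshold should be taken from the table as printed.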


Among admitted patients with community-acquired pneumonia,
an acute onset of disease is the variable that most
increases the likelihood of bacterial pneumonia attributed to
pyogenic bacteria. An onset that is not acute decreases the
likelihood of pyogenic bacterial pneumonia the most. The lack of
significance (diagnostic OR not statistically different from 1)
for chills, pleurisy, purulent sputum, and auscultatory signs of
consolidation is also important.

We do not know whether the results can be applied to the 
25% of patients receiving ambulatory treatment, because 
those patients did not have the same microbiologic studies. 

Reviewed by David L. Simel, MD, MHS 
A mother brings her 8-month-old infant to your office in
midwinter with a cough. She reports that the illness began 4
days ago with a runny nose. Two days ago, the baby developed
a fever. Now the baby’s symptoms are getting worse.
The baby has become more irritable, is eating less, and 
seems to be having more difficulty breathing. This is the 
third child you have treated today with a cough. While the 
first two children were treated for acute upper respiratory 
tract infections, you wonder if the findings in this infant 
suggest pneumonia. 


WHY IS THIS AN IMPORTANT QUESTION TO 
ANSWER WITH A CLINICAL EXAMINATION? 


Acute respiratory illnesses are among the most common conditions
of infants treated in primary care offices. Although the
majority of respiratory illnesses involve infections of the upper
respiratory tract, most infants will experience a lower respiratory
tract illness (LRI) in the first year of life. Of those with LRIs,
about 30% visit a physician, 1,2 and about 2% are hospitalized. 3

LRIs can be defined simply as infections at an anatomic level
below the vocal cords. The majority of LRIs in infants are
caused by viruses; only a small proportion is due to bacteria.
The differential diagnosis for cough is long (Table 41-1).

Therapies are available to treat a variety of manifestations of
lower respiratory tract disease, so it is important to diagnose
these complaints accurately and estimate their severity to
deliver the appropriate treatment. Identifying infants at lower
risk of bacterial disease may help clinicians avoid the unnecessary
use of antibiotics, which may reduce the risk of subsequent
bacterial infection and slow the emergence of resistant
strains of bacteria within the population. 4 Greater certainty
about the presence of a viral LRI may also help clinicians avoid
additional testing such as radiography or blood culture.

This overview focuses on the medical history and physical 
examination findings of infants that distinguish pneumonia 
from other LRIs. 

METHODS 

We conducted a MEDLINE search from 1982 to 1995 to identify
articles about the diagnosis of pneumonia in children. We
searched for articles with any of the following Medical Subject
Heading terms: “pneumonia,” “diagnostic tests,” “sensitivity
and specificity,” “reproducibility of results,” “physical
examination,” or “medical history taking.” This search was further
limited to studies published in English about humans and that
involved children. This search strategy identified 38 articles.
Four more articles were identified by reviewing a compendium
of references prepared by the World Health Organization. 5
Etiologic studies, which did not include a chest
radiograph examination as part of the gold standard, involved
only inpatients. Studies of illness in families’ homes, rather
than in clinical settings, were excluded (n = 29).

Table 41-1 Differential Diagnosis of Cough in Infants

Anatomic
  Foreign body
  Congenital malformation (eg, vascular ring, cystic adenomatous
  malformation, bronchogenic cyst, tracheomalacia)
Inflammatory
  Reactive airway disease
Infectious
  Viral
    Croup
    Laryngotracheobronchitis
    Bronchitis
    Viral pneumonia
  Bacterial
    Epiglottitis
    Tracheitis
    Bronchitis
    Bacterial pneumonia
  Chlamydia
  Tuberculosis
Other
  Cystic fibrosis
  Congestive heart failure
  Gastroesophageal reflux

All the articles were reviewed by the authors, and disagreements
were resolved by discussion. We used the methods
developed for this series to assess the quality of the articles.
The highest-quality studies are emphasized in the “Results”
section. We did not aggregate statistically the results of the
studies because of differences in the ages of the study samples
and differences in cutoff points of key variables, such as
respiratory rate. Confidence intervals (CIs) were calculated
according to the method suggested by Koopman 6 and Centor
and Keightley. 7

Reference Standard for Diagnosing Pneumonia 

The reference standard for diagnosing pneumonia is an aspirate
from the lower respiratory tract obtained by bronchoalveolar
lavage or lung puncture. The use of bronchial lavage is
appropriate in guiding antibiotic choice in patients with
refractory or complicated pneumonia. In general practice,
chest radiographs are readily obtained and can be considered
a pragmatic reference standard for pneumonia.

A number of studies evaluated the accuracy of the chest
radiograph in differentiating viral from bacterial disease in
children. 8-13 It is difficult to determine the accuracy of the
chest radiograph from these studies because of methodologic
limitations, as well as problems with study design introduced
by the biology of pneumonia. It is not possible to obtain cultures
from a lung in most patients. Therefore, investigators
have had to use combinations of other clinical features as a
proxy for bacterial pneumonia. Reliance on less than perfect
gold standards for diagnosing bacterial pneumonia may produce
over- or underestimates of the association of a positive
chest radiographic finding with bacterial pneumonia. Two
studies used the same definition of bacterial pneumonia
(duration of symptoms < 2 days, temperature > 39.5°C, total
white blood cell count > 15000/µL). 8,9 Both found the sensitivity
of the chest radiograph for diagnosing bacterial vs viral
pneumonia to be approximately 75%. However, one reported
a specificity of 100%; the other, a specificity of 63%. The
reported sensitivity for studies with varying definitions
ranges from 42% to 80% and the specificity from 42% to
100%. Studies of the accuracy of chest radiographs have also
been compromised by other methodologic problems, such as
interobserver variability in the interpretation of the radiograph,
oversampling patients with relatively severe disease,
and the relatively small numbers of patients with bacterial
pneumonia. Such problems make estimates of chest radiographic
accuracy unreliable.
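
To see how much the differing specificities matter, the two reports can be compared on the LR scale. The worked figures below are our own illustration, using the standard definition of the positive LR with the approximate sensitivity of 75% and the specificity of 63% quoted above; they were not reported by the studies themselves.

```latex
\[
\mathrm{LR}^{+} = \frac{\text{sensitivity}}{1 - \text{specificity}},
\qquad
\mathrm{LR}^{+} \approx \frac{0.75}{1 - 0.63} \approx 2.0
\quad \text{(specificity 63\%)}
\]
```

With a specificity of 100%, the denominator is 0 and the positive LR is unbounded, so the same radiographic finding ranges from weakly informative to apparently conclusive depending on which study is believed.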

Variation in the biologic manifestations of bacterial pneumonia
also presents challenges in the interpretation of published
studies. For example, bacterial pneumonia is classically
associated with lobar consolidation on the radiograph. However,
studies report that bacterial pneumonia may be associated
with infiltrates that are lobar, perihilar, segmental,
interstitial, or nodular. 14-16 Consolidation can also be
observed with viral pneumonia, but it is unclear whether this
radiologic appearance is due to segmental consolidation,
atelectasis, or bacterial coinfection. Such variability in the
radiographic appearance of bacterial pneumonia may produce
over- or underestimates of the association of a positive chest
radiographic finding with bacterial pneumonia.

Clinicians should be aware that the chest radiographic
results may be negative in patients with early bacterial
pneumonia. 17 The sensitivity of the chest radiograph will be
reduced in this group. The implications of this observation are
important for studies of the clinical examination. For the
purposes of this systematic review, we included studies that used
the chest radiograph as the reference standard. Studies that
combined the clinical diagnosis with the chest radiographic
results as the reference standard were excluded because inclusion
of the diagnostic test in the reference standard may
overestimate the accuracy of clinical findings. The significance of
clinical findings of pneumonia in the absence of a positive
chest radiographic finding remains to be studied.

Normal Anatomy and Pathophysiology of Pneumonia 

Lower respiratory tract infections occur at or below the larynx
and include epiglottitis, laryngitis, laryngotracheobronchitis
(croup), bronchiolitis, and pneumonia (Figure 41-1).
Pneumonia typically follows an upper respiratory tract illness
in which the lower respiratory tract is invaded by bacteria,
viruses, or other pathogens that trigger the immune
response and produce inflammation. Histamines, leukotrienes,
and chemotactic factors are released that recruit
white blood cells to the area. This response fills the air spaces
of the lower respiratory tract with white blood cells, fluid,
and cellular debris. This process reduces lung compliance,
increases resistance, obstructs smaller airways, and possibly
results in collapse of distal air spaces.

The resultant physical findings vary with the site of infection,
ranging from coarse breath sounds or rhonchi in bronchopneumonia
to crackles in the alveoli in cases of pneumonia or
bronchiolitis. Crackles are the result of the explosive equalization
of gas pressure between the terminal bronchiole and the
alveoli. 18 Wheezes result from the oscillation of air through a
narrowed airway that produces a musical sound likened to a
vibrating reed. 19 Decreased breath sounds may also be observed
in areas of consolidation.

How to Elicit the Relevant Symptoms and Signs 

The physician’s first goal when taking the medical history and
performing the physical examination in a child who presents
with a cough is identification of the clinical syndrome and level
of involvement, as shown in Figure 41-1. The second goal is to
estimate the severity of the illness. The physician should ask the
parent about symptoms associated with pneumonia, as well as
those that may discriminate pneumonia from other lower
respiratory tract diseases. In addition to cough, symptoms that may
increase the likelihood of pneumonia include trouble breathing,
rattling in the chest, noisy breathing, trouble feeding, fever,
rapid breathing, anxiety, or restlessness. Clinicians working in
different regions or with different cultures need to familiarize
themselves with local terminology for lower respiratory tract
symptoms. It may also be useful to ask about previous episodes
of these chest symptoms because recurrent bouts of pneumonia
or bronchitis may suggest reactive airway disease. In early
infancy (<2 months), infants of mothers who had chlamydia
during pregnancy may develop afebrile pneumonia. Infants
only rarely produce sputum. In older infants, foreign body
ingestion and salicylate poisoning should be considered.
Although clinical experience suggests that the history of
pneumonia may be of acute or gradual onset and that bacterial
pneumonia tends to be associated with fever, we were unable to find
any studies substantiating these observations.

The physical examination should include an assessment of the
child’s general appearance, measurement of the respiratory rate,
evaluation of the work of breathing, and auscultation of the
chest. The child’s general appearance may provide important
clues about the presence of bacterial illness and its severity.
Infants can exhibit a wide range of behaviors and mood changes
during the parental interview, while being undressed, and during
the physical examination. Therefore, it is important to take a
nonthreatening approach with the young child. Infants should
be observed initially at a distance, while they are comfortable,
usually in the caretaker’s lap. The assessment of general
appearance should include an evaluation of a number of factors:
attentiveness to the environment, ability to breast-feed or drink,
ability to sustain sucking, vocalization, smiling, movement,
color, and consolability. If there is uncertainty about particular
findings, it may be helpful to try to elicit them, for example, by
encouraging the child to smile, having the mother offer the
breast, or showing the child a toy to engage his or her attention.



Figure 41-1 Clinical Syndromes of Acute Respiratory Infections

Acute upper respiratory tract infections include cold, otitis media, and
pharyngitis, which are all located above the dashed line in the figure. Acute lower
respiratory tract infections causing stridor include epiglottitis, laryngitis, and
laryngotracheitis. Anatomically, lower respiratory tract infections include
bronchitis, bronchiolitis, and pneumonia.


Respiratory rates change considerably in the first year of life, 
decreasing from a mean in awake babies of about 50/min at 1 
week of age to about 40/min at 6 months of age. 20 ' 23 The respi¬ 
ratory rate in children can also vary during brief intervals as 
the child’s level of interest in the environment changes or while 
the child is asleep or feeding. 24 Polygraphic studies of infants 
younger than 6 months have demonstrated that mean respira¬ 
tory rates were 4/min to 13/min higher in active sleep (rapid 
eye movement) than in quiet sleep. 23 - 25 Fever can also increase 
an infant’s respiratory rate by 10/min per degree centigrade in 
children without pneumonia. 26 However, the effects of fever in 
the presence of pneumonia have not been studied. 

The respiratory rate is best measured by observing chest wall 
movements during 1 minute. 27 ' 29 Listening to the chest with a 
stethoscope may stimulate the child and lead to a falsely ele¬ 
vated measurement. Measurement errors in counting the res¬ 
piratory rate are greater when children are agitated or crying 
compared with when they are calm, feeding, or sleeping. The 
examiner should count the respiratory rate before conducting 
other parts of the examination. Respiratory rate cutoffs that 
are commonly used to indicate an elevated rate are greater 
than 60/min in infants younger than 2 months, greater than 
50/min in infants 2 through 12 months of age, and greater 
than 40/min in children older than 12 months. 30 
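
The age-specific cutoffs above translate directly into a small lookup. The sketch below uses hypothetical function names and assumes age is given in months; it encodes only the commonly used cutoffs quoted in the preceding paragraph and makes no adjustment for fever, sleep state, or agitation.

```python
def elevated_rate_cutoff(age_months):
    """Return the respiratory rate (breaths/min) above which the rate is called elevated."""
    if age_months < 2:
        return 60
    if age_months <= 12:   # 2 through 12 months
        return 50
    return 40              # older than 12 months


def is_respiratory_rate_elevated(age_months, breaths_per_minute):
    """True when the measured rate exceeds the age-specific cutoff."""
    return breaths_per_minute > elevated_rate_cutoff(age_months)


if __name__ == "__main__":
    # An 8-month-old infant breathing 56 times per minute
    print(is_respiratory_rate_elevated(8, 56))
```

Because the cutoffs are stated as "greater than," a rate exactly equal to the cutoff is not flagged.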

Assessing an infant’s work of breathing is important to
estimate the severity of LRI. This assessment includes evaluation
of chest wall movements, nasal flaring, and grunting.
Chest wall movements include retractions or chest indrawing,
best observed with the chest fully exposed. Supraclavicular
retractions may be observed as indrawing of the soft
tissue above the clavicle or above the sternal notch. Intercostal
retractions are seen as indrawing of skin between the ribs.
Subcostal retractions occur on or just below the costal margin.
Many experts suggest that these types of retractions,
involving only the soft tissue, should be distinguished from
chest wall indrawing, defined as an inward movement of the
lower chest wall (ie, ribs) when the child breathes in. Chest
indrawing is more likely to be observed in infants younger
than 18 to 24 months, whose chest walls are more pliant. The
finding may be appreciated best by viewing the chest laterally
and looking for indrawing of the ribs or lower sternum with
inspiration, relative to a fixed point beyond the child’s chest
that is set as a mental reference point (Figure 41-2). Normally,
the costal margin moves little during quiet breathing.
If it does, it moves up and outward because the normal
diaphragm lifts the costal margin outward. In disease states, the
depressed diaphragm may apply an inward traction on the
chest, resulting in paradoxic movement of the chest wall during
inspiration. 31 Therefore, in airway obstruction, the costal
margin tends to move paradoxically (ie, down and inward).
Sometimes, the abdomen moves outward while the chest
moves inward during inspiration. This has also been called
Hoover sign 32 or paradoxic or seesaw breathing.

Nasal flaring is enlargement of both openings of the nose
during inspiration. It is due to contraction of the anterior and
posterior dilator naris muscles. Grunting is a repetitive short
upper respiratory tract sound produced by partial vocal cord
closure during expiration. 33 Grunting slows expiratory flow
and increases lung volume and alveolar pressures. 34 It can be a
sign of severe disease and suggests impending respiratory
failure. Examiners should be aware that the presence of signs of
increased work of breathing may change with the state of the
child. For example, chest wall indrawing may be present only
when the child is awake or more active.

Adventitious sounds that can be appreciated on auscultation
include discontinuous (popping) sounds and continuous
sounds, which occur throughout the inspiratory or expiratory
phase. 35 Discontinuous sounds have been called
crackles, rales, or crepitations. They typically occur at the end
of inspiration. Continuous sounds include wheezes and
rhonchi and can be musical, high or low pitched, inspiratory
or expiratory, short or long, or monophonic or polyphonic. 18
Clinicians should try to distinguish whether sounds are
continuous or discontinuous before applying a name. Many
clinicians differentiate continuous sounds that are whistling or
high pitched (usually called wheezes) from low-pitched,
snoring, or rattling sounds (usually called rhonchi). Many
experts consider wheezes to reflect small airway obstruction
(ie, bronchioles), whereas rhonchi reflect obstruction of the
large airways (ie, bronchi).

Figure 41-2 Chest Indrawing

Inward movement of the lower chest wall (ie, ribs) when the child breathes
in. Chest indrawing is best appreciated by viewing the chest laterally and
looking for indrawing of the ribs or lower sternum with inspiration.
Reproduced with permission from the World Health Organization. 5

The language used to describe auscultatory findings can be 
a source of confusion. For example, rhonchi and rales are, 
respectively, the Latin and French words for crackles. Indeed, 
Laennec (the inventor of the stethoscope) distinguished 6 
types of crackles. 36 He believed that only 1 of these was asso¬ 
ciated with pneumonia. 

Auscultation of the chest is often more difficult in infants 
when they are crying. For this reason, it should be performed 
after the visual inspection of the child. It is important to listen 
to the front, back, and sides of the infant’s chest because 
adventitious sounds may only be heard in one location. Even 
when the infant is crying, adventitious sounds may be heard at 
the end of inspiration when the infant is quiet and about to 
take a breath. Examiners should also be aware that wheezes 
can often be appreciated by listening to the sounds of breaths 
from infants’ mouths (audible wheezing). Finally, infants may 
have several types of adventitious sounds present (although 
this is more common in reactive airway disease or viral LRI). 
Textbooks do not recommend percussion of the chest in 
infants because it is difficult to get infants to cooperate with 
this maneuver. 

Are These Symptoms or Signs Ever Normal? 

Premature infants and neonates may appear to have chest 
indrawing during normal breathing or exertion. Grunting and 
groaning noises occur from time to time in normal healthy 
infants. An infant who is playful may demonstrate increased 
respiratory rate, intercostal retractions, and increased work of 
breathing. 

RESULTS 

The Precision of Symptoms and Signs 

A total of 56 patients with lower respiratory tract symptoms 
were examined by pairs of general pediatricians from a group 
that included academic pediatric generalists, pediatric 
residents, and pediatricians in community practice. 37 Agreement 
was good for most signs on physical examination that could 
be observed by inspection, including the social interaction 
markers of attentiveness (κ, 0.49), smiling (κ, 0.51), quality 
of cry (κ, 0.63), physical appearance and movement (κ, 
0.54), color (κ, 0.66), respiratory effort retractions (κ, 0.48), 
and use of accessory muscles (κ, 0.59). There was only fair 
agreement about most auscultatory findings: prolonged 
expiratory phase (κ, 0.22), adventitious sounds (κ, 0.3), and 
inspiratory wheezing (κ, 0.29). Agreement was good for 
audible wheezing (κ, 0.7) and for expiratory wheezing (κ, 0.63). In 
general, physicians agreed more often that a finding was 
present than that it was absent. A second study indicated 
that observers are less likely to agree about the severity of 
findings than about their presence or absence. 38 
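
The kappa statistics reported above measure agreement beyond what chance alone would produce. As a minimal sketch, assuming one binary sign rated by two examiners, Cohen's kappa can be computed from the four cells of their agreement table. The function name and the counts are hypothetical and are not taken from the cited study.

```python
def cohen_kappa(both_present, only_a, only_b, both_absent):
    """Cohen's kappa for two examiners rating one binary sign."""
    n = both_present + only_a + only_b + both_absent
    # Proportion of children on whom the two examiners agreed.
    observed = (both_present + both_absent) / n
    # Agreement expected by chance, from each examiner's marginal rates.
    a_present = (both_present + only_a) / n
    b_present = (both_present + only_b) / n
    expected = a_present * b_present + (1 - a_present) * (1 - b_present)
    return (observed - expected) / (1 - expected)


# Hypothetical counts for 56 children (illustration only).
kappa = cohen_kappa(both_present=20, only_a=5, only_b=4, both_absent=27)
print(round(kappa, 2))
```

With these invented counts the examiners agree on 47 of 56 children, yet kappa is noticeably lower than that raw agreement because a large share of it would be expected by chance.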

Several studies of the precision of the respiratory rate 
suggest that respiratory rates counted over 30 seconds aver¬ 
age 2/min to 4/min faster than respiratory rates counted 
during 60 seconds. 29 Counting the respiratory rate over 30 
seconds will lead to more abnormal rates and may spuri¬ 
ously increase the number of children diagnosed as having 
pneumonia. More accurate results are obtained if the aver¬ 
age of two 30-second counts is taken or one 60-second 
count is taken. 

Observer agreement is good for most signs on the physical 
examination. There is better agreement about signs that can 
be observed than signs that require auscultation of the chest. 

The Accuracy of Signs of Pneumonia 

The reported accuracy of clinical findings varies considerably 
among studies because of methodologic limitations and differ¬ 
ences in the spectrum of illness severity among sites in which 
the studies were conducted. In most reports, chest radiographs 
were used as the gold standard and children who had clinical 
findings suggestive of pneumonia were more likely to have had 
a radiographic examination than those who did not (Table 
41-2). Although this approach makes sense clinically, it intro¬ 
duces verification bias that tends to overestimate a test’s sensi¬ 
tivity and underestimate its specificity. 47 


Two studies, both of which were conducted in developing 
countries, attempted to overcome the problem of selective 
ordering of the gold standard by obtaining chest radiographs on 
all children with abnormal clinical findings (eg, elevated 
respiratory rate), as well as a sample of children without abnormal 
findings. 39-41 The reported accuracy was then adjusted 
statistically for the fraction of patients sampled in each group. These 2 
studies found that there was no single sign that could be used to 
rule in or rule out pneumonia definitively. In these studies, 
children with elevated respiratory rates were about twice as likely to 
have pneumonia (positive likelihood ratio [LR+], 1.5-2.1) as 
children without elevated respiratory rates (Table 41-3). 
Conversely, those without elevated respiratory rates were only 
about 0.36 to 0.5 times as likely to have pneumonia. These 
studies also found that the presence of chest indrawing 
(retractions) increased the likelihood of pneumonia (LR+, 
2.4-2.5). However, normal chest movements did not rule 
out pneumonia (negative likelihood ratio [LR-], 0.7-0.78). 
Other useful findings that increased the likelihood of 
pneumonia included nasal flaring (LR+, 3.0) and crepitations 
(LR+, 3.5). The absence of nasal flaring (LR-, 0.71) and 
crepitations (LR-, 0.69) did not effectively lower the 
likelihood of pneumonia. Other studies in developing countries, 
even though less methodologically sound, found the 
accuracy of clinical signs to be more or less in the same range as 
that found in the 2 better-designed investigations 
(Table 41-3). 43,48-50 
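
The likelihood ratios quoted above are derived from a finding's sensitivity and specificity. A minimal sketch, assuming a hypothetical 2 x 2 table of a finding against the chest radiograph (the counts are invented for illustration and do not come from any cited study):

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR+, LR-) from the four cells of a 2 x 2 table.

    tp: finding present, pneumonia present
    fp: finding present, pneumonia absent
    fn: finding absent, pneumonia present
    tn: finding absent, pneumonia absent
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Hypothetical sample: 100 children with pneumonia, 300 without.
lr_pos, lr_neg = likelihood_ratios(tp=70, fp=105, fn=30, tn=195)
print(round(lr_pos, 2), round(lr_neg, 2))
```

An LR+ well above 1 makes pneumonia more likely when the finding is present, and an LR- well below 1 makes it less likely when the finding is absent. Values near 1 change the probability very little.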

The lower prevalence of bacterial disease and severe pneu¬ 
monia found in developed countries 51,52 might suggest that 


Table 41-2 Characteristics of Studies Included in Systematic Review 

| Source, y | Country, Setting | Quality Levelᵃ | Inclusion and Exclusion Criteria | Age Range | Pneumonia Prevalence, % (No./Total) | Definition of Pneumonia |
|---|---|---|---|---|---|---|
| Redd et al, 39,40 1994 | Lesotho outpatient department | 1 | Children with cough, upper respiratory tract infection, trouble breathing, and ear pain | 3 mo to 5 y | 17 (65/382) | Parenchymal infiltrate on chest radiograph |
| Harari et al, 41 1991 | New Guinea outpatient department | 1 | Cough; exclusion: wheeze, stridor, measles, and pertussis | 8 wk to 6 y | 30 (56/185) | Radiographic pneumonia |
| Crain et al, 42 1991 | US emergency department | 1 | Infants with temperature > 38°C | 1 d to 2 mo | 12 (27/228) | Positive chest radiographic examination result |
| Lozano et al, 43 1994 | Colombia emergency department | 4 | Respiratory signs and symptoms, cough < 7 d | 1 wk to 3 y | 65 (130/200) | Radiographic pneumonia |
| Leventhal, 44 1982 | US emergency department | 4 | Children with fever or respiratory symptoms for whom chest radiograph was ordered; excluded major chronic disease, asthma, croup, and trauma | 3 mo to 15 y | 19 (26/136) | Abnormal chest radiographic examination result |
| Taylor et al, 45 1995 | US emergency department | 4 | Temperature > 38°C; excluded infants with chronic lung disease, bronchopulmonary dysplasia, wheezing, and stridor | 1 d to 2 y | 7.3 (42/572) | Positive chest radiographic examination result |
| Zukin et al, 46 1986 | US emergency department | 4 | Children with chest radiographic examination as part of emergency department evaluations | 1 d to 17 y | 14 (18/125) | Radiographic pneumonia |

ᵃSee Table 1-7 for a summary of Evidence Grades and Levels. 
Table 41-3 Operating Characteristics of Selected Clinical Findings 

| Source, y | Item | LR+ (95% CI)ᵃ | LR- (95% CI)ᵃ |
|---|---|---|---|
| Description of Breathing (Time or Explanation) | | | |
| Redd et al, 39,40 1994 | Respiratory rate > 50/min (3-11 mo) | 1.9 (...) | 0.36 (...) |
| Harari et al, 41 1991 | Respiratory rate > 50/min | 2.2 (...) | 0.52 (...) |
| Crain et al, 42 1991 | Respiratory rate > 60/min (<8 wk) | 8.0 (5.3-12) | 0.55 (0.4-0.8) |
| Lozano et al, 43 1994 | Respiratory rate > 50/min (0-11 mo) | 1.7 (1.2-2.3) | 0.52 (0.4-0.7) |
| Leventhal, 44 1982 | Tachypnea (clinician judgment of fast breathing) | 2.0 (1.5-2.7) | 0.32 (0.1-0.7) |
| Taylor et al, 45 1995 | Tachypnea (maximal sensitivity and specificity in different age strata) | 3.2 (2.5-4.1) | 0.34 (0.2-0.6) |
| Zukin et al, 46 1986 | Tachypnea (>standard deviation for age) | 1.6 (0.9-2.6) | 0.75 (0.5-1.2) |
| Work of Breathing | | | |
| Redd et al, 39,40 1994 | Chest indrawing | 2.4 (...) | 0.70 (...) |
| Harari et al, 41 1991 | Chest indrawing | 2.5 (...) | 0.78 (...) |
| Lozano et al, 43 1994 | Chest indrawing | 1.3 (1.0-1.5) | 0.53 (0.3-0.9) |
| Crain et al, 42 1991 | Chest indrawing | 26 (5.7-119) | 0.75 (0.6-0.9) |
| Redd et al, 39,40 1994 | Nasal flaring (3-11 mo) | 6.6 (...) | 0.71 (...) |
| Lozano et al, 43 1994 | Nasal flaring | 1.2 (0.9-1.6) | 0.83 (0.6-1.1) |
| Leventhal, 44 1982 | Nasal flaring | 1.9 (1.0-3.8) | 0.79 (0.6-1.1) |
| Lozano et al, 43 1994 | Grunting | 1.2 (0.8-1.8) | 0.89 (0.7-1.1) |
| Leventhal, 44 1982 | Grunting | 3.2 (1.1-9.2) | 0.86 (0.7-1.0) |
| Temperature | | | |
| Harari et al, 41 1991 | >38°C | 1.1 (...) | 0.95 (...) |
| Zukin et al, 46 1986 | Feverᵇ | 1.5 (1.3-1.7) | 0.17 (0.02-1.1) |
| Auscultation | | | |
| Leventhal, 44 1982 | Crepitations | 2.1 (1.2-3.8) | 0.73 (0.5-1.0) |
| Lozano et al, 43 1994 | Crepitations | 1.8 (1.4-2.3) | 0.36 (0.2-0.5) |
| Crain et al, 42 1991 | Crepitations | 15 (2.9-78) | 0.86 (0.7-1.0) |
| Zukin et al, 46 1986 | Crepitations | 2.9 (1.4-37) | 0.57 (0.3-0.97) |
| Lozano et al, 43 1994 | Wheezes | 0.63 (0.4-1.1) | 1.12 (1.0-1.3) |
| Crain et al, 42 1991 | Wheezes | 4.0 (0.4-37) | 0.97 (0.9-1.1) |
| Zukin et al, 46 1986 | Wheezes | 0.19 (0.03-1.3) | 1.30 (1.2-1.5) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio. 
ᵃEllipses indicate CIs could not be calculated because insufficient information was reported. 
ᵇFever defined as 2 SD for age. 


the accuracy of physical examination signs would be lower 
than that reported in studies from developing countries. 
However, the few studies performed in developed countries 
reported results similar to those cited above. 42,44,45 These stud¬ 
ies may have overestimated the accuracy of clinical findings 
because chest radiographs were more likely to be obtained in 
patients with signs and symptoms of disease. In a study by 
Leventhal, 44 the absence of tachypnea, as observed by the cli¬ 
nician examining the patient, was useful for ruling out pneu¬ 
monia (LR-, 0.32), whereas the presence of tachypnea 
somewhat increased the odds of pneumonia (LR+, 2.0). 
Grunting and crepitations were more useful in ruling in dis¬ 
ease (LR+, 3.2 and 2.1, respectively). Their absence did not 
appreciably decrease the likelihood of disease (LR-, 0.86 and 
0.73). The study by Taylor et al 45 reported a somewhat higher 
LR+ for tachypnea (LR+, 3.2), but this study included only 


febrile children, and chest radiographs were not obtained for 
all study patients. 

A study by Crain et al 42 included only febrile infants younger 
than 8 weeks who were treated in an emergency department. 
The authors reported that tachypnea (LR+, 8.0; 95% CI, 5.3-12) 
and chest indrawing (LR+, 26; 95% CI, 2.7-119) substantially 
increased the likelihood of pneumonia. Although these 
likelihood ratios are high, the number of patients with 
pneumonia in this study was small and the reported estimates 
are imprecise (as indicated by the wide 95% CIs). In addition, 
the high likelihood ratios also reflect the high specificity of 
tachypnea and indrawing in a particular group of patients 
(young infants). The value of the clinical examination may 
differ in this group of children. As in other studies, the 
absence of these findings did not dramatically decrease the 
likelihood of disease for tachypnea (LR-, 0.55) or for 
indrawing (LR-, 0.75). 

Accuracy of Combinations of Findings 

Clinicians typically evaluate the presence of many findings 
simultaneously to rule in or rule out pneumonia. Despite the 
large number of studies, few have examined the value of 
clinical findings when they are used together. Two studies assessed 
the value of combinations of clinical findings. Leventhal 44 
found that the absence of pulmonary findings, defined as 
respiratory distress (nasal flaring, grunting, retractions), tachypnea, 
rales, or decreased breath sounds, ruled out pneumonia (LR-, 
0.0; 95% CI, 0.0-0.4). When present, these findings raised the 
likelihood of pneumonia (LR+, 1.6; 95% CI, 1.3-31). In this study, 
information about the presence or absence of respiratory 
symptoms was used in the decision to obtain the gold standard 
examination (a chest radiographic examination). Thus, the 
reported data are likely to overestimate the diagnostic accuracy 
of these combinations of findings so that the true LR- is not as 
good as reported and the LR+ is better than reported. 

In a study of children younger than 2 months, Crain et al 42 
found that the absence of any respiratory findings (rhinorrhea, 
cough, adventitious sounds, or retractions) substantially 
decreased the likelihood of a positive chest radiographic finding 
(LR-, 0.10; 95% CI, 0.03-0.4). The presence of any of these 
findings increased the likelihood of pneumonia (LR+, 3.4; 95% 
CI, 2.6-4.3). Because this study included only infants younger 
than 8 weeks, it is not clear how well the results apply to older 
age groups. Crain et al 42 also found that as the number of 
positive respiratory findings increased, so did the probability of an 
abnormal chest radiographic finding. 

To summarize, physical examination findings can help pri¬ 
mary care physicians be more certain that an infant does or 
does not have pneumonia. In developed countries, where the 
prevalence of bacterial pneumonia is lower, pneumonia is 
unlikely if all signs are negative. The presence of a positive 
sign will be more useful in increasing clinicians’ certainty 
that an infant has pneumonia in developing countries com¬ 
pared with developed countries because the prevalence of 
bacterial pneumonia is higher. In developed countries, clini¬ 
cians will be more certain if multiple findings are positive. 
Further studies are needed to examine the diagnostic accu¬ 
racy of the chest radiographic examination, the value of cer¬ 
tain signs (such as fever and toxic appearance), and how to 
best take advantage of combinations of clinical findings. 

THE BOTTOM LINE 

First, the initial observation of the infant may be the most 
critical component of the examination. Observation is 
important before interacting with a child. 

Second, because of its moment-to-moment variability, the 
respiratory rate should be counted by observing the chest 
while the child is quiet during two 30-second intervals or 
during a full minute. Clinicians need to be especially aware of 
the variability of the examination as the child’s level of activ¬ 
ity changes. 


Third, auscultation is relatively unreliable for examination 
of infants. Clinicians need better training and better termi¬ 
nology to describe abnormal chest sounds. The overall clini¬ 
cal appearance may be accurate but the delineation of its 
value needs more study. 

Fourth, the best individual finding for ruling out pneu¬ 
monia is the absence of tachypnea. Chest indrawing and 
other signs of increased work of breathing (eg, nasal flar¬ 
ing) and abnormal auscultatory findings are better for rul¬ 
ing in pneumonia. In developed countries, multiple findings 
must be present for more certainty about the presence of 
pneumonia. 

Fifth, if all clinical signs (respiratory rate, auscultation, and 
work of breathing) are negative, the chest radiographic find¬ 
ing is unlikely to be positive. 
CLINICAL SCENARIO 


A 15-month-old child is brought to your office in May. 
She had been “breathing heavy” the previous day. She was 
well until about 2 days ago, when she developed nasal con¬ 
gestion with clear rhinorrhea, cough, and a low-grade 
fever. Your review shows this child had a normal birth his¬ 
tory, demonstrated normal growth and development, and 
has not had any significant respiratory infections or reac¬ 
tive airway disease. On examination, you find a tempera¬ 
ture of 38.2°C and a respiratory rate of 45/min. She has 
clear rhinorrhea and mild subcostal retractions but no 
abnormal lung sounds on auscultation. 

UPDATED SUMMARY ON PEDIATRIC PNEUMONIA 

Original Review 

Margolis P, Gadomski A. Does this infant have pneumonia? 
JAMA. 1998;279(4):308-313. 

UPDATED LITERATURE SEARCH 

A MEDLINE search was conducted from 1996 to 2005 to 
identify English-language articles about pneumonia in 
infants or children, using the search strategy techniques of 
The Rational Clinical Examination series. The search yielded 
49 articles. Additionally, Science Citation Index was used to 
identify articles that cited the original publication in The 
Rational Clinical Examination series, yielding 18 additional 
articles. The abstracts of these 67 articles were reviewed and 
all case-control, cohort, or randomized trials that addressed 
clinical signs and symptoms of pneumonia were selected for 
further inspection. The references for these articles were also 
reviewed to find any other relevant articles. The focus of the 
original publication was on identifying symptoms and signs 
that help distinguish pneumonia from other types of pediat¬ 
ric lower respiratory tract illnesses. In this update, we shifted 
the focus slightly and attempted to discover the findings that 
help identify the pediatric patient who will have an abnormal 
chest radiograph result. In total, 5 articles were selected for 
inclusion, although we subsequently excluded one article that 


had confusing likelihood ratio (LR) results, and we were 
unable to contact the author for verification. 1 


NEW FINDINGS 

• Diminished breath sounds show substantial interrater 
reliability (κ, 0.73). 1 

• Pulse oximetry with values less than 98% has a sensitivity of 
only 55% for pneumonia and has no independent utility after 
consideration of the auscultatory findings and respiratory rate. 

• The LR for pneumonia is 3.4 when the respiratory illness 
has lasted 6 days or longer. 

Details of the Update 

Since the publication of the original review, 4 additional studies 
evaluated different clinical findings for predicting radiographic 
changes suggestive of pneumonia in pediatric patients. Overall, 
there remains a paucity of data that examine combinations of 
clinical signs. Additionally, there remains difficulty in combining 
data from multiple studies because of differences in the defini¬ 
tions of certain clinical findings such as tachypnea and respira¬ 
tory distress. Finally, the broad age range of patients included in 
the studies makes generalization of findings to infants more dif¬ 
ficult. For example, grunting and nasal flaring would not be typ¬ 
ical findings in older pediatric patients with pneumonia. The 
studies included in this update used age-based criteria for the 
finding of tachypnea. 

A prospective study of children presenting to an emergency 
department with any type of acute respiratory illness provides 
useful information that allows comparison between signs and 
the overall clinical judgment. 2 As a single finding, tachypnea had 
the best diagnostic odds ratio (DOR, 5.8), which came from its 
positive LR of 2.2 (95% confidence interval [CI], 1.5-3.2) and 
negative LR of 0.39 (95% CI, 0.22-0.70). The additional 
information from chest indrawing and alveolar rales did not clinically 
improve the diagnostic odds or LRs. Clinical judgment that 
factors in all items from the medical history and physical 
examination (DOR, 3.6; 95% CI, 1.5-8.7) had results that were slightly 
less efficient than the single finding of tachypnea. Clinicians 
should recall the age-based World Health Organization (WHO) 
definitions of tachypnea for infants (Table 41-4). 
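
The diagnostic odds ratio is the ratio of the positive to the negative likelihood ratio. As a rough check, assuming only the rounded LRs quoted for this study are available, the calculation can be sketched as follows. The function name is illustrative. Recomputing from rounded LRs gives about 5.6 for tachypnea rather than the published 5.8 and about 3.7 for clinical judgment rather than 3.6, differences that are presumably due to rounding of the inputs.

```python
def diagnostic_odds_ratio(lr_pos, lr_neg):
    """DOR = LR+ / LR-; larger values mean better discrimination."""
    return lr_pos / lr_neg


# Rounded LRs as quoted in the text and in Table 41-5.
dor_tachypnea = diagnostic_odds_ratio(2.2, 0.39)
dor_judgment = diagnostic_odds_ratio(1.7, 0.46)
print(round(dor_tachypnea, 1), round(dor_judgment, 1))
```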
Table 41-4 World Health Organization Age-based Criteria for Tachypnea 

| Age, mo | Tachypnea, Breaths/min |
|---|---|
| <2 | >60 |
| 2-12 | >50 |
| >12 | >40 |

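The thresholds in Table 41-4 amount to a simple age-based lookup. The sketch below is one way to encode them, assuming age is given in months and the respiratory rate was counted over a full minute while the child was quiet. The function names are hypothetical.

```python
def who_tachypnea_threshold(age_months):
    """Breaths/min above which the WHO criteria call the rate tachypnea."""
    if age_months < 2:
        return 60
    if age_months <= 12:
        return 50
    return 40


def is_tachypneic(age_months, breaths_per_min):
    """True when the rate exceeds the age-based threshold."""
    return breaths_per_min > who_tachypnea_threshold(age_months)


# The child in the clinical scenario: 15 months old, 45 breaths/min.
print(is_tachypneic(15, 45))
```

The comparison is strict, matching the ">" signs in the table, so a rate exactly at the threshold is not counted as tachypnea.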

A case-control study using retrospectively collected data 
suggested that pulse oximetry at a threshold of 98% has no 
value for diagnosing pneumonia. 3 Although clinical 
examination data reported from case-control studies typically 
provide a low level of evidence, the findings here supported the 
usefulness of tachypnea. Pulse oximetry added no significant 
information to a model containing the respiratory rate and 
auscultatory findings. Unfortunately, the model itself was not 
particularly powerful for predicting pneumonia. (R² = 0.072 is a 
measure of how well the model predicts the outcome. The value 
means that the model explains only 7.2% of the variance, a 
statistically significant result, although one that will lead to 
incorrect diagnoses for many patients.) 

The patient selection criteria affect the interpretation of 
the results. A study that included wheezing children (younger 
than 18 months) first determined the factors associated with 
the clinician ordering a chest radiograph. 4 The presence of 
any typical clinical sign for pneumonia was associated with a 
request for chest radiograph. When confined to wheezing 
young children, the presence of grunting worked better than 
tachypnea with a respiratory rate of greater than 60/min (a 
rapid rate in comparison with the WHO standards noted 
above). The presence of grunting had an LR of 2.7 for 
pneumonia (95% CI, 1.6-4.4). When grunting was accompanied by a 
low oxygen saturation of less than 93% (much lower than the 
threshold in the case-control study), the combination in a 
wheezing young child had an LR of 4.0 (95% CI, 1.3-12). 
Unfortunately, the absence of both these findings had little 
effect on ruling out pneumonia (LR, 0.90 when both signs were 
normal; 95% CI, 0.81-1.0). 

A prospective study provides some insight into the probability 
of pneumonia once the physician requests chest radiography 
(prevalence, 36%). 5 The prevalence among all children with 
respiratory symptoms would be lower. Tachypnea, using the WHO 
criteria for respiratory rate, was similar in utility to that of 
previous studies (positive likelihood ratio [LR+], 2.8; 95% CI, 
1.6-5.0; negative likelihood ratio [LR-], 0.91; 95% CI, 0.86-0.97). 

Another prospective cohort study 6 found that tachypnea 
had an LR+ of 1.4 (95% CI, 1.0-1.9) and an LR- of 0.67 (95% 
CI, 0.44-1.0). However, the age-adjusted definitions for 
tachypnea required a much higher respiratory rate than the WHO 
criteria. These poor results for tachypnea allow the inference 
that the WHO criteria are necessary to optimize the clinical 
utility of the finding. 


IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

New data allowed for comparison with the data from the 
original review. For the most part, the new data confirmed 
the findings of the original review. Among infants and young 
children with respiratory symptoms or signs, a broad range 
of prevalence should be considered (15%-35%) that may 
show seasonal and geographic variation. The data confirm 
that although tachypnea may be the most predictive in ruling 
in or ruling out pneumonia, no clinical examination finding 
alone is sufficiently powerful to predict the presence or 
absence of pneumonia. 

CHANGES IN THE REFERENCE STANDARD 

The reference standard for the diagnosis of pediatric pneu¬ 
monia remains the chest radiograph. 

RESULTS OF LITERATURE REVIEW 

Selected Univariate Findings 
for Pediatric Pneumonia 

Clinical judgment, a measure that allows the clinician to 
consider all findings, may not work better than individual 
findings (Table 41-5). When the clinician suspects pneumonia, 
the LR is 1.7 to 2.5; when the clinician suspects the child has 
no pneumonia, the LR is 0.29 to 0.46. 2,5 


Table 41-5 Likelihood Ratios of Univariate Findings for Pediatric Pneumonia 

| Source | Finding | LR+ (95% CI) | LR- (95% CI) |
|---|---|---|---|
| Lynch et al 5 | Tachypnea (WHO criteria) | 2.8 (1.6-5.0) | 0.91 (0.86-0.97) |
| Palafox et al 2 | Tachypnea (WHO criteria) | 2.2 (1.5-3.2) | 0.39 (0.22-0.70) |
| Mahabee-Gittens et al 4 | Grunting and pulse oximetry < 93% | 4.0 (1.3-12) | 0.90 (0.81-1.0) |
| Mahabee-Gittens et al 4 | Grunting among children wheezing, < 18 mo | 2.7 (1.6-4.4) | 0.7 (0.55-0.89) |
| Lynch et al 5 | Retractions | 2.7 (1.1-6.9) | 0.97 (0.93-1.0) |
| Palafox et al 2 | Chest indrawing (retractions) | 1.7 (1.2-2.4) | 0.54 (0.32-0.91) |
| Palafox et al 2 | Clinical judgment | 1.7 (1.2-2.3) | 0.46 (0.25-0.84) |
| Lynch et al 5 | Fever | 1.2 (1.1-1.3) | 0.30 (0.18-0.49) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; WHO, World Health Organization. 
Multivariate Findings for Pediatric Pneumonia 

In a study by Lynch et al, 5 a multivariate model was assessed 
for diagnosing pediatric pneumonia. The study evaluated 
combinations of findings and created a pneumonia score that 
also supported a role for assessing tachypnea: 

$$\text{Pneumonia score} = -4.71 + 1.10 \times (\text{tachypnea}) + 0.74 \times (\text{crackle}) + 0.42 \times (\text{decreased breath sound}) + 1.15 \times (\text{measured fever})$$

$$\text{Probability of pneumonia} = \frac{e^{\text{score}}}{1 + e^{\text{score}}}$$

(The presence of a finding is coded as 1, whereas the 
absence of a finding is coded as 0. The presence of 
tachypnea is based on age-adjusted rates.) 

The most useful finding from this model is that the 
absence of all 4 findings leads to a less than 1% probability of 
pneumonia. The presence of all 4 findings creates a probabil¬ 
ity of 21%, which suggests the need for a chest radiograph 
but does not establish a clinical diagnosis with a high degree 
of confidence. The area under the receiver operating charac¬ 
teristic curve was only 0.67 (a measure of accuracy), high¬ 
lighting the finding that even combinations of signs lack a 
high level of efficiency for diagnosing pneumonia. 
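
The probabilities quoted for this model can be reproduced directly from the published coefficients. The sketch below is illustrative only: the dictionary keys and function name are hypothetical, and each finding is coded 1 when present and 0 when absent, as the text specifies.

```python
import math

INTERCEPT = -4.71
# Coefficients as given in the pneumonia score above.
WEIGHTS = {
    "tachypnea": 1.10,
    "crackle": 0.74,
    "decreased_breath_sound": 0.42,
    "measured_fever": 1.15,
}


def pneumonia_probability(findings_present):
    """Probability of pneumonia given the set of findings that are present."""
    score = INTERCEPT + sum(WEIGHTS[name] for name in findings_present)
    return math.exp(score) / (1 + math.exp(score))


none_present = pneumonia_probability(set())
all_present = pneumonia_probability(set(WEIGHTS))
print(f"{none_present:.1%} {all_present:.1%}")
```

With no findings the probability is just under 1%, and with all 4 findings it is about 21%, in line with the figures quoted in the text.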

The model may be best for identifying signs that physi¬ 
cians might consider as part of their clinical judgment. How¬ 
ever, clinicians should recognize that their overall clinical 
judgment and the results from a more structured approach 
in the form of a logistic model lack accuracy. 

EVIDENCE FROM GUIDELINES 

Jadavji et al 7 published guidelines in 1997 for the diagnosis 
and management of pediatric pneumonia. They conducted a 
systematic review on the etiology, diagnosis, and manage¬ 
ment of pediatric pneumonia. The evidence from this review 
includes the studies that were reviewed in the original Rational 
Clinical Examination article, with 2 exceptions. One 
study focused on infants younger than 4 months and is 
therefore not as easily generalized to the overall pediatric 
population. 8 Overall, the data from this guideline are consistent with 
the findings of the original Rational Clinical Examination 
article. 


CLINICAL SCENARIO—RESOLUTION 


This infant may have pneumonia. According to WHO crite¬ 
ria, she has tachypnea, although she is febrile, which could 
explain her mildly increased respiratory rate. Tachypnea 
should raise your suspicion for pneumonia, with its best LR+ 
of about 2.8. Although she has mild retractions that would 
seem to further increase the likelihood of pneumonia, the 
additional information provided by this sign is less accurate 
than the information from tachypnea alone. The clinical his¬ 
tory and time of year would make you less suspicious of 
other entities such as asthma or infection from respiratory 
syncytial virus (RSV). From the original Rational Clinical 
Examination article and this Update, you estimate a prevalence 
at the lower end of the 15% to 35% range. With a pretest 
probability of 15%, the posttest probability of pneumonia 
with an LR of 2.8 for tachypnea is 33%. From the multivariate 
model, she has tachypnea and fever, making the probability of 
pneumonia 7.9%. In 
this infant, it would be reasonable to check a chest radio¬ 
graph to confirm or exclude pneumonia. 
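
The 33% posttest probability follows from converting the pretest probability to odds, multiplying by the likelihood ratio, and converting back. A minimal sketch, assuming the 15% to 35% prevalence range and the LR+ of 2.8 used in the text (the function name is illustrative):

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Lower and upper ends of the prevalence range, LR+ of 2.8 for tachypnea.
low = posttest_probability(0.15, 2.8)
high = posttest_probability(0.35, 2.8)
print(f"{low:.0%} {high:.0%}")
```

At the lower end of the prevalence range the result is about 33%. At the upper end it would be about 60%, which shows how strongly the conclusion depends on the assumed prior probability.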
PNEUMONIA, INFANT AND CHILD— MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

Given cough or respiratory symptoms, the prevalence of 
pneumonia is approximately 15% to 35%. However, prev¬ 
alence of pneumonia may be lower during RSV season. 
Prevalence may also be slightly higher in children younger 
than 3 years. 

POPULATION FOR WHOM PEDIATRIC PNEUMONIA 
SHOULD BE CONSIDERED 

Patients with symptoms of acute respiratory illness, pri¬ 
marily cough, respiratory distress, or tachypnea, need to 
have pneumonia considered as part of the differential 
diagnosis. 

DETECTING THE LIKELIHOOD OF 
PEDIATRIC PNEUMONIA 

The individual clinical symptoms used to identify patients 
with pneumonia have relatively poor predictive value. 
Tachypnea, respiratory distress, and abnormal lung 
sounds (rales) have the best operating characteristics, 
although the data from different sources conflict on their 
significance (Table 41-6). Additionally, the clinician’s 
overall clinical judgment/impression may have operating 
characteristics similar to individual signs and symptoms 
in diagnosing pneumonia, but the overall judgment is 
admittedly a complex and difficult “finding” to quantify. 
To date, there are no randomized controlled studies to 
validate any proposed multivariate model for predicting 
pneumonia. 


Table 41-6 Likelihood Ratios of Symptoms and Signs for Pediatric Pneumonia 

| Symptom or Sign | LR+ (95% CI) or Range | LR- (95% CI) or Range |
|---|---|---|
| Grunting among children with wheezing, < 18 mo | 2.8 (1.6-4.4) | 0.7 (0.55-0.89) |
| Grunting | 2.8-3.2 | 0.70-0.86 |
| Retractions | 2.7 (1.1-6.9) | 0.97 (0.93-1.0) |
| Rales | 1.8-15 | 0.69-0.86 |
| Tachypnea (use WHO age-adjusted criteria) | 1.6-8.0 | 0.32-0.91 |
| Fever | 1.2-1.5 | 0.17-0.30 |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio; WHO, World Health Organization. 

REFERENCE STANDARD TESTS 

The reference standard for pediatric pneumonia remains the 
chest radiograph. Sputum production is not a frequent finding 
in pediatric patients, and therefore, isolation of sputum for 
microbiologic correlation with pneumonia remains both diffi¬ 
cult and impractical. The development of rapid antigen detec¬ 
tion of common viruses such as RSV and influenza will help 
the clinician rule out causes of respiratory symptoms other 
than bacterial pneumonia. As of now, there is still no way to 
differentiate bacterial vs viral pneumonia by chest radiograph. 






















EVIDENCE TO SUPPORT THE UPDATE: 

Pneumonia, Infant and Child 



TITLE Clinical, Laboratory, and Radiological Informa¬ 
tion in the Diagnosis of Pneumonia in Children. 

AUTHORS Grossman L, Caplan S. 

CITATION Ann Emerg Med. 1988;17(1):43-46. 

QUESTION In pediatric patients with suspected pneu¬ 
monia who undergo chest radiograph, are there signs or 
symptoms that predict radiographic pneumonia? 

DESIGN This is a prospective nonconsecutive cohort 
study of 155 patients during 7 months. 

SETTING Two pediatric emergency departments. 

PATIENTS Pediatric patients younger than 19 years in 
whom pneumonia was considered and a chest radiograph 
was ordered. None of the patients had a history of 
pneumonia, chronic lung disease, chronic heart disease, or 
immunodeficiency. Sixty-two percent of the study patients 
were younger than 2 years. Eleven potential subjects were 
not enrolled because a decision to treat was made without 
a chest radiograph being performed. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The presenting signs and symptoms of patients before chest 
radiography were systematically recorded. Before the chest 
radiograph was performed, clinicians (pediatricians, pediatric 
nurse practitioners, and medical students) recorded their 
overall clinical impression of pneumonia and what their 
treatment plan would be if radiography were not available. 
Chest radiograph was the reference standard for the diagnosis 
of pneumonia. 

MAIN OUTCOME MEASURES 

Sensitivity and specificity of recorded signs, symptoms, and 
clinical impression for predicting radiographic pneumonia. 
Assessing accuracy of clinical impression in predicting 
pneumonia. Assessing combinations of signs and symptoms in 
predicting pneumonia. 


MAIN RESULTS 

Cough, tachypnea, moderate/severe degree of illness, and 
fever were the only symptoms and signs that were present in 
more than 50% of patients enrolled in the study (66%, 52%, 
62%, and 55%, respectively). 

Clinician accuracy in the diagnosis of pneumonia was 77%, 
and both the positive and negative likelihood ratios (LRs) were 
more promising than the individual findings (Table 41-7). 

Despite the results for clinical judgment, regression 
analysis did not find any combination of signs or symptoms 
that adequately predicted the presence of pneumonia. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 4. 

STRENGTHS This was a prospective cohort study. All 
patients enrolled underwent the reference standard of chest 
radiograph. There was quantification of the different signs 
and symptoms that led clinicians to order chest radiographs. 

LIMITATIONS The definition of tachypnea used in this study 
is different from the World Health Organization (WHO) 
criteria. If WHO criteria had been used, there might have been a 


Table 41-7 Likelihood Ratios of Findings for Pediatric Pneumonia

Findings                    Sensitivity, %   Specificity, %   LR+ (95% CI)     LR- (95% CI)

Clinical judgment           80               68               2.5 (1.8-3.4)    0.29 (0.17-0.52)
Rales                       43               77               1.9 (1.2-3.0)    0.74 (0.57-0.96)
Tachypnea a                 64               54               1.4 (1.0-1.9)    0.67 (0.44-1.0)
Decreased breath sounds     23               84               1.4 (0.74-2.8)   0.92 (0.77-1.1)
Degree of illness b         67               40               1.1 (0.87-1.4)   0.83 (0.52-1.3)
Sudden onset of illness c   17               84               1.1 (0.50-2.2)   0.99 (0.85-1.1)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a >80/min < 1 year; >40/min > 1 year; >30/min > 2 years; >25/min > 5 years;
>22/min > 10 years; >20/min > 15 years. 1 (This differs from the World Health
Organization definition for tachypnea.)

b Degree of illness not further clarified in article.

c Less than 12 hours of symptoms before presentation.
TITLE Can We Predict Which Children With Clinically 
Suspected Pneumonia Will Have the Presence of Focal 
Infiltrates on Chest Radiographs? 

AUTHORS Lynch T, Platt R, Gouin M, Larson C, 
Patenaude Y. 

CITATION Pediatrics. 2004;113(3 pt 1):186-189. 

QUESTION In patients presenting to the emergency 
department with “clinically suspected pneumonia,” what 
clinical factors predict an infiltrate on radiograph? 

DESIGN Prospective nonconsecutive cohort study of 
570 patients. 

SETTING Tertiary-referral-center pediatric emergency 
department. 

PATIENTS Children aged 1 to 16 years who were 
suspected by the pediatric emergency physician to have 
pneumonia and who were receiving a chest radiograph. 
Children with chronic respiratory disease, congenital or 
complex heart disease, gastroesophageal reflux, sickle cell 
anemia, malignancy, spastic quadriplegia, acute asthma 
exacerbation, or recent pneumonia treatment with 
antibiotics were excluded. 


much higher number of patients who would have been 
classified as having tachypnea, which might have increased the 
positive likelihood ratio (LR+) of tachypnea. 

There was no explanation of what “degree of illness” 
means, and therefore, it has limited clinical utility. 

There was no mention of blinding of radiologists to the 
clinical presentation of study patients. There was no 
description of what qualified a radiograph as being diagnostic 
of pneumonia. 

The results provided did not include 95% confidence 
intervals. Additionally, it is unclear how many “observations” 
were made for each patient because there were multiple 
examiners for each patient. 

CONCLUSIONS 

The focus of this study was to determine whether there were 
any signs or symptoms that were helpful in diagnosing 
pneumonia in children younger than 18 years and presenting with 
symptoms sufficient to warrant a chest radiograph. 
Additionally, it sought to assess how the results of the radiograph 
influenced management decisions by the ordering clinician. 
Finally, it attempted to assess the clinician’s overall 
impression as a predictor of pneumonia. It is useful to know that 
cough, tachypnea, “moderate/severe degree of illness,” and 
fever are the most common symptoms and signs for which a 
radiograph is ordered. This may give a hint as to what goes 
into the clinician’s overall clinical impression when he or she 
considers the diagnosis of pneumonia. In this study, the 
overall clinical impression performed better (LR+, 2.5) than 
any individual sign or symptom in diagnosing pneumonia. 
Physician accuracy of diagnosing pneumonia was 77%. 
Obtaining radiographs is useful because they changed 
management plans for 22% of study patients. In this study, only 
rales and tachypnea reached statistical significance in 
predicting pneumonia. However, both of these signs had only 
marginal diagnostic power, with LR+s of only 1.9 and 1.4, 
respectively. 

Reviewed by Daniel Ostrovsky, MD 

REFERENCE FOR THE EVIDENCE 

1. Johnson TR. Development of the lungs. In: Johnson TR, Moore WM, 
Jeffries JE, eds. Children Are Different. 2nd ed. Columbus, OH: Ross 
Laboratories; 1978:128-129. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Baseline demographic data and a study questionnaire were 
obtained prospectively. Physicians filled out a clinical 
questionnaire about symptoms at evaluation. 

All subjects had a posterior-anterior and lateral chest 
radiograph evaluated by 3 radiologists for the presence or 
absence of infiltrates. 


MAIN OUTCOME MEASURES 

Six symptoms (fever history, cough, coryza, shortness of 
breath, wheezing, and pleurisy), 6 signs (decreased breath 
sounds, crackles, bronchial sounds, wheezing, retractions, 
and grunting), and 3 vital signs (measured temperature for 
fever, age-adjusted tachypnea, and tachycardia) were entered 
into a logistic model to determine their independent 
significance. 1 Previously recommended guidelines were assessed for 
their sensitivity and specificity. The interobserver agreement 
for the chest radiographs was assessed. 


MAIN RESULTS 

Five hundred seventy patients were enrolled, of whom 204 
(36%) had pneumonia. The agreement between 7 
radiologists for the presence of pneumonia was moderate (weighted 
κ, 0.57). 
Only 2 findings, when present, had a likelihood ratio (LR) 
that was greater than 2 and that excluded 1 in its 95% 
confidence interval (CI): the presence of tachypnea (8% of 
children) had an LR of 2.8 (95% CI, 1.6-5.0), whereas retractions 
(3% of children) had an LR of 2.7 (95% CI, 1.1-6.9). For 
decreasing the likelihood of pneumonia, the absence of a 
fever (LR, 0.30; 95% CI, 0.18-0.49) or cough (LR, 0.35; 95% 
CI, 0.54-0.81) was the only finding with an LR less than 0.6 
and that excluded 1.0 from the 95% CI. 

A logistic model (Box 41-1) identified 4 findings that were 
independently useful. However, the model was not highly 
accurate because of its poor specificity (area under the 
receiver operating characteristic curve of 0.67). 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Multiple radiologists who were blinded to the 
clinical presentation evaluated the reference standard. There 
was a large sample size. Multiple clinical predictors were 
assessed with regression analysis. 

LIMITATIONS The enrollment process may have caused 
selection bias, but only fever history and cough occurred in 
more than half the patients, which suggests that the 
remaining findings might have valid results because they were not 
used to preferentially identify children. 

When the likelihood of focal opacities is predicted in 
children clinically suspected of having pneumonia, only 4 signs 
and symptoms are independently statistically significant. The 
model is highly sensitive but poorly specific. To highlight 
this, the probability of pneumonia can be contrasted for the 
child with no tachypnea, crackles, decreased breath sounds, 
or fever (probability, 0.9%) vs a child with all 4 findings 
present (probability, 21%). Thus, a child with no findings has 
a less than 1% chance of having pneumonia. On the other 
hand, even with the presence of all 4 findings, most children 
will not have a radiographic infiltrate. 

Reviewed by Daniel Ostrovsky, MD 

REFERENCES FOR THE EVIDENCE 

1. Chamberlain JM, Patel KM, Ruttimann UE, Pollack MM. Pediatric risk 
of admission (PRISA): a measure of severity of illness for assessing the 
risk of hospitalization from the emergency department. Ann Emerg Med. 
1998;32(2): 161-169. 

2. Leventhal JM. Clinical predictors of pneumonia as a guide to ordering 
chest roentgenograms. Clin Pediatr. 1982;21 (12):730-734. 


Box 41-1 Logistic Model for Calculating the Pneumonia Score 

Pneumonia score = -4.71 + 1.10 × (tachypnea) + 0.74 × (crackle) 
+ 0.42 × (decreased breath sound) + 1.15 × (measured fever) 

The presence of a finding is coded as 1, whereas the 
absence of a finding is coded as 0. The presence of 
tachypnea is based on age-adjusted rates. 

Probability of pneumonia = exp(score)/(1 + exp(score)) 

The investigators attempted to validate the guidelines by 
Leventhal 2 (respiratory distress, tachypnea, rales, and 
decreased breath sounds) but found these yielded a 
sensitivity of only 81% and a specificity of 37%. 
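
The logistic model in Box 41-1 can be evaluated directly. The Python sketch below is an illustration only: the coefficients are those printed in the box, but the function and parameter names are assumptions introduced here and are not part of the original study. It reproduces the two extremes discussed in the text, a child with none of the 4 findings and a child with all 4.

```python
import math

# Coefficients as printed in Box 41-1. The dictionary keys are
# illustrative names, not identifiers from the original study.
INTERCEPT = -4.71
COEFFICIENTS = {
    "tachypnea": 1.10,                # based on age-adjusted rates
    "crackles": 0.74,
    "decreased_breath_sounds": 0.42,
    "measured_fever": 1.15,
}


def pneumonia_probability(**findings: bool) -> float:
    """Return the modeled probability of a radiographic infiltrate.

    Each finding is coded 1 when present and 0 when absent; findings
    that are not passed are treated as absent.
    """
    unknown = set(findings) - set(COEFFICIENTS)
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    score = INTERCEPT + sum(
        COEFFICIENTS[name] for name, present in findings.items() if present
    )
    return math.exp(score) / (1.0 + math.exp(score))


no_findings = pneumonia_probability()
all_findings = pneumonia_probability(
    tachypnea=True,
    crackles=True,
    decreased_breath_sounds=True,
    measured_fever=True,
)

print(f"No findings:    {no_findings:.1%}")   # 0.9%
print(f"All 4 findings: {all_findings:.1%}")  # 21.4%
```

The two results agree with the probabilities of 0.9% and 21% quoted for this model.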


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Patients selected for inclusion had baseline information 
collected prospectively. Physical examination findings were 
documented at evaluation. The reference standard was a chest 
radiograph for the presence of focal infiltrates. The reference 
standard was applied at the discretion of the evaluating 
physician. A radiologist who was not masked to the clinical 
presentation interpreted the radiographs. A report result was 
considered positive if it recorded “focal infiltrate,” 
“pneumonia,” “consolidation,” or “atelectasis vs infiltrate.” 


TITLE Clinical Factors Associated With Focal Infiltrates 
in Wheezing Infants and Toddlers. 

AUTHORS Mahabee-Gittens EM, Dowd MD, Beck JA, 
Smith SZ. 

CITATION Clin Pediatr. 2000;39(7):387-393. 

QUESTION In wheezing infants presenting to an 
emergency department, are there clinical factors that can 
predict focal infiltrates on chest radiograph? 

DESIGN Prospective cohort of infants up to 18 months 
of age. 

SETTING The study took place during October and 
April at the Children’s Hospital Medical Center in 
Cincinnati, Ohio, a tertiary-care hospital pediatric emergency 
department. 

PATIENTS Infants aged 18 months or younger and 
presenting to the emergency department. Inclusion was a 
convenience sample of patients with documented 
wheezing on physical examination by a physician. 
MAIN OUTCOME MEASURES 

The authors collected data on all potential eligible patients 
and compared the odds ratios for physical examination signs 
for individuals selected for chest radiographs vs those who 
did not undergo radiography. These odds ratios describe the 
factors associated with requesting a radiograph. 

Sensitivity, specificity, and odds ratios of the clinical 
findings for diagnosing pneumonia were calculated for children 
who underwent radiography. Interobserver variability was 
assessed in 12% of the children. 


MAIN RESULTS 

Among 471 children who made a visit to the emergency 
department with wheezing and were potentially eligible, 212 
had chest radiographs. Twenty-three percent (49/212) had a 
focal infiltrate. Except for localized wheezing, each sign in 
Table 41-8 was more likely present in a child receiving a chest 
radiograph than one who did not (odds ratio with lower 95% 
confidence interval > 1.0). 

In patients who did not undergo chest radiograph, 
follow-up telephone calls and searches of admission databases were 
made 48 hours after presentation to look for patients who 
may have incorrectly been classified as not having 
pneumonia. Only 3 patients who did not undergo chest radiography 


Table 41-8 Likelihood Ratios of Findings for Pediatric Pneumonia

Sign (No. With the Finding)                   LR+ (95% CI)      LR- (95% CI)       DOR (95% CI)

Vital Signs
Temperature > 100.4°F (38.0°C) (115)          1.12 (0.95-1.6)   0.76 (0.51-1.1)    1.6 (0.8-3.1)
Respiratory rate > 60/min (61)                1.1 (0.67-1.8)    0.97 (0.78-1.2)    1.1 (0.6-2.2)
Oxygen saturation < 93% (41)                  2.0 (1.1-3.4)     0.82 (0.67-1.1)    2.4 (1.1-5.0)

Physical Examination
Nasal flaring (82)                            1.3 (0.90-1.9)    0.70 (0.55-0.89)   1.6 (0.8-3.0)
Grunting (45)                                 2.7 (1.6-4.4)     0.90 (0.79-1.0)    3.8 (1.9-7.8)
Crackles (67)                                 1.6 (1.1-2.4)     0.76 (0.58-1.0)    2.1 (1.1-4.1)
Decreased breath sounds (19)                  2.4 (1.0-5.7)     0.90 (0.79-1.0)    2.7 (1.0-7.2)
Localized wheezing (21)                       1.0 (0.4-2.7)     1.0 (0.90-1.1)     1.0 (0.4-3.0)
Retractions (202)                             1.0 (0.94-1.1)    0.83 (0.18-3.8)    1.2 (0.2-5.9)
I:E > 1:2 (166)                               1.2 (1.0-1.3)     0.43 (0.18-1.0)    2.8 (1.0-7.5)

Combination of Findings
Grunting and oxygen saturation < 93% (48) a   4.0 (1.3-12)      0.90 (0.81-1.0)    4.4 (1.3-15)

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; I:E, length of time
in inspiration in proportion to time in expiration; LR+, positive likelihood ratio; LR-,
negative likelihood ratio.

a Variables selected as independently useful in a logistic model.


were hospitalized within 2 days of presentation. All 3 patients 
had a chest radiograph on re-presentation and none of them 
had an infiltrate. Seventeen other patients who did not 
initially undergo chest radiography had a subsequent chest 
radiograph in the following 48 hours. Three of these patients 
had an infiltrate. 

Oxygen saturation, nasal flaring, grunting, crackles, and 
retractions were all reliable and had κ > 0.70. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS The study design was prospective. A uniform 
entrance criterion (wheezing) identified potentially eligible 
patients with a narrower age range than was present in some 
of the other studies. The authors compared the signs present 
in children who underwent radiographs vs those who did 
not. 

LIMITATIONS The enrollment process created a 
convenience sample that leads to selection bias. Radiologists who 
were not masked to the clinical presentation interpreted the 
outcome measure. There was only 1 radiologist per case, 
which could lead to accuracy issues in interpretation. This 
study was done in a population during a period when 
respiratory syncytial virus bronchiolitis typically has a high 
prevalence. 

Children younger than 18 months who wheezed were the 
focus of this study. A number of clinical signs were more likely 
present in children who were referred for chest radiography. 
However, most of these signs were not particularly useful when 
either present or absent. As a single finding, the presence of 
grunting (present in 60 children overall and in 45 referred for 
radiography [21%]) was the most useful finding, with a 
likelihood ratio of 2.7. The absence of any of these findings was 
clinically not useful. When combined with low oxygen 
saturation, a logistic model selected grunting with low oxygen 
saturation as useful. The likelihood ratio increased to 4.0 for the 
presence of these 2 signs. 

Clinicians should recognize that the prior probability of 
pneumonia has seasonal variation in the pediatric 
population. Bronchiolitis, an illness that may “look like 
pneumonia,” is more common in the winter and is associated with 
tachypnea and abnormal lung findings. Thus, the complex 
relationship between changing prevalence of disease and 
seasonal variation in signs affects the interpretation of the 
predictive power of these findings. 

Reviewed by Daniel A. Ostrovsky, MD 
TITLE Diagnostic Value of Tachypnoea in Pneumonia 
Defined Radiologically. 

AUTHORS Palafox M, Guiscafre H, Reyes H, Munoz O, 
Martinez H. 

CITATION Arch Dis Child. 2000;82(1):41-45. 

QUESTION In children presenting with acute 
respiratory infection, what are the sensitivity and specificity of 
tachypnea for diagnosing pneumonia? 

DESIGN This study is a prospective cohort study of 
children presenting to an emergency department with an 
acute respiratory tract infection. All children had chest 
radiography. Baseline characteristics and physical 
examination findings were obtained prospectively. 

SETTING A general hospital in Mexico that is a referral 
center for sick children. 

PATIENTS Eligible children were between 3 days and 5 
years of age, required medical care during the 6-month 
study period, had been clinically diagnosed with 
pneumonia, and had the disease for fewer than 2 weeks. Each child 
in the study had a matched control who was the next child 
treated in the clinical unit and had a diagnosis of an acute 
respiratory infection without pneumonia. Exclusion 
criteria were children with chronic diseases, genetic 
abnormalities, neurologic diseases, asthma, or sepsis. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The respiratory rate was measured for 1 minute, with the 
child lying down, not crying, and without fever. Tachypnea 
was defined by age-based World Health Organization 
(WHO) criteria (Table 41-9). 

A chest radiograph, evaluated by a single radiologist blinded 
to the clinical diagnosis, served as the reference standard. 

MAIN OUTCOME MEASURES 

Sensitivity and specificity of tachypnea, chest indrawing, 
alveolar rales, and combinations of these findings. Ten radiographs 
were reassessed to determine the intraobserver variation. 


MAIN RESULTS 

Thirty-five children (32%) had pneumonia. There were 7 
signs and symptoms or combinations that had significant 
sensitivity and specificity for predicting pneumonia on chest 
radiograph. Tachypnea had the best sensitivity of the signs 
studied (74%), followed by chest indrawing (71%). Although 
combining signs did slightly improve specificity, it decreased 
sensitivity. Alveolar rales had the best specificity but had 
poor sensitivity (Table 41-10). 


Table 41-9 World Health Organization Age-based Criteria for Tachypnea

Age, mo     Tachypnea, Breaths/min

< 2         > 60
2-12        > 50
> 12        > 40
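
The age bands in Table 41-9 translate into a simple threshold lookup. The Python sketch below is illustrative: the function names are assumptions, and the cutoffs are only those printed in the table. The table gives no upper age limit for the ">12 months" band (the study enrolled children up to 5 years of age), so the sketch does not enforce one.

```python
def who_tachypnea_threshold(age_months: float) -> int:
    """Return the breaths/min value that must be exceeded (Table 41-9)."""
    if age_months < 0:
        raise ValueError("age_months must be non-negative")
    if age_months < 2:
        return 60
    if age_months <= 12:
        return 50
    return 40


def is_tachypneic(age_months: float, breaths_per_min: float) -> bool:
    """True when the respiratory rate exceeds the age-based threshold."""
    return breaths_per_min > who_tachypnea_threshold(age_months)


# Example: a 6-month-old breathing 54/min and a 3-year-old breathing 38/min.
print(is_tachypneic(6, 54))   # True
print(is_tachypneic(36, 38))  # False
```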


A discriminant analysis, using all the recorded symptoms 
and signs, was 71% accurate but not appreciably different 
from the accuracy of tachypnea alone (69% accurate). The 
discriminant analysis performed better than clinical 
judgment (62% accurate). 

If a patient had disease at least 6 days, the likelihood ratio 
was 3.4. A discriminant analysis revealed that duration of 
disease correctly classified 83.3% of patients. 

The κ statistic for intraobserver variability of the 
radiologist was 0.68. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 2. 

STRENGTHS Whereas other studies included children 
according to whether they had a chest radiograph, this study 
included a broader population of patients for whom the 
diagnosis of pneumonia was a reasonable consideration. The 
“control” patients were patients with some type of 
respiratory illness (cough or rhinorrhea with systemic signs of 
infection). These patients were not really “controls,” but 
rather patients at risk for pneumonia and in whom 
pneumonia could have been part of the differential diagnosis 
(although at a lower likelihood than for the case patients). All 
included study patients underwent the reference standard. 
The radiologists were masked to the clinical presentation. 
Intraobserver variability of the radiologist reading the 
radiographs was tested. 

LIMITATIONS Among all children with respiratory illnesses, 
the “case patients” were oversampled, which can lead to an 
overestimation of sensitivity (and underestimation of 
specificity). A single radiologist performed the interpretation of 
the radiographs, although there was an attempt to account 
for this by measuring intraobserver variability and masking 
the radiologist to the clinical presentation. 

Of the presenting clinical signs, all except chest indrawing 
(51%) occurred in less than half the patients, which allows us 
to make inferences about the utility of the findings because 
no one finding was required in each patient. It is remarkable 
that the overall clinical judgment had a diagnostic odds ratio 
(a measure of accuracy) that was not quite as good as the 
single finding of tachypnea. Tachypnea, defined by WHO 
criteria, was the most accurate finding as evidenced by its 
diagnostic odds ratio. 

In subgroup analysis using tachypnea as the clinical sign 
being evaluated, there was no significant difference in the 
sensitivity and specificity generated for children of differing 
Table 41-10 Likelihood Ratios of Findings for Pediatric Pneumonia

Test (No. With Finding)                    Sensitivity   Specificity   LR+ (95% CI)    LR- (95% CI)       DOR (95% CI)

Tachypnea, chest indrawing, and            0.43          0.84          2.7 (1.4-5.1)   0.68 (0.50-0.92)   4.0 (1.6-9.8)
  alveolar rales (27)
Tachypnea and alveolar rales (29)          0.46          0.83          2.6 (1.4-4.9)   0.70 (0.48-0.91)   4.1 (1.7-10)
Tachypnea (51)                             0.74          0.67          2.2 (1.5-3.2)   0.39 (0.22-0.70)   5.8 (2.4-14)
Tachypnea and chest indrawing (47)         0.68          0.69          2.1 (1.4-3.2)   0.50 (0.31-0.80)   4.7 (2.0-11)
Alveolar rales (32)                        0.46          0.79          2.1 (1.2-3.8)   0.69 (0.50-0.96)   3.2 (1.3-7.6)
Chest indrawing and alveolar rales (30)    0.42          0.80          2.1 (1.2-3.9)   0.71 (0.53-0.97)   2.9 (1.2-7.0)
Clinical judgment (59)                     0.74          0.56          1.7 (1.2-2.3)   0.46 (0.25-0.84)   3.6 (1.5-8.7)
Chest indrawing (56)                       0.71          0.59          1.7 (1.2-2.4)   0.54 (0.32-0.91)   3.5 (1.5-8.3)

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive
likelihood ratio; LR-, negative likelihood ratio.
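
The LR+, LR-, and DOR columns of Table 41-10 follow from sensitivity and specificity by the standard definitions: LR+ = sensitivity/(1 - specificity), LR- = (1 - sensitivity)/specificity, and DOR = LR+/LR-. The Python sketch below illustrates that arithmetic using the tachypnea row. The function name is an assumption introduced here, and confidence intervals are not computed because they require the underlying 2 × 2 counts, which the table does not give.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float, float]:
    """Return (LR+, LR-, DOR) from sensitivity and specificity."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must be strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    dor = lr_pos / lr_neg
    return lr_pos, lr_neg, dor


# Tachypnea row of Table 41-10: sensitivity 0.74, specificity 0.67.
lr_pos, lr_neg, dor = likelihood_ratios(0.74, 0.67)
print(f"LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}, DOR {dor:.1f}")  # LR+ 2.2, LR- 0.39, DOR 5.8
```

The same calculation applied to the row for chest indrawing and alveolar rales (0.42, 0.80) gives a DOR of about 2.9.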


age groups. There was a significant difference in the 
sensitivity and specificity of tachypnea when disease duration was 
considered. Sensitivity increased from 55% to 93% if disease 
was fewer than 3 days’ duration or more than 6 days’ 
duration, respectively. Specificity increased from 64% to 73% as 
well. 

Reviewed by Daniel A. Ostrovsky, MD 
CLINICAL SCENARIOS 


CHAPTER 


Is This Patient Pregnant? 

Lori A. Bastian, MD, MPH 
Joanne T. Piscitelli, MD 


Are These Patients Pregnant? 

For each of the following cases, the clinician may need to 
determine the probability that the patient is pregnant. 

CASE 1 A 36-year-old woman telephones her primary care 
physician, complaining of symptoms consistent with 
uncomplicated sinusitis. Before treating her with an antibiotic, you 
ask her about the possibility of pregnancy; she states her last 
menstrual period was 3 weeks ago and she is not pregnant. 

CASE 2 A sexually active 16-year-old girl requests birth 
control pills and asks during the pelvic examination, when 
her mother has stepped out of the room, if you can tell 
whether she is pregnant. Her last menstrual period was 8 
weeks ago, her home pregnancy test result was negative, 
and findings on her pelvic examination were normal. 

CASE 3 A 41-year-old woman presents with breast 
tenderness, and her last menstrual period was 6 weeks ago. She 
wants to know whether she is “going through the change.” 


WHY IS THIS AN IMPORTANT QUESTION TO 
ANSWER WITH A CLINICAL EXAMINATION? 


Laboratory analyses are frequently performed in the outpatient 
clinic and emergency department to rule in or to rule out the 
possibility of pregnancy. Generally accepted clinical 
indicators of pregnancy include amenorrhea, morning sickness, 
tender or tingling breasts, and, after 8 weeks’ gestational age 
(defined as weeks since the last menstrual period), an 
enlarged uterus with a soft cervix. Standard textbooks of 
obstetrics do not indicate the value (ie, sensitivity and 
specificity) of these symptoms and signs as predictors of the 
diagnosis of early pregnancy. 

In the outpatient clinical setting, there are many reasons to 
determine whether the patient is pregnant, including avoiding 
nonurgent radiographs; avoiding teratogenic drugs, such as 
anticonvulsants; initiating early prenatal care; reassuring the 
patient; and explaining the multiple nonspecific complaints 
easily confused with the early symptoms of pregnancy. 

We are reviewing a common problem facing the primary 
care physician: When treating or evaluating a woman of 
childbearing years, what is the value of historical or physical 
examination features in determining the probability of early 
pregnancy? We will focus on the patient’s medical history 
and physical examination findings that help the clinician rule 
in or rule out early pregnancy. We intend to answer the 
following questions: (1) What is the value of history and 
symptoms in determining the probability of early pregnancy? 
(2) How accurate are home pregnancy tests (often part of the 
patient’s medical history) for determining early pregnancy? 
(3) What is the value of physical examination findings in 
determining the probability of early pregnancy? 

ANATOMIC AND PHYSIOLOGIC ORIGINS OF 
THE SIGNS AND SYMPTOMS OF PREGNANCY 
DURING THE FIRST TRIMESTER 

Pregnancy is suspected whenever a woman of childbearing 
years who has had regular menstrual cycles notices abrupt 
cessation of her menses. However, cessation of menses is a 
difficult symptom to evaluate in patients with previously 
irregular bleeding patterns. Occasionally, women have 
unexplained cyclic bleeding during pregnancy, especially in the 
first few months, and thus lack the symptom of amenorrhea. 
About 8% of pregnant women have a small amount of 
bleeding on or before the 40th day, which is thought to be related 
to implantation. 1 

The term morning sickness refers to the tendency of many 
women (approximately 50%) to develop nausea, often with 
vomiting, between 6 and 12 weeks’ gestational age. 1 Usually 
the nausea is worse when the pregnant woman awakens in 
the morning, whereas it tends to diminish as the day 
progresses. 

Shortly after missing her first period, the pregnant woman 
may notice a heavy sensation in her breasts, accompanied by 
tingling and soreness. These symptoms relate to hormone 
stimulation of the ducts and alveoli of the breast 
parenchyma, but may occur in identical form just before a 
menstrual period. As early as 6 weeks’ gestational age, there may 
be noticeable enlargement of the breasts, with engorgement 
of the superficial veins in the breasts. 2 During the first 
trimester, the nipples darken and become more sensitive. The 
areolar areas darken and become puffy. These symptoms and 
signs are thought to be of more value in primigravida 
because in multigravida women, areolar and nipple changes 
often remain from previous pregnancies. 3 

A few weeks after implantation (6 weeks’ gestational age), 
distinct enlargement of the uterus may be felt on bimanual 
palpation. In early pregnancy, the uterus becomes softened 
and changes from a pear-shaped configuration to a globular 
contour. 1 The congestive hyperemia of the pelvis in early 
pregnancy is manifested by a softening of the vagina and 
cervix, as well as a change in the color of these tissues. A 
significant increase in uterine artery pulsatile activity may occur as 
blood flow to the pregnant uterus increases. 4 In early 
pregnancy, the enlarging uterus exerts pressure on the bladder. 
Some patients note an increase in urinary frequency and 
nocturia during the first trimester. 

HOW TO ELICIT THESE SYMPTOMS AND SIGNS 

Medical History 

Although patients may give a simple description such as “I 
may be pregnant,” the examiner should seek a more complete 
medical history. Histories that indicate an increased 
likelihood of pregnancy include amenorrhea, morning sickness, 
breast symptoms (swelling, tingling, or tenderness), sexual 
activity, absent or inconsistent use of contraception, 
the patient’s own suspicion that she is pregnant, and a positive 
home pregnancy test result. Specific questions to ask include 
the following: (1) When was your last menstrual period, and was it 
normal? (2) Do you use any form of contraception? (3) Do 
you have any symptoms of pregnancy? (4) Is there a chance 
you are pregnant? 

Frequently, the patient may report, “My home pregnancy 
test was positive, and I want to know whether I am 
pregnant.” Important questions regarding this type of history 
would be these: (1) How many days or weeks after your last 
menstrual period did you perform the test? (2) Did you feel 
comfortable performing the test? (3) Did the instructions 
seem complicated to you? (4) What kind of home pregnancy 
test did you use? (5) Did you repeat the test and get a similar 
result? 

Physical Examination 

To diagnose pregnancy, the clinician might examine the 
patient’s breasts, as well as the vaginal wall, cervix, and 
uterus, by bimanual examination. The breasts may become 
engorged and enlarged, with darkening of the areolar area. 
The venous pattern over the breasts becomes increasingly 
visible as pregnancy progresses. 5 

Vaginal examination can be performed to elicit the 
Chadwick sign associated with early pregnancy. As early as 8 to 12 
weeks’ gestational age, the mucous membranes of the vulva, 
vagina, and cervix become congested and take on a 
bluish-violet hue (Chadwick sign). 1 This hue is especially well 
defined in the anterior vaginal wall but is also present to 
some extent throughout the vagina and on the cervix. The 
Chadwick sign is rarely seen before 7 weeks’ gestational age. 6 

On bimanual examination, softening of the cervix (Goodell 
sign) may be detected by 8 weeks’ gestational age. 7 The 
cervix of a nonpregnant woman is fibrous and normally feels 
like the tip of the nose. By contrast, the progressive edema 
that develops during pregnancy softens the consistency of 
the cervix tip to approximate that of the lips (Goodell 
sign). 

Examination of the uterus on bimanual examination can 
be performed to detect changes in uterine consistency and 
size. A palpable softening of the lowermost portion of the 
corpus occurs at about 6 weeks’ gestational age (Hegar sign). 7 
To elicit this sign, when the uterus is anteverted, the 
examiner places two fingers in the anterior vaginal fornix (or the 
posterior fornix in the presence of a retroverted uterus) and 
then compresses behind the fundus at the lower uterine 
segment with the other hand, using suprapubic pressure (Figure 
42-1). In this way, a distinct area of uterine softening is 
observed between 2 firmer structures: the fundus above and 
the cervix below. 5 Occasionally, the softening at the isthmus 
is so marked that the cervix and the body of the uterus seem 
to be separate organs. 3 

Another early sign of pregnancy is the uterine artery 
pulsation that can be palpated on a bimanual 
examination. 4 During a bimanual examination, the second and 
third digits of the examining hand can be placed in the 
lateral vaginal fornix, and the presence of uterine artery 
pulsations can often be palpated with minimal pressure 
on the parametrium. 4 

A few weeks after the embryo has become implanted, a dis¬ 
tinct enlargement of the uterus may be felt on bimanual 
examination. The uterus remains confined in the pelvis until 
12 weeks’ gestational age, when the fundus becomes palpable 
above the pubic symphysis (Figure 42-2). 

The identification of the fetal heart rate distinct from the 
maternal heart rate establishes a diagnosis of pregnancy. 
Transvaginal ultrasonography can detect fetal heart activity 
as early as 5 weeks’ gestational age, and transabdominal 
ultrasonography can detect this activity as early as 6 weeks’ 
gestational age. Instruments that use the Doppler effect can 
detect fetal cardiac activity at 10 to 12 weeks’ gestational age. 
The fetal heart can usually be auscultated with a fetoscope by 
20 weeks’ gestational age. 

Reference Standard for Diagnosing Early Pregnancy 

In this review, the detection of the β subunit of human chorionic
gonadotropin (HCG) in urine or serum is the routine
reference standard (or gold standard) for diagnosing early 
pregnancy. The diagnostic reliability of both the serum and 
urine HCG tests is comparable. The sensitivity and specific¬ 
ity for the diagnosis of pregnancy for both tests are between 
97% and 100% when performed in the laboratory. 8 In this 
review, we also report the results of studies conducted before 
the development of the HCG test. These earlier studies used 
delivery as the reference standard. 

METHODS 

Search Strategy 

We searched the MEDLINE database for English-language 
articles concerning the diagnosis of pregnancy that were 
published between 1966 and 1996. The key words used were 
“pregnancy,” “diagnosis,” and “pregnancy tests.” Additional 
articles listed in the bibliographies of standard obstetric texts 
and references cited in articles included in our study were 
also included among the articles considered. 

Articles were systematically reviewed by authors and given 
a grade of A, B, or C according to the study design and level 
of evidence (see Table 1-7 for a summary of Evidence Grades 
and Levels). 9 Articles were excluded if the results of the symp¬ 
tom or sign being investigated were not compared with the 
gold standard or the results could not be classified into a con¬ 
tingency table (attempts were made to reach authors of 
potential articles to obtain additional information needed to 
create contingency tables). 

Through the MEDLINE, textbook reference, and bibliog¬ 
raphy searches, we initially identified 55 articles, 40 of which 
were rejected because the test was not compared with the 
gold standard (urine or serum HCG test) or a pregnancy 
outcome. The remaining 15 articles were then analyzed by
us, and 6 more were excluded because the reported data were
not sufficient to permit construction of contingency tables.
Therefore, the results of 9 studies form the basis for this
review.

Figure 42-1 Examination Eliciting the Hegar Sign

The Hegar sign is a softening of the lower uterine segment that can be
appreciated during a bimanual examination.

Figure 42-2 Uterine Height at Different Gestational Weeks

The height of the fundus at comparable gestational dates varies greatly among
patients. Those shown are the most common. A convenient rule of thumb is that,
at 20 weeks of gestation, the fundus is usually at or slightly above the umbilicus.

We used data from contingency tables to calculate sensitivity
and specificity. Likelihood ratios were also calculated to
characterize the behavior of the diagnostic tests. The positive
likelihood ratio (LR+) is defined as sensitivity/(1 - specificity)
and expresses the change in odds favoring a disease, given a
positive test result (LR+ values are > 1), whereas the negative
likelihood ratio (LR-) is defined as (1 - sensitivity)/specificity
and expresses the change in odds favoring disease, given a
negative test result (LR- values are 0 to 1). 10 Data were
sufficiently similar in design to assess for statistical similarity.
The data were pooled when the Breslow-Day test for homogeneity
was not significant (P > .05). 11
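
The definitions above can be illustrated with a short calculation. The following is a minimal sketch, not the authors' own software: the class and function names are hypothetical, and the counts are the delayed-menses figures reported for Ramoska et al in Table 42-1. It assumes specificity is neither 0 nor 1, because either value would cause a division by zero.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a contingency table (field names are illustrative)."""
    present_pregnant: int      # finding present, pregnant (true positives)
    present_not_pregnant: int  # finding present, not pregnant (false positives)
    absent_pregnant: int       # finding absent, pregnant (false negatives)
    absent_not_pregnant: int   # finding absent, not pregnant (true negatives)


def diagnostic_summary(t: TwoByTwo) -> dict:
    """Return sensitivity, specificity, LR+ and LR- for one table."""
    pregnant = t.present_pregnant + t.absent_pregnant
    not_pregnant = t.present_not_pregnant + t.absent_not_pregnant
    sensitivity = t.present_pregnant / pregnant
    specificity = t.absent_not_pregnant / not_pregnant
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # LR+ = sensitivity / (1 - specificity)
        "lr_positive": sensitivity / (1 - specificity),
        # LR- = (1 - sensitivity) / specificity
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Delayed menses vs menses on time, counts as listed for Ramoska et al.
ramoska = TwoByTwo(
    present_pregnant=58,
    present_not_pregnant=58,
    absent_pregnant=10,
    absent_not_pregnant=82,
)
summary = diagnostic_summary(ramoska)
print({name: round(value, 2) for name, value in summary.items()})
```

Rounded to the precision used in the tables, the point estimates from this sketch agree with the LR+ of 2.1 and LR- of 0.25 listed for that study. Confidence intervals are left out because the review does not state how they were derived.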

Accuracy of History and Symptoms 
for Pregnancy Diagnosis 

Several studies have been performed to evaluate the value of 
patient history in ruling in or ruling out early pregnancy 
compared with the gold standard HCG test (Tables 42-1, 
42-2, 42-3, and 42-4). Among 208 consecutive patients for 
whom a qualitative serum HCG determination was ordered,
emergency department physicians recorded the date of the 
patient’s last menstrual period, whether her menstrual 
period was on time, if birth control had been used, and 
whether the patient suspected she was pregnant. 12 The main 
indication for ordering a pregnancy test in this study was 
abdominal pain (138 patients). Sixty-eight women (33%) 
were pregnant. Three historical variables were statistically
less likely to be associated with pregnancy: a last menstrual
period that was on time, the patient thinking that she was
not pregnant, and the patient stating that there was no
chance that she could be pregnant (P < .001). Combinations
of historical criteria were unsuccessful at ruling out
pregnancy; there was still a 10% chance of pregnancy’s
being overlooked using any combination of these historical
variables.

Table 42-1 Does a Delayed Menstrual Period Predict Pregnancy? a

                        Evidence                    Pregnant
Study                   Grade b   Characteristics   Yes    No     LR (95% CI)
Robinson and Barber 15  A         Delayed menses    618    248    1.6 (1.4-1.7)
                                  Menses on time    361    365    0.62 (0.56-0.69)
Ramoska et al 12        A         Delayed menses    58     58     2.1 (1.6-2.6)
                                  Menses on time    10     82     0.25 (0.14-0.45)
Stengel et al 13, c     B         Delayed menses    3      43     1.0 (0.38-2.9)
                                  Menses on time    9      136    0.99 (0.70-1.4)
Zabin et al 16          A         Delayed menses    703    1078   1.1 (1.0-2.9)
                                  Menses on time    331    707    0.81 (0.68-0.76)

Abbreviations: CI, confidence interval; LR, likelihood ratio.

a In testing for homogeneity, χ2 = 37 and P = .001. Therefore, data were not pooled.
b See Table 1-7 for a summary of Evidence Grades and Levels.
c Unpublished data from this study provided by David Seaberg, MD, University of
Pittsburgh, Pennsylvania, June 1995.


Women may not associate symptoms with early pregnancy. 
Investigators measured the effectiveness of a standardized 
patient history questionnaire in detecting unrecognized 
pregnancies. 13 Consecutive fertile women (n = 191) present¬ 
ing to the emergency department for any reason completed a 
menstrual and sexual history questionnaire and had a preg¬ 
nancy test. This study reports a 6.3% prevalence of unrecog¬ 
nized pregnancy, defined as a “pregnancy not definitely 
known to exist” when the patient presented to the emergency 
department. 13 Among those with abdominal pain or pelvic 
complaints (70 patients), the prevalence of unrecognized 
pregnancy was found to be 13%. Historical factors were ana¬ 
lyzed for correlation with positive pregnancy test results. Two 
factors were found to be statistically significant correlates: the 
patient thought there was a chance she could be pregnant 
and an abnormal last menstrual period (P < .001). One fac¬ 
tor, the delayed menstrual period, was not found to be signif¬ 
icant (LR+, 1.0). Among the historical factors analyzed, “Is 
there any chance that you could be pregnant now?” was the 
most sensitive for pregnancy (92%), with a specificity of 71% 
(David Seaberg, MD, University of Pittsburgh, Pennsylvania, 
unpublished data, June 1995). 

Unlike women who do not associate symptoms with early 
pregnancy, others self-diagnose pregnancy and request medi¬ 
cal confirmation. Women (n = 283) with late menstrual peri¬ 
ods who requested evaluation in a health center completed a 
structured contraception and sexual history questionnaire 
that included questions on whether the woman believed she 
was pregnant and whether subjective symptoms of preg¬ 
nancy were present. 14 The patient sealed her answers to the 
questionnaire in an envelope before the results of the preg¬ 
nancy tests were available. One hundred eighteen women 
(42%) were pregnant. Women were better at ruling out preg¬ 
nancy (sensitivity, 92%) than ruling in pregnancy (specific¬ 
ity, 42%). 

Another study, 15 conducted by general practitioners, sought
to determine the value of pregnancy symptoms (presence or
absence of amenorrhea and morning sickness) in determining
the probability of pregnancy. Information was collected
prospectively about women who consulted their general 
practitioner for a diagnosis of pregnancy; the gold standard 
was a positive pregnancy test result. General practitioners 
throughout Scotland (n = 155) participated in the study, 
which was restricted to women between the ages of 16 and 45 
years. Of the 1592 women enrolled, 979 (62%) were preg¬ 
nant. The symptom of amenorrhea was 63% sensitive and 
60% specific for pregnancy. Morning sickness as a symptom 
of pregnancy had a sensitivity of 39% and a specificity of 
86%. This study did not ask the participants whether they 
thought they were pregnant.

In 1996, Zabin et al 16 performed a similar study in a population
of adolescents (younger than 17 years) to determine historical
predictors of pregnancy. They performed a cross-sectional
study of 2926 adolescents who presented to 52 clinics
in the United States and requested a pregnancy test. The girls
were asked to complete an anonymous questionnaire (98%
response rate) while they waited for the results of their
pregnancy test. Thirty-six percent of adolescents in this study
were pregnant. A late menstrual period was the most frequent
reason (63%) for the visit (for pregnancy: sensitivity, 68%;
specificity, 40%).

Although a delayed menstrual period yields statistically
significant results for predicting pregnancy, with an LR+ of 1.1
to 2.1 (Table 42-1), the results are inconsistent, and a delayed
period is therefore not a reliable symptom of pregnancy.
Typical early symptoms
of pregnancy provide more consistent results across studies 
and serve to increase slightly the likelihood of pregnancy 
(LR+, 2.4) (Table 42-2). Unfortunately, the absence of early 
symptoms of pregnancy, such as morning sickness, does not 
rule out pregnancy (LR-, 0.71). Likewise, the patient’s use of 
birth control decreases the likelihood of pregnancy (LR-, 
0.29), but not enough to efficiently rule it out (Table 42-3). 
Even the patient’s suspicion of pregnancy statistically alters 
the likelihood of pregnancy, but not enough to be reliable 
(Table 42-4). 

Table 42-2 Probability of Pregnancy if Patient
Reports Symptoms of Pregnancy a

                        Evidence                            Pregnant
Study                   Grade b                             Yes   No    LR (95% CI)
Robinson and Barber 15  A         Morning sickness          380   88    2.7 (2.2-3.3)
                                  No morning sickness       599   525   0.71 (0.67-0.76)
Bachman 14              A         Any pregnancy symptoms    59    34    2.4 (1.7-3.4)
                                  No pregnancy symptoms     59    131   0.63 (0.52-0.77)

Abbreviations: CI, confidence interval; LR, likelihood ratio.

a Pregnancy symptoms defined as morning sickness, breast tenderness and fullness,
urinary frequency, or fatigue.
b See Table 1-7 for a summary of Evidence Grades and Levels.


Table 42-3 Probability of Pregnancy if Patient Reports
Not Using Birth Control

                        Evidence                      Pregnant
Study                   Grade a                       Yes   No    LR (95% CI)
Ramoska et al 12        A         No birth control    61    96    1.3 (1.1-1.5)
                                  Birth control       7     44    0.33 (0.16-0.69)
Stengel et al 13, b     B         No birth control    9     88    1.5 (1.1-2.2)
                                  Birth control       3     91    0.49 (0.18-1.3)
Pooled c                          No birth control    70    184   1.5 (1.3-1.7)
                                  Birth control       10    135   0.29 (0.16-0.53)

Abbreviations: CI, confidence interval; LR, likelihood ratio.

a See Table 1-7 for a summary of Evidence Grades and Levels.
b Unpublished data from this study provided by David Seaberg, MD, University of
Pittsburgh, Pennsylvania, June 1995.
c In testing for homogeneity, χ2 = 0.097 and P = .76. Therefore, data were pooled.


Table 42-4 Probability of Pregnancy if Patient Thinks There Is a
Chance She Is Pregnant

                        Evidence  Patient Thinks    Pregnant
Study                   Grade a   She Is            Yes   No     LR (95% CI)
Bachman 14              A         Pregnant          109   95     1.6 (1.4-1.8)
                                  Not pregnant      9     70     0.18 (0.09-0.34)
Ramoska et al 12        A         Pregnant          58    63     1.9 (1.5-2.3)
                                  Not pregnant      10    77     0.27 (0.15-0.48)
Stengel et al 13, b     B         Pregnant          11    52     3.2 (2.4-4.2)
                                  Not pregnant      1     127    0.12 (0.02-0.77)
Zabin et al 16          A         Pregnant          789   640    2.1 (2.0-2.3)
                                  Not pregnant      254   1148   0.38 (0.34-0.42)
Pooled results c                  Pregnant          967   850    2.1 (2.0-2.2)
                                  Not pregnant      270   1422   0.35 (0.31-0.39)

Abbreviations: CI, confidence interval; LR, likelihood ratio.

a See Table 1-7 for a summary of Evidence Grades and Levels.
b Unpublished data from this study provided by David Seaberg, MD, University of
Pittsburgh, Pennsylvania, June 1995.
c In testing for homogeneity, χ2 = 4.3 and P = .23. Therefore, data were pooled.


Accuracy of Home Pregnancy Tests

It has been reported that one-third of women who think they
may be pregnant have used a home pregnancy test. 17 A recent
study of teenagers requesting pregnancy tests in health
departments revealed that 28% of adolescents had used an in-home
pregnancy test before their visit. 16 In-home pregnancy test kits
became available in 1976 and used the hemagglutination-inhibition
method of detecting HCG. Currently, most test kits
use monoclonal HCG antibodies, which can produce test
results that can be read as a color change. The accuracy of these
tests is claimed to be 97% to 99% by the manufacturers. 18
Studies have shown that accuracy depends on several factors,
such as whether the woman read the instructions carefully and
the number of days beyond the missed menstrual period. 19

In 1986, Doshi 20 published a study measuring the accuracy
of 3 in-home tests for early pregnancy. The author studied
109 women of childbearing age whose menses were late by at
least 6 days, but not more than 20 days. Volunteers for the
study were obtained from 3 sites; the majority were white
and educated. Participants brought to the study site their first
morning urine, which was then divided in half. One portion
of the sample was returned to the participant to use in
performing a pregnancy test at home. Using 1 of 3 study kits
(Answer [Carter Products; Carter-Wallace, Inc, New York,
New York]; Daisy 2 [Boehringer-Mannheim Corp, Ingelheim,
Germany]; and e.p.t. [Warner-Lambert Co, Morris
Plains, New Jersey]), the participants were instructed to
follow the package directions in performing the test, call the site
with results, and complete and return the data collection
survey to the investigator. The investigator performed an
identical test using the other portion of the urine sample. Despite
manufacturer claims of 97% overall accuracy for the test kits
used, the investigator found an accuracy of 77%. The
participants had a sensitivity of 80% and specificity of 68% for
detecting early pregnancy with the home pregnancy tests
(LR+, 2.5; LR-, 0.29), with similar diagnostic efficiency
observed for all 3 kits. These results concerned Doshi 20
because of missed opportunities for early prenatal care and
the postponement of discontinuing teratogenic substances.
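
When a study reports only sensitivity and specificity, as here, the likelihood ratios follow directly from the definitions given in the Methods. The sketch below is illustrative only: the function name is hypothetical, and it assumes both inputs are proportions strictly between 0 and 1.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from a sensitivity and specificity given as proportions."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        # A value of exactly 0 or 1 would give a zero or undefined ratio.
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity 80% and specificity 68%, as reported for the home tests.
lr_pos, lr_neg = likelihood_ratios(0.80, 0.68)
print(round(lr_pos, 1), round(lr_neg, 2))
```

With these inputs the rounded results agree with the LR+ of 2.5 and LR- of 0.29 quoted for the participants' home test results.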

In 1993, investigators from France published an extensive 
analysis of the reliability and feasibility of home pregnancy 
tests. 21 They looked at 27 different test kits (manufacturers 
were not identified) and selected 11 kits for the study, which 
were found to have a 100% sensitivity and specificity under 
ideal laboratory conditions. Laywomen volunteers (aged 14- 
49 years; n = 638) were asked to test a home-use test kit for 
pregnancy using a coded urine specimen. They also were 
asked to complete a questionnaire after they performed the 
test. The results of the diagnostic study showed that 5 of the 
11 kits had 100% specificity; the others had specificity values 
between 77% and 94%. Two kits had a high diagnostic sensi¬ 
tivity (>90%), and 2 kits were found to have a low diagnostic 
sensitivity (<10%). Whereas 90% of the participants claimed 
that the test was easy to perform, of the 478 positive (result 
positive for pregnancy) urine samples distributed, 230 were 
falsely interpreted as negative (sensitivity, 48%). The authors 
concluded that the main reason for the poor performance 
was difficulty in interpreting the instructions rather than the 
socioeconomic situation of the participants. 


Table 42-5 Probability of Pregnancy if Physician
Examination Findings Present

                        Evidence                               Pregnant
Study                   Grade a   Characteristic               Yes   No    LR (95% CI)
Chadwick 6              C         Chadwick sign
                                    Present                    144   1     29 (4.1-200)
                                    Absent                     137   55    0.50 (0.44-0.56)
Robinson and Barber 15  A         Breast signs
                                    Present b                  549   127   2.7 (2.3-3.2)
                                    Absent                     430   486   0.55 (0.50-0.60)
Robinson and Barber 15  A         Vaginal examination signs
                                    Present c                  172   34    3.2 (2.2-4.5)
                                    Absent                     807   579   0.87 (0.84-0.90)
Robinson and Barber 15  A         Palpable fundus
                                    Present                    84    19    2.8 (1.7-4.5)
                                    Absent                     895   594   0.94 (0.92-0.97)
Meeks et al 4           B         Uterine artery pulsation
                                    Present                    19    9     11 (5.6-21)
                                    Absent                     6     121   0.26 (0.13-0.52)

Abbreviations: CI, confidence interval; LR, likelihood ratio.

a See Table 1-7 for a summary of Evidence Grades and Levels.
b Breast signs were not explicitly defined and include any abnormal findings on
breast examination.
c Vaginal examination signs were not explicitly defined and include any abnormal
findings on vaginal or pelvic examination.

Accuracy of the Physical Examination 

Only a few studies have analyzed the accuracy of the physical
examination for pregnancy. Unfortunately, no studies
have examined interobserver or intraobserver reliability. In 
1887, Chadwick 6 published a study of 337 women evaluated 
weekly (until delivery for those women who were pregnant) 
to assess the presence of the Chadwick sign. He described the 
coloration of the vaginal wall as no color or doubtful color, 
suggestive color, characteristic color, and general deep color. 
He classified any vaginal wall with characteristic or general 
deep color to be “diagnostic.” With his criteria, the sensitivity 
of this physical sign is 51% and the specificity is 98%. No val¬ 
idation studies could be found. 

Robinson and Barber 15 performed a study in 1977 to 
determine the value and reliability of the physical examina¬ 
tion for pregnancy compared with a pregnancy test. They 
examined the vagina for signs of pregnancy, palpated the 
fundus, and assessed breast changes on physical examina¬ 
tion. The most common feature observed was breast signs 
(42%), with a sensitivity of 56% and a specificity of 79%. 
Thirteen percent of women were observed to have “signs” 
on vaginal examination (signs not specified, but presum¬ 
ably some combination of the Goodell, Hegar, and Chad¬ 
wick signs) consistent with pregnancy, with a sensitivity of 
18% and a specificity of 94%. Last, 6% of women were 
observed to have a palpable fundus at presentation for a 
pregnancy test (sensitivity, 9%; specificity, 97%). 

Recently, a study was performed to determine whether pal¬ 
pable uterine artery pulsation is a reliable clinical indicator of 
early pregnancy. 4 The authors conducted the study in 2 
phases. During the first phase, one of the authors examined 
299 women who were less than 6 weeks from their last men¬ 
strual period for palpable uterine artery pulsation; this 
examination was conducted after a medical history had been 
obtained, and thus the examiner was not blind to the clinical 
situation. During the second phase, one of the authors exam¬ 
ined 155 women for palpable uterine artery pulsation but 
performed only the bimanual examination and was blind to 
all other historical and physical examination data. With data 
from the second phase only, palpation of uterine artery pul¬ 
sations may be a valuable tool in diagnosing early pregnancy 
(sensitivity, 76%; specificity, 93%). According to the results 
of this study, physicians were encouraged to add uterine 
artery pulsation to their clinical examination in diagnosing 
early pregnancy. 

Despite descriptive articles dating back to the 1880s, no 
studies could be identified that measured the value of the 
Goodell or Hegar signs. In 1908, McDonald 7 reported the 
prevalence of early pregnancy findings in 100 women 
known to be pregnant. He followed up women with weekly 
pelvic examinations during their first trimester. In this 
descriptive study, pregnant women were found to have the 
following: Hegar sign, 94%; Goodell sign, 66%; and Chad¬ 
wick sign, 61%. This study is included for historical inter¬ 
est. Knowing the pregnancy status of patients creates 
expectation bias that probably overstates the value and
prevalence of these signs.

As summarized in Table 42-5, several physical findings
significantly increase the likelihood of pregnancy. The most
useful findings on physical examination for making the
diagnosis of early pregnancy appear to be Chadwick sign (LR+,
29) and palpable uterine artery pulsation (LR+, 11), although
validation studies are needed because these 2 studies had
comparatively lower methodologic quality scores. Unfortunately,
if any of these signs are absent, this does not rule out
pregnancy.

THE BOTTOM LINE 

Clearly, to establish a diagnosis of early pregnancy, a clinician 
should order a urine or serum HCG test. However, there may 
be circumstances in which it would be useful for patients or 
physicians to know the value of pregnancy symptoms, home 
pregnancy test results, and physical examination findings for 
the diagnosis of pregnancy. 

We can predict the likelihood of pregnancy for the patients in 
the clinical scenarios. For case 1, the woman with sinusitis has a 
prior probability of pregnancy of about 5%. Because she reports 
that her menses was on time (LR-, 0.62) and states that she is 
not pregnant (LR-, 0.35), the calculated probability of preg¬ 
nancy might be from 1.7% to 3.1% for this patient. We would 
not order a pregnancy test for case 1. For case 2, the sexually 
active teenager, we can also calculate a probability that she might 
be pregnant. Zabin et al 16 reported a pregnancy rate of 36% 
among teenagers presenting for a pregnancy test in their study. If 
we assume her prior probability of pregnancy is 36% and know 
her menses is late (LR+, 1.1), her home pregnancy test result 
was negative (LR-, 0.29), and her pelvic examination findings 
were normal (LR-, 0.87), her probability of pregnancy ranges 
from 10% to 41%, and we would recommend ordering a preg¬ 
nancy test for this case. For case 3, the 41-year-old woman with a 
late menses and breast tenderness, the prior probability of preg¬ 
nancy might be low (approximately 2%) because of decreased 
fecundity secondary to her age. If we consider her late menses 
(LR+, 1.6) and her breast tenderness (LR+, 2.4), her probability 
of pregnancy has increased approximately 2-fold to a range of 
3.1% to 4.9%, and we would order a pregnancy test. 
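
The reasoning in these cases rests on converting a prior probability to odds, multiplying by a likelihood ratio, and converting back to a probability. The following is a minimal sketch of that arithmetic, applying one finding at a time. The function name is hypothetical, and because the text does not state exactly how its ranges were derived or rounded, the values this sketch produces are close to, but need not match, the ranges quoted above.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Update a probability with a single likelihood ratio using odds."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Case 1: prior probability of about 5%, each finding applied on its own.
case_1_findings = {
    "menses on time (LR-, 0.62)": 0.62,
    "states she is not pregnant (LR-, 0.35)": 0.35,
}
for finding, lr in case_1_findings.items():
    probability = posttest_probability(0.05, lr)
    print(f"{finding}: {probability:.1%}")
```

Each finding is applied separately here because multiplying several likelihood ratios together assumes the findings are independent, which the studies reviewed do not establish.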

Patients may call their clinician asking for advice regarding 
a late period or symptoms of pregnancy. They may want to 
know whether they should perform a home pregnancy test, 
or they may request assistance in interpreting the test results. 
Evidence suggests that some historical features, when absent, 
are fair but not reliable for ruling out pregnancy. When diag¬ 
nosing pregnancy, the patient or clinician should not rely on 
symptoms and signs of pregnancy or a home pregnancy test; 
a laboratory test should be requested.

UPDATE: Early Pregnancy



Prepared by Lori A. Bastian, MD, MPH 
Reviewed by Joanne T. Piscitelli, MD


CLINICAL SCENARIO 


A 16-year-old adolescent who is concerned she might be 
pregnant calls her local Planned Parenthood clinic. She 
has not noticed any symptoms of pregnancy such as early 
morning nausea or breast tenderness. She does observe 
that her period is 3 weeks overdue. She purchased a home 
pregnancy test (HPT) kit, and the results suggested that 
she is not pregnant. When asked about performing the 
HPT kit, she observes she felt nervous and was not sure 
that she followed the directions correctly. 

UPDATED SUMMARY ON PREGNANCY 

Original Review 

Bastian LA, Piscitelli J. Is this patient pregnant? Can you
reliably rule in or rule out early pregnancy by clinical
examination? JAMA. 1997;278(7):586-591.

UPDATED LITERATURE SEARCH 

Our literature search used the parent search strategy for The 
Rational Clinical Examination series combined with the subjects 
“pregnancy” and “pregnancy tests,” published in English 
between 1996 and September 2004. The results yielded 301 titles, 
for which we reviewed the titles and abstracts; 12 articles were 
selected for additional review. These articles were reviewed to 
identify studies that assessed the sensitivity and specificity of the 
medical history or physical examination features of pregnancy. 
Only 1 article, a meta-analysis on the diagnostic characteristics 
of HPT kits, was retained. 1 We included home testing, as in the 
original review, because it is frequently part of the patient’s med¬ 
ical history when evaluating for pregnancy. 

IMPROVEMENTS IN THE DATA PRESENTED IN THE 
ORIGINAL PUBLICATION 


CHANGES IN THE REFERENCE STANDARD 

There are no changes observed in the reference standard, 
which is based on laboratory testing of serum or urine human 
chorionic gonadotropin (HCG). Recently, Wilcox et al 2 used an 
extremely sensitive assay for HCG and found that 10% of preg¬ 
nancies were undetectable on the first day of the missed period. 
These authors recommend waiting 1 week after the first day of 
the missed period to perform pregnancy testing. 

RESULTS OF LITERATURE REVIEW 

Home pregnancy kits have become increasingly popular, and
manufacturers claim these HPT kits are 99% accurate. Most
studies have found that women choose to use HPT kits
because of the speed of obtaining results and the convenience
of testing at home. A systematic review of 5 studies reviewing
16 HPT kits found that the diagnostic performance of these
kits is affected by the characteristics of the users.

In studies in which urine samples obtained by the investigators
were tested by volunteers, sensitivity was 91% (95%
confidence interval [CI], 84%-96%). However, the sensitivity
was lower in studies in which subjects were actual patients who
used the HPT kit on their own urine samples (sensitivity,
75%; 95% CI, 64%-85%).

EVIDENCE FROM GUIDELINES 

None. 


CLINICAL SCENARIO—RESOLUTION 


For diagnosing pregnancy, you recommend repeating the 
HPT kit 1 week after using the first kit. If the results remain 
negative, she should still present to the clinic for further 
testing because most physicians would recommend that 
this teenager be screened for sexually transmitted infections 
and counseled about contraception, independent of the 
HPT’s result. 


None.

Table 42-6 Likelihood Ratios of Commercially Available Home
Pregnancy Test Kits a

Test            Sensitivity  Specificity  LR+ (95% CI)    LR- (95% CI)
Acu Test 3      0.52         0.89         4.7 (1.5-14)    0.54 (0.36-0.81)
Advance 4       0.86         0.91         9.7 (3.8-25)    0.15 (0.10-0.22)
Advance 3       0.91         1.0          53 (3.4-830)    0.12 (0.04-0.32)
Answer 2 3      1.0          0.94         13 (3.9-13)     0.02 (0-0.26)
Answer 5        0.78         0.64         2.2 (1.3-3.7)   0.34 (0.18-0.62)
Daisy 2 5       0.82         0.64         2.3 (1.1-4.8)   0.28 (0.11-0.75)
Daisy 2 3       0.98         0.61         2.5 (1.7-3.6)   0.04 (0.01-0.28)
e.p.t. 5        0.82         0.75         2.3 (1.3-4.2)   0.28 (0.15-0.55)
e.p.t. 3        0.88         1.0          53 (3.4-832)    0.15 (0.06-0.17)
e.p.t. Plus 4   0.90         0.92         13 (4.4-40)     0.10 (0.06-0.17)
e.p.t. Plus 3   0.95         1.0          63 (4.0-988)    0.07 (0.02-0.25)
Fact 3          1.0          0.94         14 (4.2-46)     0.01 (0-0.23)
First 3         0.93         1.0          47 (3.0-744)    0.10 (0.03-0.32)
Predictor 6     0.97         0.96         22 (8.4-57)     0.03 (0.01-0.10)
Predictor 3     1.0          0.77         4.3 (2.3-8.1)   0.02 (0-0.31)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a Information on products is shown as reported by Bastian et al. 1 Testing kits may
subsequently undergo changes in name, undergo product updates, or be sold between
manufacturers. Acu Test: J.B. Williams Co, Cranford, New Jersey. Advance and Fact:
Advanced Care Products, Ortho Pharmaceutical Corp, Raritan, New Jersey. Answer 2,
Answer, and First: Carter Products, Carter-Wallace, Inc, New York, New York. Daisy 2:
Boehringer-Mannheim Corp, Ingelheim, Germany. e.p.t. and e.p.t. Plus: Warner-Lambert,
Morris Plains, New Jersey. Predictor: Whitehall Laboratories, New York, New York.

a For the Evidence to Support the Update for this topic,
see http://www.JAMAevidence.com.


EARLY PREGNANCY— MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

The probability of pregnancy varies, depending on the clini¬ 
cal situation. In the emergency department, the prevalence 
of unsuspected pregnancy is 6.3%. The prevalence increased 
to 13% in women with abdominal or pelvic complaints. 7 
Among women trying to get pregnant, the probability of 
pregnancy after a single episode of unprotected sexual inter¬ 
course approximates 20% to 33%. 8 

POPULATION FOR WHOM PREGNANCY 
SHOULD BE CONSIDERED 

• All women of childbearing years with an intact uterus who 
are sexually active and who have missed their last men¬ 
strual period or had an abnormal menstrual period. 

• Any woman who wonders whether she might be preg¬ 
nant. 


DETECTING THE LIKELIHOOD OF PREGNANCY 

General symptoms of early pregnancy include amenorrhea,
morning sickness, and tender or tingling breasts. In the original
review, the range of likelihood ratios (LRs) was 1.0 to 2.1 for
women reporting a delayed menses and 0.25 to 0.99 for women
reporting their menses on time. For women reporting morning
sickness or any pregnancy symptoms, the LR was 2.7 or 2.4,
respectively. Another indicator of early pregnancy is whether
the woman thinks she is pregnant. When a woman thinks there
is a chance she is pregnant, the LR for pregnancy is 2.1 (95% CI,
2.0-2.2); if she thinks she is not pregnant, the LR is 0.35 (95%
CI, 0.31-0.39). Physical examination findings, such as an
enlarged uterus with a soft cervix or a palpable uterine artery
pulsation, have been studied and may be useful in some clinical
settings.

REFERENCE STANDARD TEST 

To establish a diagnosis of early pregnancy, a clinician should 
order a urine or serum HCG test.


EVIDENCE TO SUPPORT THE UPDATE: Early Pregnancy



TITLE Diagnostic Efficiency of Home Pregnancy Test 
Kits: A Meta-analysis. 

AUTHORS Bastian LA, Nanda K, Hasselblad V, Simel DL. 

CITATION Arch Fam Med. 1998;7(5):465-469. 

QUESTION What are the diagnostic characteristics of 
home pregnancy test kits? 

DESIGN A systematic literature search of studies that 
compared home pregnancy test kits with laboratory testing 
of human chorionic gonadotropin using MEDLINE from 
1966-1996. Two investigators extracted data independently. 
Five studies evaluating 16 home pregnancy test kits met the 
inclusion criteria. 

MAIN OUTCOME MEASURES 

Sensitivity, specificity, and effectiveness scores. The data were 
used to calculate the likelihood ratios (not reported in the 
original publication). 

MAIN RESULTS 

The authors found data for 11 different home testing kits. We
dropped information on the OVA II because complete information
on the manufacturer was not available, 4 leaving the
10 shown in Table 42-7.

CONCLUSIONS 

LEVEL OF EVIDENCE Systematic review. 

STRENGTHS Comprehensive review of home pregnancy 
test (HPT) kits that were described in the original Rational 
Clinical Examination article. 

LIMITATIONS Most studies were published in the 1970s and
1980s, after HPT kits came on the market. The most recent study
was published in 1989. HPT kits currently on the market have not
been reviewed.

The effectiveness of home pregnancy testing kits is dependent
on the skill of the user. When taking the history from a
woman who has used a testing kit, you should confirm that
she repeated the results. Newer kits can be accurate when
performed according to the manufacturer’s specifications.
Most physicians (and patients) may be unaware that a negative
home pregnancy test result is not perfect for ruling out
pregnancy.

Table 42-7 Likelihood Ratios of Commercially Available Home
Pregnancy Test Kits a

Test          Reference  Sensitivity  Specificity  LR+ (95% CI)     LR- (95% CI)
Acu Test      1          0.52         0.89         4.7 (1.5-14)     0.54 (0.36-0.81)
Advance       2          0.86         0.91         9.7 (3.8-25)     0.15 (0.10-0.22)
Advance       1          0.91         1.0          53 (3.4-830)     0.12 (0.04-0.32)
Answer 2      1          1.0          0.94         13 (3.9-13)      0.02 (0-0.26)
Answer        3          0.78         0.64         2.2 (1.3-3.7)    0.34 (0.18-0.62)
Daisy 2       3          0.82         0.64         2.3 (1.1-4.8)    0.28 (0.11-0.75)
Daisy 2       1          0.98         0.61         2.5 (1.7-3.6)    0.04 (0.01-0.28)
e.p.t.        3          0.82         0.75         2.3 (1.3-4.2)    0.28 (0.15-0.55)
e.p.t.        1          0.88         1.0          53 (3.4-832)     0.15 (0.06-0.17)
e.p.t. Plus   2          0.90         0.92         13 (4.4-40)      0.10 (0.06-0.17)
e.p.t. Plus   1          0.95         1.0          63 (4.0-988)     0.07 (0.02-0.25)
Fact          1          1.0          0.94         14 (4.2-46)      0.01 (0-0.23)
First         1          0.93         1.0          47 (3.0-744)     0.10 (0.03-0.32)
Predictor     5          0.97         0.96         22 (8.4-57)      0.03 (0.01-0.10)
Predictor     1          1.0          0.77         4.3 (2.3-8.1)    0.02 (0-0.31)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a Information on products is shown as originally reported in this systematic review. Testing
kits may subsequently undergo changes in name, undergo product updates, or be sold
between manufacturers. Acu Test: J.B. Williams Co, Cranford, New Jersey. Advance and
Fact: Advanced Care Products, Ortho Pharmaceutical Corp, Raritan, New Jersey. Answer
2, Answer, and First: Carter Products, Carter-Wallace, Inc, New York, New York. Daisy 2:
Boehringer-Mannheim Corp, Ingelheim, Germany. e.p.t. and e.p.t. Plus: Warner-Lambert,
Morris Plains, New Jersey. Predictor: Whitehall Laboratories, New York, New York.
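
The likelihood ratios in Table 42-7 follow from each kit's sensitivity and specificity. The Python sketch below shows the standard arithmetic; the function name is ours, not the review's. It will not reproduce every table entry exactly, because the table was presumably computed from unrounded counts, and the rows with a specificity of 1.0 report a finite LR+ that presumably reflects a continuity correction (an assumption on our part).

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with a positive/negative result.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not (0 <= sensitivity <= 1 and 0 <= specificity <= 1):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    # With rounded inputs a denominator can reach zero; report infinity
    # rather than guessing at the correction used in the source table.
    lr_pos = math.inf if specificity == 1 else sensitivity / (1 - specificity)
    lr_neg = math.inf if specificity == 0 else (1 - sensitivity) / specificity
    return lr_pos, lr_neg


if __name__ == "__main__":
    # Hypothetical kit with sensitivity 0.90 and specificity 0.80.
    lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```

An LR- well above 0 is the numeric form of the caution above: a negative home test lowers the probability of pregnancy but does not exclude it.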

Reviewed by Lori A. Bastian, MD

Do These Patients Have Pulmonary Embolism?

CASE 1 A 28-year-old woman with recently diagnosed
systemic lupus erythematosus presents with 2 days of
pleuritic chest pain and breathlessness. She has no leg
symptoms and no personal or family history of venous
thromboembolism. She is taking a second-generation oral
contraceptive pill. Examination reveals mild tachypnea
(20/min) and minimal tenderness over the right lateral
chest wall. Examination of the legs is normal, and a red
blood cell agglutination D-dimer test shows a negative
result.

CASE 2 A 78-year-old man presents with 3 days of worsening
pleuritic chest pain and breathlessness. He was discharged
from the hospital 2 weeks earlier after a 14-day
admission with acute cholecystitis. Surgery was not
performed. His medical history includes 2 episodes of
idiopathic, right-leg, deep vein thrombosis. He has controlled
hypertension and previous left ventricular failure. The
examination reveals tachypnea (20/min), but findings are
otherwise normal. Chest radiograph and electrocardiogram
(ECG) findings are normal, and a red blood cell
agglutination D-dimer test shows a negative result.

Background 

Pulmonary embolism occurs in 1 to 2 persons per 1000
annually in the United States. 1,2 If untreated, it is associated
with a high mortality rate, but anticoagulant therapy is
highly effective in reducing mortality. 3,4 The diagnosis of
pulmonary embolism is difficult because of the wide spectrum
of symptoms and signs, and most patients with suggestive
symptoms do not have the disease. 5 Typically, patients with
proven pulmonary embolism present with dyspnea or acute
chest pain and less frequently with cough, hemoptysis, or
fainting. 6,7 These findings often occur in association with
well-defined risk factors, such as lower limb surgery or
immobility (Table 43-1). 8-10 Frequent findings on examination
include tachycardia, tachypnea, and an accentuated
pulmonary component of the second heart sound (S2). Other
features such as jugular venous distention, S3 or S4 (third or
fourth heart sound), an audible systolic murmur at the left
sternal edge, and hepatomegaly are infrequently present and
may reflect right-sided ventricular compromise.

Results of arterial blood gas analysis commonly show
hypoxia and hypocapnia. Chest radiography results are
nonspecific, and common findings include an elevated
hemidiaphragm, unilateral pleural effusion, and platelike atelectasis;
radiography is useful because it will sometimes provide an
alternative diagnosis (eg, pneumothorax). Similarly, ECG findings
are nonspecific and may show T-wave inversion across
precordial leads, the S1Q3T3 pattern, or a right-sided
bundle-branch block. 6,7 Thus, although the above findings are
observed in patients with objectively diagnosed pulmonary
embolism, they also are common in patients without pulmonary
embolism and lack specificity when considered individually.
On the other hand, pulmonary embolism is uncommon in
the absence of acute or worsening breathlessness or chest
pain. 6,7 Because anticoagulant therapy reduces mortality from
pulmonary embolism, the threshold for considering the diagnosis
should be low. 3 We believe that pulmonary embolism
should at least be considered whenever a patient presents with
any of the above symptoms or symptom complexes, particularly
in the presence of known risk factors or when there is no
clear alternative.

Table 43-1 Risk Factors for Venous Thromboembolism

Risk Factors 8-10                                  OR (95% CI)
Surgery                                            21 (9.4-50)
Trauma                                             13 (4.1-40)
Immobility (hospital or nursing home)              8.0 (4.5-14)
Cancer
  With chemotherapy                                6.5 (2.1-20)
  Without chemotherapy                             4.1 (1.9-8.5)
Neurologic disease with lower-extremity paresis    3.0 (1.3-7.4)
Oral contraceptive pill 10                         3.0 (2.6-3.4) a
Hormone therapy 9                                  2.7 (1.4-5.0) b

Abbreviations: CI, confidence interval; OR, odds ratio.
a Relative risk from case-control studies.
b Relative hazard.

Before the development of accurate diagnostic testing, the
diagnosis of pulmonary embolism largely was based on clinical
history and examination findings. Unfortunately, the
clinical evaluation alone proved inaccurate in diagnosing and
excluding pulmonary embolism 7,11-13 and was virtually
abandoned in the evaluation of patients with suspected
pulmonary embolism. Lung scanning became routine in the 1980s
and was shown to be clinically useful. 5 However, lung scanning
proved to be less than optimal because more than half
of patients with suspected pulmonary embolism had
nondiagnostic lung scan results and the prevalence of pulmonary
embolism in such patients was approximately 25%. 5

Once clinicians raise the possibility of pulmonary embolism,
they can further define the clinical likelihood of pulmonary
embolism into a pretest probability. Rather than
definitively diagnosing or excluding pulmonary embolism,
pretest probability assessment categorizes patients into
subgroups, such as low, intermediate, and high, with ascending
prevalence of pulmonary embolism. The potential
for clinical assessment of the pretest probability to
significantly influence the posttest probability of pulmonary
embolism was demonstrated in the Prospective Investigation of
Pulmonary Embolism Diagnosis (PIOPED) study 5 and was
confirmed in a later study by Wells et al. 14 When the
participating clinicians in the PIOPED study used clinical judgment
to categorize patients into low-, moderate-, or
high-pretest-probability subgroups for pulmonary embolism, a moderate
correlation with disease prevalence was found (9%, 30%, and
68%, respectively). In addition, in patients with a low pretest
probability and a high-probability lung scan result, only
about 50% had pulmonary embolism, whereas in those with
a moderate or high pretest probability and a high-probability
lung scan result, more than 90% had pulmonary embolism. 5

Clinical prediction rules that use the medical history and
physical examination findings to assess pretest probability
for deep vein thrombosis, a condition closely related to
pulmonary embolism, have been developed and shown to simplify the
diagnosis. 15,16 For example, the safety of withholding
anticoagulant therapy, without additional testing, has been demonstrated
in patients with a low 17 or low/moderate 18 pretest probability for
deep vein thrombosis and a negative D-dimer test result.
D-dimer is a plasmin-derived fibrin degradation product that is
highly sensitive for deep vein thrombosis and pulmonary
embolism. 19 Elevated levels of D-dimer are observed in most patients
with pulmonary embolism and deep vein thrombosis, but
because the available assays have moderate specificity
(30%-75%), they also show elevated results in patients with
nonthrombotic disorders. 19 We postulated that assessment of pretest
probability of pulmonary embolism also might be useful in
simplifying the diagnosis of this condition.

The objectives of this article are 2-fold: (1) to determine
whether, according to their clinical impression after collecting
routine data (the clinical gestalt), experienced clinicians
can accurately group patients into strata distinguished by an
increasing probability of pulmonary embolism; and (2) to
determine whether clinical prediction rules are useful in
determining the pretest probability for pulmonary embolism.
For the first objective, the examiner estimates the probability
of pulmonary embolism according to his or her
clinical gestalt. Each examiner values the information differently
in quantifying an overall impression. For the second
objective, clinical prediction rules rely on an explicit
prespecified list of data items, each of which is assigned a score.

METHODS 

Data Sources 

We searched the MEDLINE electronic database for
English-language articles published between 1966 and March 2003,
using the following Medical Subject Headings: “pulmonary
embolism,” “prospective studies,” “EXP” (explode) “sensitivity
and specificity,” “EXP probability” and “EXP models,” and
“statistical.” We identified studies in which clinical assessment
of patients with suspected pulmonary embolism was
performed routinely. The reference lists of identified articles also
were examined for additional studies missed by the MEDLINE
search.

Study Selection and Data Extraction 

Three independent reviewers (S.D.C., J.W.E., J.A.) identified
potentially eligible articles, and a senior reviewer (J.S.G.)

CHAPTER 43 Pulmonary Embolus 


resolved disagreements. To be eligible, studies had to include
the following: (1) an estimate of the pretest probability of
pulmonary embolism, using the clinical gestalt or clinical
prediction rule; (2) performance of the clinical assessment blind to
the results of diagnostic testing; and (3) comparison of these
assessments with validated methods of confirming or refuting
the diagnosis of pulmonary embolism (Box 43-1). 20-24
Additional eligibility criteria were applied to studies in which a
clinical prediction rule was being derived. 25 These studies had to
systematically collect all relevant clinical data from consecutive
patients and have a sufficient number of patients with
confirmed pulmonary embolism (n > 50) to ensure accuracy of
the derived rule. For each eligible study, where possible, the
pretest probability categories, corresponding disease
prevalences, and likelihood ratios (LRs) (and corresponding 95%
confidence intervals [CIs]) are summarized.

The clinical gestalt must have been determined according
to information available from the patient’s medical history
and findings from physical examination and routine
investigations (eg, chest radiograph, ECG, and arterial blood gas
analysis) without predetermined elements or a standardized
score, and most important, it must have been assessed before
other diagnostic testing. A clinical prediction rule used a
mathematically derived formula that combined the individual
contribution of each component of the medical history,
physical examination findings, and routine laboratory results
before diagnostic testing.

Data Analysis 

Likelihood ratios and their 95% CIs were calculated with
Metstat (version 1) 26 and CI Analysis (version 1.1). 27 Summary LRs
were derived with random-effects measures that provide
conservative CIs around the estimates. 28,29 Decisions to include or
exclude studies were made before the analysis according to the
reported methods, rather than their actual results. We
determined the summary LRs to get a general sense of whether
structured models performed as well as the clinical gestalt.
Furthermore, we pooled data only from studies that derived a
structured model and specifically did not include data from
subsequent validation studies, because these studies varied
substantially in their study design (retrospective assessment
and concomitant use of D-dimer) from the derivative studies.
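
For a single study, the LR attached to a pretest probability category is the proportion of patients with pulmonary embolism who fall in that category divided by the proportion of patients without pulmonary embolism who fall in it. The Python sketch below illustrates that calculation with a log-scale 95% CI. It is not the software named above and does not perform the random-effects pooling; the function name and the counts in the usage example are hypothetical.

```python
import math


def stratum_lr(pe_in_stratum, pe_total, no_pe_in_stratum, no_pe_total, z=1.96):
    """Likelihood ratio for one category of a multilevel assessment.

    LR = P(category | disease) / P(category | no disease).
    The CI uses the usual log method; all four counts must be nonzero.
    Returns (lr, lower, upper).
    """
    if min(pe_in_stratum, pe_total, no_pe_in_stratum, no_pe_total) <= 0:
        raise ValueError("counts must be positive")
    p_disease = pe_in_stratum / pe_total
    p_no_disease = no_pe_in_stratum / no_pe_total
    lr = p_disease / p_no_disease
    # Standard error of ln(LR) for two independent binomial proportions.
    se = math.sqrt(
        1 / pe_in_stratum - 1 / pe_total
        + 1 / no_pe_in_stratum - 1 / no_pe_total
    )
    lower = math.exp(math.log(lr) - z * se)
    upper = math.exp(math.log(lr) + z * se)
    return lr, lower, upper


if __name__ == "__main__":
    # Hypothetical study: 100 patients with PE (20 rated "low") and
    # 200 patients without PE (120 rated "low").
    lr, lower, upper = stratum_lr(20, 100, 120, 200)
    print(f"LR for 'low' = {lr:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```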

RESULTS 

Our search yielded a total of 1709 articles, and after scanning 
the abstracts and titles, we selected 443 abstracts for detailed 
review. Of these, 30 articles were selected for complete review 
and 16 were included in the final analysis. These studies 
involved a total of 8306 patients. 

Clinical Gestalt 

In the PIOPED study, physicians used their clinical gestalt to
estimate the probability of pulmonary embolism according
to patient medical history and physical examination findings,
together with the results of a chest radiograph, an ECG, and
an arterial blood gas analysis (Table 43-2). 5,23,30-34 The results
of this study showed that the prevalence of pulmonary
embolism correlated reasonably well with the pretest
probability estimates of pulmonary embolism.

Box 43-1 Criteria for Diagnosis and Exclusion of
Pulmonary Embolism

POSITIVE RESULT FOR PULMONARY EMBOLISM

Positive pulmonary angiogram result. 20

High-probability lung scan (≥1 segmental perfusion
defect 21 or ≥2 large [>75% of a segment] segmental
perfusion defects 5 with corresponding normal ventilation).

Nondiagnostic lung scan with either a positive venogram
result 22 or a compression ultrasonogram diagnostic
for deep vein thrombosis.

Positive lung perfusion scan 23 (single or multiple
wedge-shaped defect with or without matching chest
radiograph abnormalities; wedge-shaped areas of
overperfusion usually exist).

NEGATIVE RESULT FOR PULMONARY EMBOLISM

Normal perfusion lung scan result 23 and a normal
3-month follow-up result.

Negative pulmonary angiogram result 20 and a normal
3-month follow-up result.

Nondiagnostic lung scan and negative venogram
result, 22 serial leg compression ultrasonography, 14 or
impedance plethysmography 24 and a normal 3-month
follow-up result.

Negative spiral computed tomographic scan result and
negative venogram or negative serial compression
ultrasonographic result and a normal 3-month follow-up
result.

Negative D-dimer test result and a normal 3-month
follow-up result, provided anticoagulants were withheld.

The Prospective Investigative Study of Acute Pulmonary
Embolism Diagnosis (PISA-PED) tested the accuracy of
perfusion scan alone compared with pulmonary angiography. 23
In this study, experienced clinicians estimated the probability
of pulmonary embolism from their clinical gestalt according
to patient symptoms, signs, and risk factors, together with
the results of a chest radiograph, an ECG, and an arterial
blood gas analysis.

Perrier et al 30-32 reported the clinical gestalt from 3 separate
studies, using a diagnostic strategy in which a
ventilation/perfusion lung scan, a D-dimer assay, and compression
ultrasonography followed the clinical evaluation. In the first 2
studies, 30,31 all patients underwent a ventilation/perfusion
scan and then were treated according to the pretest probability
assessment, D-dimer assay result, and compression
ultrasonographic finding. In the third study, 32 patients were
assessed initially with a highly sensitive (but nonspecific)
enzyme-linked immunosorbent assay D-dimer laboratory
analysis. The results of these studies are consistent with those
reported in the PISA-PED 23 and PIOPED 5 studies.

Table 43-2 Accuracy of Pretest Probability Assessment for Pulmonary Embolism With Clinical Gestalt

                                      Prevalence of              Probability            Actual
                            No. of    Pulmonary                  Estimate,    No. of    Probability,
Source, y                   Patients  Embolism, %   Category     %            Patients  %             LR (95% CI) a
PIOPED, 5 1990              887       28            Low          0-19         228       9             0.26 (0.17-0.4)
                                                    Moderate     20-79        569       30            1.1 (0.96-1.2)
                                                    High         80-100       90        68            5.3 (3.5-8.0)
Miniati et al, 23 1996      783       44            Unlikely     10           349       8             0.13 (0.09-0.18)
                                                    Possible     50           179       47            1.1 (0.86-1.4)
                                                    Very likely  90           225       91            12 (8.1-18)
Perrier et al, 30-32 1996,  985       27            Low          <20          368       9             0.21 (0.15-0.29)
  1997, 1999                                        Moderate     21-79        523       33            1.1 (1.0-1.3)
                                                    High         >80          94        66            4.5 (3.0-6.7)
Sanson et al, 33 2000       413       31            Low          0-19         58        19            0.53 (0.28-0.99)
                                                    Moderate     20-80        278       29            0.92 (0.79-1.1)
                                                    High         >80          77        46            1.9 (1.3-2.8)
Musset et al, 34 2002       1041      34            Low          0-19         231       12            0.26 (0.18-0.38)
  (ESSEP)                                           Moderate     20-79        525       26            0.67 (0.58-0.78)
                                                    High         80-100       285       68            4.0 (3.3-5.0)

Abbreviations: CI, confidence interval; ESSEP, Evaluation du Scanner Spirale dans l’Embolie Pulmonaire; LR, likelihood ratio; PIOPED, Prospective Investigation of Pulmonary
Embolism Diagnosis.

a Summary data (LR [95% CI]) for empirical pretest probability assessments are the following: low, 0.25 (0.14-0.45); moderate, 0.92 (0.71-1.2); and high, 4.7 (2.3-9.7). These
summary data exclude results from the studies by Perrier et al 30-32 because the pretest probability was used to manage subgroups of patients.
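
The link between the overall prevalence, the category LR, and the actual probability in Table 43-2 is the odds form of Bayes' theorem: pretest odds multiplied by the LR gives posttest odds. The Python sketch below shows the conversion; the function name is ours. Applied to the PIOPED row (prevalence 28%; LRs of 0.26, 1.1, and 5.3), it returns roughly 9%, 30%, and 67%, in line with the actual probabilities listed.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability and an LR into a posttest probability."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio cannot be negative")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


if __name__ == "__main__":
    # PIOPED values as given in Table 43-2.
    prevalence = 0.28
    for category, lr in [("low", 0.26), ("moderate", 1.1), ("high", 5.3)]:
        p = posttest_probability(prevalence, lr)
        print(f"{category}: {p:.0%}")
```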


Sanson et al 33 conducted a study in 6 Dutch teaching
hospitals. The clinical gestalt was quantified into the pretest
probability for pulmonary embolism, and patients underwent
ventilation/perfusion lung scanning followed by
angiography if the lung scan finding was nondiagnostic. The
estimate of the pretest probability was performed by the
attending physician on a visual analog scale; however, the
results of chest radiographs, ECGs, and arterial blood gas
analysis were not always available when the pretest probability
was documented. In this study, assessment of pretest
probability was less predictive than in other studies of the
clinical gestalt.

The Evaluation du Scanner Spirale dans l’Embolie Pulmonaire
study group 34 assessed the accuracy of contrast spiral
computed tomography (CT) of the chest for pulmonary
embolism in 1041 patients. Using simple prespecified guidelines
and empirical assessment based on patient medical history,
physical examination findings, and results of routine
investigations, clinicians stratified patients into low-,
moderate-, or high-pretest-probability groups. The presence or
absence of pulmonary embolism largely was based on the
combined results of spiral CT and routine bilateral compression
ultrasonography of the legs. If the clinical suspicion was
high and the test results were negative, or if test results were
inconclusive, further assessment with lung scanning and
pulmonary angiography was performed. The study demonstrated
reasonable discriminative ability among the 3 pretest
groups.

When interpreted together, the studies show that, when
experienced clinicians use clinical gestalt, the prevalence of
pulmonary embolism increases with increasing pretest probability.
The PIOPED and PISA-PED studies demonstrate the influence
that clinical gestalt has on the interpretation of results of
subsequent tests. In the PISA-PED study, a positive scan result for
pulmonary embolism (single or multiple perfusion defects
with or without matching chest radiograph abnormalities),
together with a possible or likely clinical pretest probability,
was associated with pulmonary embolism in 92% and 99%
of patients, respectively. 34 On the other hand (similar to the
PIOPED study results), when patients had an unlikely (low)
clinical pretest probability but a positive finding on perfusion
scan, pulmonary embolism was diagnosed in only 50% to 60%
of individuals.

The findings in the study by Sanson et al 33 suggest that the 
clinical gestalt is not particularly discriminating. However, 
the study still showed increasing prevalence of pulmonary 
embolism according to pretest probability. 

Clinical Prediction Rules 

The PISA-PED study group analyzed clinical data from their 
accuracy study (Table 43-2) 23 to derive a structured clinical 
rule. 35 Clinical variables were divided into 3 categories: (1) signs 
and symptoms; (2) results of routine tests (chest radiograph, 
ECG, and arterial blood gas analysis); and (3) evidence of an 
obvious alternative diagnosis. 

Wells et al 14 initially developed a 40-variable clinical rule
and subsequently refined the rule after a limited pilot study.
This rule (extended Wells) was used in a large multicenter
study in which 1239 patients were enrolled and assigned a
clinical probability of pulmonary embolism after taking a
patient medical history, performing a physical examination,
and assessing chest radiography, arterial blood gas analysis,
and ECG findings. A checklist of specific symptoms and
signs was compiled to help assign the pretest probability.
Patients were assessed for type of symptoms (“typical,”
“atypical,” or “suggestive” of severe pulmonary embolism),
the presence or absence of risk factors, and the presence or
absence of an alternative diagnosis as likely as or more likely
than pulmonary embolism to account for the patient’s
symptoms.

The corresponding prevalence and LRs for pulmonary
embolism in each of the 3 pretest probability categories are
listed in Table 43-3. 14,35-38 The utility of pretest probability
assessment in combination with lung scanning again was
highlighted. Only 8 of 27 (30%) patients with a low pretest
probability and a high-probability lung scan result had
angiographically proven pulmonary embolism. 14

Clinical data collected on the 1239 patients by Wells et al 39
also were used to derive a simplified clinical rule. With a
stepwise logistic regression model, 7 key variables were identified
and selected for inclusion in the final rule. Cut points were
identified to classify patients as low (<2), moderate (2-6), or
high (>6) probability for pulmonary embolism (Table 43-4). 39
With this simplified rule, only 3% (LR, 0.17; 95% CI, 0.11-0.27)
of patients with a low pretest probability had pulmonary
embolism vs 63% (LR, 8.6; 95% CI, 5.7-13) of those with a
high pretest probability.

Wicki et al 36 pooled clinical data obtained from the patient
medical history and physical examination, together with
results of the chest radiograph, ECG, and arterial blood gas
analysis collected during the 3 studies, involving 986
consecutive patients. A 7-variable rule was derived by logistic
regression and statistically cross-validated (Table 43-5). A
score based on a weighted sum of the variables present was used
to estimate the pretest probability of pulmonary embolism.
Patients with scores of less than 5 had a low pretest probability
of pulmonary embolism, those with scores of 5 to 8 had a
moderate pretest probability, and those with scores greater
than 8 had a high pretest probability. The prevalence of
pulmonary embolism correlated well with pretest probability.

A large emergency department-based study involving 7 US
centers systematically assessed 934 patients with suspected
pulmonary embolism and derived a 6-variable model from
this database (Figure 43-1). 37 This model used 2 screening
variables to assess all patients: age and shock index (heart rate
divided by systolic blood pressure). Patients younger than 50
years and with a shock index less than 1 are deemed “non-high
risk”; the remaining patients are then further assessed with 4
variables. The model classified 79% of patients as non-high
risk, in whom the prevalence of pulmonary embolism
was 13%, whereas the prevalence in the high-risk group (21%
of patients) was 42%. Two medical students subsequently were
employed to assess 117 patients presenting to one of the
participating centers, and they demonstrated a high degree of
interobserver agreement (weighted κ, 0.83). 37
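
The screening step of this model can be written as a short function. The Python sketch below covers only the 2 screening variables described above (age and shock index); the 4 variables applied to the remaining patients are shown in Figure 43-1 and are not reproduced here. Function names are hypothetical.

```python
def shock_index(heart_rate, systolic_bp):
    """Shock index: heart rate (/min) divided by systolic blood pressure (mm Hg)."""
    if systolic_bp <= 0:
        raise ValueError("systolic blood pressure must be positive")
    return heart_rate / systolic_bp


def non_high_risk_by_screen(age_years, heart_rate, systolic_bp):
    """True when both screening criteria are met: age < 50 y and shock index < 1.

    A False result does not mean "high risk"; per the text, such patients
    go on to be assessed with the model's remaining 4 variables.
    """
    return age_years < 50 and shock_index(heart_rate, systolic_bp) < 1


if __name__ == "__main__":
    # Hypothetical patients.
    print(non_high_risk_by_screen(35, 80, 120))   # young, shock index 0.67
    print(non_high_risk_by_screen(35, 120, 100))  # young, shock index 1.2
    print(non_high_risk_by_screen(60, 80, 120))   # aged 50 y or older
```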

The PISA-PED investigators have reanalyzed data from
their initial study and included data on a further 350
patients; the latter were assessed and treated as in the first
study. 38 Using appropriate statistical techniques, they derived
and cross-validated a 15-variable model (Table 43-6). Unlike
other structured models, the authors calculated and displayed
the actual pretest probability for individual patients
rather than the ordinal descriptors of low, moderate, and


Table 43-3 Accuracy of Clinical Prediction Rules for Assessing Pretest Probability of Pulmonary Embolism in Derivative Studies a

                                                Prevalence of              Pretest
                                      No. of    Pulmonary     Prospective  Probability  Pretest
Source, y                             Patients  Embolism, %   Validation   Category     Probability, %  LR (95% CI)
Wells et al, 14 1998 (Extended)       1239      17.5          Yes          Low          3               0.17 (0.12-0.25)
                                                                           Moderate     28              1.8 (1.5-2.1)
                                                                           High         78              17 (11-27)
Miniati et al, 35 1999 (PISA-PED)     750       41            Yes          Unlikely     6               0.05 (0.03-0.10)
                                                                           Possible     46              0.99 (0.75-1.3)
                                                                           Very likely  97              47 (23-98)
Wells et al, 39 (Simplified)                                               High         63              8.6 (5.7-13)
Wicki et al, 36 2001 (Geneva rule)    986       27            Yes          Low          10              0.31 (0.24-0.40)
                                                                           Moderate     38              1.7 (1.5-1.9)
                                                                           High         81              11 (6.1-21)
Kline et al, 37 2002                  934       19.4          No           Nonhigh      13.3            0.64 (0.56-0.73)
                                                                           High         42.1            3.0 (2.4-3.8)
Miniati et al, 38 2003 (PISA-PED II)  1100      40            No           Low          4               0.07 (0.04-0.11)
                                                                           Moderate     26              0.72 (0.6-0.87)
                                                                           High         98              66 (31-137)

Abbreviations: CI, confidence interval; LR, likelihood ratio; PISA-PED, Prospective Investigative Study of Acute Pulmonary Embolism Diagnosis.

a Summary of pretest probability (LR [95% CI]) of structured clinical rules is as follows: low, 0.12 (0.05-0.31); moderate, 1.1 (0.76-1.6); and high, 23 (7.6-69). This summary
excludes data from Kline et al, 37 because that study categorized patients only into low and high categories, and from Wells et al 14 because the pretest probability was used to
guide management, which likely resulted in case-finding bias.

Table 43-4 The Simplified Wells Scoring System a

Findings                                                              Score b
Clinical signs/symptoms of deep vein thrombosis (minimum of leg
  swelling and pain with palpation of the deep veins of the leg)      3.0
No alternate diagnosis likely or more likely than pulmonary emboli    3.0
Heart rate > 100/min                                                  1.5
Immobilization or surgery in last 4 wk                                1.5
History of deep vein thrombosis or pulmonary emboli                   1.5
Hemoptysis                                                            1.0
Cancer actively treated within last 6 months                          1.0

a Adapted from Wells et al 39 with permission.

b Category scores are as follows: low, <2; moderate, 2-6; and high, >6. The patient's
clinical score is calculated by the summing of the scores (weight) of the predictor
variables that are present.
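
The scoring in Table 43-4 is a weighted sum followed by a category lookup. The Python sketch below is one way to express it; the dictionary keys are our own shorthand labels for the 7 findings, and the weights and cut points are those given in the table and its footnote. It is an illustration of the arithmetic, not a validated clinical tool.

```python
# Weights from Table 43-4; the key names are hypothetical shorthand.
WELLS_WEIGHTS = {
    "dvt_signs_or_symptoms": 3.0,
    "no_alternate_diagnosis_as_likely": 3.0,
    "heart_rate_over_100": 1.5,
    "immobilization_or_surgery_last_4wk": 1.5,
    "previous_dvt_or_pe": 1.5,
    "hemoptysis": 1.0,
    "cancer_treated_last_6mo": 1.0,
}


def simplified_wells(findings_present):
    """Return (score, category) for the set of findings that are present."""
    present = set(findings_present)
    unknown = present - set(WELLS_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    score = sum(WELLS_WEIGHTS[name] for name in present)
    # Cut points from the table footnote: low <2, moderate 2-6, high >6.
    if score < 2:
        category = "low"
    elif score <= 6:
        category = "moderate"
    else:
        category = "high"
    return score, category


if __name__ == "__main__":
    # Hypothetical patient: previous DVT and hemoptysis only.
    print(simplified_wells({"previous_dvt_or_pe", "hemoptysis"}))
```

Because "no alternate diagnosis" carries 3.0 of the points, that single judgment is enough to move a patient out of the low category.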


Table 43-5 The Clinical Prediction Rule by Wicki et al 36 (Geneva Rule) a

Variable                                            Point Score b
Age, y
  60-79                                             1
  ≥80                                               2
Previous pulmonary emboli or deep vein thrombosis   2
Recent surgery                                      3
Pulse rate > 100/min                                1
Paco2, kPa c
  <4.8                                              2
  4.8-5.19                                          1
Pao2, kPa c
  <6.5                                              4
  6.5-7.99                                          3
  8-9.49                                            2
  9.5-10.99                                         1
Chest radiograph appearance
  Platelike atelectasis                             1
  Elevated hemidiaphragm                            1

a Adapted from Wicki et al. 36

b The pretest probability categories (clinical probability score range, prevalence of
disease [95% confidence interval], and percentage of patients in the pretest probability
category) are as follows: low (0-4, 10% [8%-13%], 49%); intermediate (5-8, 38%
[34%-43%], 38%); and high (9-16, 81% [69%-90%], 6%), respectively.
c kPa/0.133 = mm Hg. Thus, Paco2 < 4.8 kPa becomes Paco2 < 36 mm Hg.
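
The Geneva rule in Table 43-5 can be sketched the same way. In the Python example below, the point values, blood gas bands, and category ranges are taken from the table and its footnotes; the function and parameter names are hypothetical, and patients aged exactly 80 years are assumed to belong to the top age band. It illustrates the arithmetic only.

```python
def kpa_to_mmhg(kpa):
    """Convert kPa to mm Hg using the table's factor (kPa / 0.133)."""
    return kpa / 0.133


def geneva_score(age_years, previous_pe_or_dvt, recent_surgery, pulse_rate,
                 paco2_kpa, pao2_kpa, platelike_atelectasis,
                 elevated_hemidiaphragm):
    """Return (score, category) for the Geneva rule of Table 43-5."""
    score = 0
    # Age: 60-79 y scores 1; 80 y or older is assumed to score 2.
    if age_years >= 80:
        score += 2
    elif age_years >= 60:
        score += 1
    if previous_pe_or_dvt:
        score += 2
    if recent_surgery:
        score += 3
    if pulse_rate > 100:
        score += 1
    # Paco2 bands (kPa): <4.8 scores 2; 4.8-5.19 scores 1.
    if paco2_kpa < 4.8:
        score += 2
    elif paco2_kpa < 5.2:
        score += 1
    # Pao2 bands (kPa): <6.5, 6.5-7.99, 8-9.49, 9.5-10.99.
    if pao2_kpa < 6.5:
        score += 4
    elif pao2_kpa < 8:
        score += 3
    elif pao2_kpa < 9.5:
        score += 2
    elif pao2_kpa < 11:
        score += 1
    if platelike_atelectasis:
        score += 1
    if elevated_hemidiaphragm:
        score += 1
    # Categories from the footnote: low 0-4, intermediate 5-8, high 9-16.
    if score <= 4:
        category = "low"
    elif score <= 8:
        category = "intermediate"
    else:
        category = "high"
    return score, category


if __name__ == "__main__":
    # Hypothetical patient: 72 y, recent surgery, pulse 104/min,
    # Paco2 4.9 kPa, Pao2 8.5 kPa, normal chest radiograph.
    print(geneva_score(72, False, True, 104, 4.9, 8.5, False, False))
    print(round(kpa_to_mmhg(4.8)))  # threshold in mm Hg
```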


high probability. Nonetheless, the probability of pulmonary
embolism in the low, moderate, moderately high, and very
high pretest strata shows clear discrimination among the
groups (for ease of comparison, we have combined the
moderate and moderately high groups).

Validation of Derived Clinical Prediction Rules 

Two hundred fifty patients with suspected pulmonary
embolism were assessed prospectively by the PISA-PED group. 15 In
this study, 90% of patients were categorized correctly as
having or not having pulmonary embolism, which compared
favorably with an 88% diagnostic accuracy in the initial
study.

The extended Wells model has been tested prospectively
by Sanson et al 33 and by Kruip et al. 40 The pretest
probability in the study by Sanson et al 33 was determined
retrospectively by a second physician, who used clinical
information collected by the assessing physician; both
physicians remained blind to the results of diagnostic
testing for pulmonary embolism. Unfortunately, only about 50%
(212 of the 414 patients) of enrolled study patients were
assessed. The Sanson et al 33 study assignments of low,
moderate, and high pretest probabilities corresponded to
rates of pulmonary embolism of 28%, 39%, and 46%,
respectively. These results showed less discrimination
among the subgroups than other studies. Kruip et al 40
combined the pretest probability assessment of patients
with the results of D-dimer analysis and withheld objective
testing and anticoagulant therapy in those patients
categorized with a low pretest probability and a negative
D-dimer result (normal level). All other patients were
tested with the combination of compression ultrasonography
of the legs followed by pulmonary angiography, if the
ultrasonography results were negative. The model showed
considerable discriminative ability when used by Kruip et
al, 40 with the prevalence of pulmonary embolism ranging
from 4% in the low pretest probability group to 28% and
63% in the moderate- and high-pretest-probability
groups, respectively. For the subgroup of patients with a
low pretest probability and a negative D-dimer result, the
3-month rate of venous thromboembolism was 0% (95%
CI, 0%-6%) (Table 43-7).

The simplified Wells model also was tested by 3
groups. 33,41,42 As with the extended Wells model, Sanson et
al 33 used a second physician to assign patients a pretest
score retrospectively, based on the clinical data collected by
the attending physician (Table 43-7). Although the attending
physician was required to specify the presence of any
alternate diagnosis that was more likely than pulmonary
embolism, a second physician inferred this from reviewing
the medical notes when the judgment was missing.
The missing judgment about an alternate diagnosis is a
critical limitation of the study, given the relative importance
of this factor in the model. Sanson et al 33 reported that the
simplified Wells model was less discriminating in this study
than in the original Wells et al 14 study. Patients with a low pretest
probability had a 28% prevalence of pulmonary embolism
compared with 3% in the study by Wells et al, 39 and only
38% of patients with a high pretest probability had pulmonary
embolism compared with 63% in the study by
Wells et al. 39

At variance with these data is the subsequent prospective 
validation of the simplified clinical prediction rule by Wells 
et al 41 in 4 Canadian centers and Chagnon et al 42 in 3 centers 
in France and Switzerland. The Canadian study included 
patients assessed by one of the 43 emergency department 
physicians; patients with a low pretest probability and a 
negative D-dimer test result had no further testing performed
but were followed up for 3 months. The model reliably
categorized patients into low-, moderate-, and high-pretest- 
probability subgroups, with the prevalence of disease being 
1.3%, 16%, and 41%, respectively. 41 

In the study by Chagnon et al, 42 emergency department 
residents collected and recorded clinical data on 277 con¬ 
secutive patients with suspected pulmonary embolism to 
create a score. Although the final score was calculated retro¬ 
spectively, all the variables were documented clearly. Subse¬ 
quent treatment of patients was determined by the results 
of D-dimer testing. Patients with a positive D-dimer result 
were further investigated with a combination of ultrasono¬ 
graphic testing of the legs, lung scanning, and pulmonary 
angiography. 32 Consistent with the prospective validation 
by Wells et al, 41 the emergency department residents were 
able to stratify patients into low-, moderate-, and high- 
pretest-probability categories, with ascending prevalences 
of pulmonary embolism. 

The clinical model derived by Wicki et al 36 has been validated
prospectively by Chagnon et al. 42 Emergency department
residents collected all the relevant data on consecutive
patients with suspected pulmonary embolism and assigned
each patient a pretest probability according to the Wicki
model. Patients classified by the Wicki model as having a
low, moderate, or high pretest probability showed
ascending prevalences of pulmonary embolism.

Precision of the Examination and Components 
of the Clinical Prediction Rules 

To be useful, the pretest probability for pulmonary embolism 
needs to be reproducible. Put simply, when the same patient 
is assessed, 2 physicians’ clinical gestalt should yield similar 
estimates of the pretest probability. None of the individual 
studies documented interobserver variability for the clinical 
gestalt. 

Wells et al 14 documented observer variability for the pretest
probability using the extended model (κ = 0.86). Kline et al 37
employed 2 medical students to test the observer variability
of their rule and demonstrated excellent observer agreement
(weighted κ, 0.83). Chagnon et al 42 did not document concordance
between 2 observers for either of the 2 models they
tested, but they documented modest agreement between the
Wells simplified model and the Wicki model (weighted κ,
0.43) and found that in only 2 of 277 cases was there extreme
disagreement in the pretest probability assessment.
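
The weighted κ values quoted above summarize agreement on an ordinal scale (low, moderate, high), giving partial credit when 2 raters differ by only 1 category. The sketch below shows one common way to compute it, using linear weights; the cited studies do not state which weighting scheme they used, so the choice of linear weights, the function name `weighted_kappa`, and the example counts are all assumptions made for illustration.

```python
def weighted_kappa(table):
    """Linear-weighted Cohen's kappa for a square table of counts.

    table[i][j] is the number of patients placed in category i by rater A
    and in category j by rater B, with categories in ordinal order.
    """
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least 2 categories")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table contains no observations")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0  # weighted proportion of observed agreement
    expected = 0.0  # weighted agreement expected by chance alone
    for i in range(k):
        for j in range(k):
            # Weight is 1 on the diagonal and falls to 0 at maximum disagreement.
            weight = 1 - abs(i - j) / (k - 1)
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL counts, not taken from any cited study.
# Rows: rater A; columns: rater B; categories ordered low, moderate, high.
example = [
    [40, 8, 1],
    [6, 25, 5],
    [0, 4, 11],
]
print(round(weighted_kappa(example), 2))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is the scale on which the 0.86, 0.83, and 0.43 above should be read.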

D-dimer Assay 

D-dimer, a specific fibrin degradation product, is generated by 
the action of plasmin on cross-linked fibrin. 19,43-47 D-dimer
assay is sensitive for the presence of venous thrombosis and 
can be used to help exclude deep vein thrombosis and pulmo¬ 
nary embolism. Although several assays are available, to be 
useful, a D-dimer assay must be highly sensitive for pulmo¬ 
nary embolism so that patients with this disease are not 
missed. In addition, for the assay to be useful, the specificity 
should be high enough so that the number of false-positive 


Figure 43-1 Decision Rule for Pulmonary Embolism

[Flowchart; recoverable nodes: "Any degree of suspicion for pulmonary embolism" and "Not high risk."]

This model uses 2 screening variables to assess all patients: age and shock
index (HR divided by SBP). Abbreviations: COPD, chronic obstructive pulmonary
disease; HR, heart rate; SBP, systolic blood pressure. Adapted from Kline et
al, 37 with permission from the American College of Emergency Physicians.



Table 43-6 Structured Clinical Model Derived by the PISA-PED Group a

Factor                                        Regression Coefficient

Male sex                                       0.81
Age, y
  63-72                                        0.59
  >73                                          0.92
Preexisting disease
  Cardiovascular                              -0.56
  Respiratory                                 -0.97
Thrombophlebitis (ever)                        0.69
Symptoms
  Dyspnea (sudden onset)                       1.29
  Chest pain                                   0.64
  Hemoptysis                                   0.89
Temperature > 38°C                            -1.17
Electrocardiogram signs of acute right
  ventricular overload                         1.53
Chest radiograph findings
  Oligemia                                     3.86
  Amputation of hilar artery                   3.92
  Consolidation (infarction)                   3.55
  Consolidation (no infarction)               -1.23
  Pulmonary edema                             -2.83

Abbreviation: PISA-PED, Prospective Investigative Study of Acute Pulmonary
Embolism Diagnosis.

a Adapted from Miniati et al, 38 with permission from Excerpta Medica.
Table 43-7 Accuracy of Clinical Prediction Rules for Pulmonary Embolism When Tested Prospectively

Source, y                No. of    Prevalence of  Rule Prospectively       Pretest      Posttest        LR (95% CI)
                         Patients  Pulmonary      Tested                   Probability  Probability, %
                                   Embolism, %                             Category

Sanson et al, 33 2000    237       38             Extended Wells 14        Low          28              0.66 (0.4-1.1)
                                                                           Moderate     39              1.1 (0.86-0.13)
                                                                           High         46              1.4 (0.81-2.5)
Sanson et al, 33 2000    414       29             Simplified Wells 39      Low          28              0.93 (0.69-1.3)
                                                                           Moderate     30              1.0 (0.88-1.2)
                                                                           High         38              1.4 (0.35-5.9)
Wells et al, 41 2001     930       9.5            Simplified Wells 39      Low          1.3             0.13 (0.06-0.26)
                                                                           Moderate     16              1.9 (1.6-2.3)
                                                                           High         41              5.9 (3.7-9.3)
Kruip et al, 40 2002     234       22             Extended Wells 14        Low          4               0.15 (0.07-0.33)
                                                                           Moderate     28              1.5 (1.01-2.2)
                                                                           High         63              5.85 (3.51-9.74)
Chagnon et al, 42 2002   277       26             Simplified Wells 39      Low          12              0.39 (0.26-0.58)
                                                                           Moderate     40              2.0 (1.5-2.6)
                                                                           High         91              29 (3.8-223)
Chagnon et al, 42 2002   277       26             Wicki (Geneva rule) 36   Low          13              0.44 (0.30-0.65)
                                                                           Moderate     38              1.8 (1.4-2.3)
                                                                           High         67              5.8 (1.8-19)

Abbreviations: CI, confidence interval; LR, likelihood ratio.
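
The LR for a pretest probability category in Table 43-7 relates the prevalence of pulmonary embolism within that category to the overall study prevalence: it is the ratio of the category's odds of disease to the overall odds of disease. The following sketch illustrates that identity with the rounded percentages reported for the Chagnon et al study of the simplified Wells rule. The function names are illustrative assumptions, and because the published LRs were presumably computed from raw counts, recomputation from rounded percentages matches only approximately.

```python
def to_odds(probability):
    """Convert a probability (0-1, exclusive of 1) to odds."""
    return probability / (1 - probability)


def stratum_lr(overall_prevalence, stratum_prevalence):
    """LR for a category = odds of disease in the category / overall odds."""
    return to_odds(stratum_prevalence) / to_odds(overall_prevalence)


# Chagnon et al, simplified Wells rule: overall prevalence 26%;
# prevalence by category 12% (low), 40% (moderate), 91% (high).
overall = 0.26
for label, prevalence in [("Low", 0.12), ("Moderate", 0.40), ("High", 0.91)]:
    print(f"{label}: LR ~ {stratum_lr(overall, prevalence):.2f}")
```

Reading the table this way makes clear why a category whose prevalence is close to the overall prevalence, as in the Sanson et al rows, carries an LR near 1 and therefore little diagnostic information.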


results is sufficiently low. Newer assays can be performed rapidly,
making them suitable for use in individual patients. 43-47
The D-dimer assay is complementary to the clinical pretest
probability because pulmonary embolism can be reliably
excluded in patients with a negative D-dimer result and a low
pretest probability. 41 The accuracy indices of 3 currently available
D-dimer assay types are summarized in Table 43-8. 43,45,46
Unfortunately, D-dimer assays vary in their sensitivities
and specificities, so the posttest probability for a given
patient with suspected pulmonary embolism will vary
according to which D-dimer assay is used. Before clinicians
use a particular D-dimer assay to revise their pretest probability,
they should be aware of the differences and interpret
the results of the assay accordingly. 44-47


Table 43-8 Estimated Accuracy Indices of 3 D-dimer Assays

                                     % (95% CI)                    LR (95% CI)
D-dimer Assay                        Sensitivity    Specificity    Positive         Negative

Organon Teknika latex
  immunoassay 45                     96 (90-99)     45 (40-49)     1.7 (1.5-1.9)    0.09 (0.04-0.11)
Vidas Rapid ELISA assay 46           90 (81-96)     45.1 (39-51)   1.6 (1.4-1.8)    0.22 (0.11-0.44)
SimpliRED D-dimer assay 43           84.8 (79-89)   68.4 (65-71)   2.7 (2.4-3.0)    0.22 (0.16-0.3)

Abbreviations: CI, confidence interval; ELISA, enzyme-linked immunosorbent assay;
LR, likelihood ratio.
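
The positive and negative LRs in Table 43-8 follow from each assay's sensitivity and specificity by the standard definitions: positive LR = sensitivity / (1 - specificity), and negative LR = (1 - sensitivity) / specificity. The short sketch below recomputes them from the tabulated point estimates so that they can be compared with the tabulated LRs; the function and variable names are illustrative only.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) from sensitivity and specificity (0-1)."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Point estimates of sensitivity and specificity from Table 43-8.
assays = {
    "Organon Teknika latex immunoassay": (0.96, 0.45),
    "Vidas Rapid ELISA assay": (0.90, 0.451),
    "SimpliRED D-dimer assay": (0.848, 0.684),
}

for name, (sensitivity, specificity) in assays.items():
    lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
    print(f"{name}: LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
```

The calculation shows the trade-off the text describes: the more sensitive assays yield the smaller negative LR, whereas the more specific whole-blood assay yields the larger positive LR.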


CLINICAL SCENARIOS—RESOLUTIONS 


CASE 1 This young woman has no risk factors or signs of
pulmonary embolism (no tachycardia, features of deep vein
thrombosis, or hemoptysis). No clear alternate diagnosis is
present that is at least as likely as or more likely than pulmonary
embolism. According to the Wells simplified clinical
prediction rule, her score would be 3, a moderate pretest
probability for pulmonary embolism (approximately 20%).
Her whole-blood red blood cell agglutination D-dimer assay
result is negative (negative LR, 0.22). 43 Therefore, the probability
of pulmonary embolism after the results of the D-dimer
assay are obtained is about 5%. The finding from a
perfusion scan is normal (LR for pulmonary embolism with
a normal lung scan, 0.1). 48 Therefore, her posttest probability
after the above combination of tests is 0.5%, and pulmonary
embolism can be ruled out.

CASE 2 This elderly patient has a high pretest probability
for pulmonary embolism (approximately 65%) with the simplified
Wells rule because of the combination of immobilization,
tachycardia, previous deep vein thrombosis/pulmonary
embolism, and the absence of an alternate diagnosis as likely
as or more likely than pulmonary embolism. This combination
of findings results in a score of 7, which falls into the category
of a high pretest probability. In combination with a
negative whole-blood red blood cell agglutination D-dimer
assay result (LR, 0.22), 43 the revised pretest probability is
approximately 30%. A ventilation/perfusion scan is reported
as intermediate probability (LR, 1.2) 48; therefore, his posttest
probability of pulmonary embolism is about 33% and pulmonary
embolism has not been ruled out. Further testing
with compression ultrasonography and, if the finding is normal,
pulmonary angiography should be considered.
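
The arithmetic in both cases uses the odds form of Bayes' theorem: convert the pretest probability to odds, multiply by the LR of each test result, and convert the product back to a probability. The sketch below is a minimal illustration rather than a clinical tool. Multiplying LRs in sequence assumes that the test results are conditionally independent, and the function name `posttest_probability` is an illustrative choice, not something defined in this chapter.

```python
def posttest_probability(pretest, likelihood_ratios):
    """Update a pretest probability (0-1) with one or more likelihood ratios."""
    if not 0 <= pretest < 1:
        raise ValueError("pretest must be in [0, 1)")
    odds = pretest / (1 - pretest)      # probability -> odds
    for lr in likelihood_ratios:
        odds *= lr                      # each test result multiplies the odds
    return odds / (1 + odds)            # odds -> probability


# Case 1: pretest ~20%; negative D-dimer (LR 0.22); normal perfusion scan (LR 0.1).
case1_after_ddimer = posttest_probability(0.20, [0.22])
case1_final = posttest_probability(0.20, [0.22, 0.1])

# Case 2: pretest ~65%; negative D-dimer (LR 0.22); intermediate-probability scan (LR 1.2).
case2_after_ddimer = posttest_probability(0.65, [0.22])
case2_final = posttest_probability(0.65, [0.22, 1.2])

print(f"Case 1: {case1_after_ddimer:.1%} after D-dimer, {case1_final:.1%} after scan")
print(f"Case 2: {case2_after_ddimer:.1%} after D-dimer, {case2_final:.1%} after scan")
```

With these inputs the function reproduces, to rounding, the figures quoted in the 2 cases (about 5% and 0.5% for case 1; approximately 30% and about 33% for case 2).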


THE BOTTOM LINE 

Clinical assessment alone is insufficient to diagnose or 
rule out pulmonary embolism, although experienced cli¬ 
nicians can use clinical gestalt to assign a pretest probabil¬ 
ity of pulmonary embolism with reasonable accuracy. 
Clinical prediction rules appear to have similar accuracy 
to that of the clinical gestalt for patients in the low- and 
high-probability categories. We advocate the use of any 
one of the clinical prediction rules because they are simple 
and maintain their accuracy when used by less-experienced 
clinicians. In deciding which of the several rules to use, 
clinicians could justifiably make decisions on the scale 
that is easiest for them to use consistently. Factors that 
could affect the decision are availability of the rule in clin¬ 
ical reminder systems and the availability of the required 
clinical data. We are unable to say with confidence whether 
one structured clinical rule performs better than another. 
In outpatients with new onset or recent worsening of symp¬ 
toms within the preceding 3 days, the combination of pre¬ 
test probability assessment with the results of D-dimer 
testing improves diagnostic accuracy. Furthermore, there 
is emerging evidence that outpatients with a low pretest 
probability for pulmonary embolism can have anticoagu¬ 
lant therapy safely withheld when the results of D-dimer 
testing are negative. 41,43 
CLINICAL SCENARIO


A 25-year-old woman presents to the emergency depart¬ 
ment with pleuritic chest pain, having just returned home 
after a 12-hour plane flight. She is taking no medications, 
other than an oral contraceptive pill. Her clinical exami¬ 
nation reveals coryza without tachypnea, and the remain¬ 
der of the examination results are unremarkable. A 
pregnancy test result is negative and a chest radiograph 
result is normal. A D-dimer test shows a positive result. 

Original Review 

Chunilal S, Eikelboom J, Attia J, et al. Does this patient have 
pulmonary embolism? JAMA. 2003;290(21):2849-2858. 

UPDATED LITERATURE SEARCH 

We applied the same search criteria as were used in the original
Rational Clinical Examination article to identify studies
of the clinical pretest probability of pulmonary emboli. We
ran a second search combining the terms “physical exam,”
“medical history taking,” “sensitivity and specificity,” “observer
variation,” “diagnostic test, routine,” “decision support techniques,”
and “pulmonary embolism.” Each search was limited
to English-language articles published between 2002 and
2004. The first strategy yielded a total of 160 articles; the latter
yielded 123 articles. Titles and abstracts were reviewed
with the same criteria used for the original article. We sought
studies in which patients with suspected pulmonary embolism
were enrolled in an unselected, consecutive manner.
Participating physicians in the studies had to have been blinded
to the results of diagnostic testing and had to estimate the
pretest probability of pulmonary embolism. Validated algorithms
to exclude or confirm the diagnosis of pulmonary
embolism had to have been used.

New Findings 

• New studies focus primarily on whether a low or moderate
clinical probability estimate in combination with a normal
D-dimer result rules out a pulmonary embolus. For such
patients, the summary likelihood ratio (LR) for a pulmonary
embolus is 0 with an upper 95% confidence interval
(CI) of 0.06. This combination of results effectively rules
out a pulmonary embolus.

• The simplified Wells criteria have good reliability. 

Details of the Update 

For this update, no new clinical prediction rules were identi¬ 
fied. Four management studies were identified with the above 
search strategy. One of these evaluated the performance of a 
logistic model that used only demographic features, symp¬ 
toms, clinical signs, and radiograph results without a D- 
dimer assay. The other 3 studies evaluated outcomes after 
management that combined the results of a clinical prediction
rule with the D-dimer (see Table 43-9).

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

In the original publication, a weighted κ was available for a
limited number of structured clinical models. In a recent small
study of patients with suspected pulmonary embolism, 2 clinicians
assessed the patient initially with the extended Wells
model, and then, from the data collected, each clinician was
asked to determine the pretest probability by applying the simplified
Wells model. The weighted κ value for the extended
Wells model was 0.54 (95% CI, 0.28-0.80) vs 0.6 (95% CI,
0.34-0.85) for the simplified Wells model. In the same substudy,
there was less agreement between the extended clinical
model and the pretest probability determined by clinical gestalt
(weighted κ, 0.23; 95% CI, 0.05-0.42). 5 These data suggest that
the reproducibility of the clinical assessment with a structured
clinical prediction rule is at best moderate but not dissimilar to
the other components of the clinical examination. 6

A reliability study of 153 patients 7 (11% with pulmonary
emboli, using helical computed tomography [CT]) assessed
the simplified Wells study. The criteria had substantial agreement,
with κ less than 0.70. The criterion of an “alternative
diagnosis that is less likely than pulmonary embolism” had a
κ of 0.58 (95% CI, 0.44-0.72), which is still considered moderate
agreement. The weighted κ value for a low vs moderate
vs high probability of pulmonary embolus, recalculated from
the raw data displayed in the article, showed substantial
agreement (0.62; 95% CI, 0.50-0.74). The results need confirmation
in a larger sample of patients.
Table 43-9 Likelihood Ratios for the Pretest Probability of Pulmonary
Embolus Derived From the Clinical Gestalt or Structured Clinical Models

Source, y                   Number of        Model Tested              Pretest           Pretest           LR (95% CI)
                            Patients                                   Probability       Probability, %
                            (Prevalence, %)                            Category          (95% CI)

Perrier et al, 1 2004       965 (23)         Geneva (with implicit     High              85 (75-92)        19 (10-36)
                                             override)                 Moderate          34 (29-39)        1.7 (1.5-2.0)
                                                                       Low               7 (5-9)           0.23 (0.17-0.31)
Miniati et al, 2 2003       390 (41)         PISA-PED                  High              100 (97-100)      297 (16-4746)
                                                                       Moderately High   86 (70-95)        8.6 (3.4-22)
                                                                       Intermediate      24 (16-34)        0.48 (0.31-0.74)
                                                                       Low               3 (1-7.0)         0.04 (0.02-0.11)
Ten Wolde et al, 3 2004     504 (20)         Empiric                   81%-100%          67 (52-81)        8.5 (4.6-16)
                                                                       51%-80%           29 (21-37)        1.6 (1.2-2.3)
                                                                       21%-50%           15 (11-19)        0.74 (0.57-0.93)
                                                                       0%-20%            8 (5-14)          0.37 (0.22-0.63)
Leclerq et al, 4 2003       202 (29)         Wells extended            High              50 (32-68)        2.4 (1.3-4.5)
                                                                       Moderate          27 (17-39)        0.87 (0.56-1.4)
                                                                       Low               25 (17-35)        0.79 (0.56-1.1)

Abbreviations: CI, confidence interval; LR, likelihood ratio; PISA-PED, Prospective
Investigative Study of Acute Pulmonary Embolism Diagnosis.


Table 43-10 A Low to Moderate Clinical Probability With a Normal
D-dimer Result Makes Pulmonary Emboli Unlikely

Source, y                  Findings                                   LR (95% CI)

Ten Wolde et al, 3 2004    Clinical probability < 20% and             0 (0-0.32)
                           normal D-dimer result
Perrier et al, 1 2004      Moderate or low probability and            0 (0-0.13)
                           normal D-dimer result
Leclerq et al, 4 2003      Moderate or low probability and            0 (0-0.36)
                           normal D-dimer result
Summary                                                               0 (0-0.06) a

Abbreviations: CI, confidence interval; LR, likelihood ratio.
a Summary LR is statistically homogeneous (P = .94).


Righini et al 8 reanalyzed data from which the original
Geneva rule was derived and also retrospectively calculated
the pretest probability by applying the simplified Wells rule.
The a priori hypothesis was that the discriminative ability of
the clinical models would be lower in older patients compared
with younger patients. There was no clinically or statistically
significant effect of age (younger than 50 years, 51-74
years, and older than 75 years) on the discriminative value
of either model. 8

Because of heterogeneity in the earlier studies reported in 
the original Rational Clinical Examination article, we did not 
provide summary estimates for the prediction rules or D- 
dimer results. The 3 new studies that added D-dimer to the 
established prediction rules focused primarily on the utility of 
the D-dimer to rule out pulmonary emboli. The studies 
yielded more consistent, homogenous results and provide the 
opportunity to create summary measures that are especially 
useful for understanding the role of a negative D-dimer result 
in ruling out pulmonary emboli (see Tables 43-10 and 43-11).

CHANGES IN THE REFERENCE STANDARD 

Computed Tomography Angiography 

Despite recent advances in the visualization of pulmonary 
arteries with the advent of spiral CT, there are no well- 
designed clinical outcome studies validating its role as a 
standalone test in the treatment of unselected patients with 
suspected pulmonary embolism. Newer advances in this 
technology, with the advent of multidetector modalities, may 
improve scan acquisition times and image quality. At present, 
although spiral CT continues to improve in its diagnostic 
accuracy for pulmonary embolism, a negative spiral CT 
result by itself is still not sufficient to reliably exclude pulmo¬ 
nary embolism. 9 

D-dimer With a Rapid Enzyme-Linked 
Immunosorbent Assay Technique 

The results of a recent systematic review 10 confirm that a neg¬ 
ative D-dimer test result by itself may safely exclude pulmo¬ 
nary embolism. However, these data relate primarily to the 
enzyme-linked immunosorbent assay (ELISA) D-dimer test¬ 
ing format, which has superior sensitivity and negative LR 
compared with other D-dimer assays. Therefore, a negative 
D-dimer test result with the rapid ELISA format is as diag¬ 
nostically useful as a negative lung scan result in patients 
with suspected pulmonary embolism who present with 
recent onset of symptoms. 

RESULTS OF THE LITERATURE REVIEW 

Miniati et al 2 applied the Prospective Investigative Study of 
Acute Pulmonary Embolism Diagnosis (PISA-PED) struc¬ 
tured model to assess 390 patients with suspected pulmonary 
embolism by categorizing them as having low, intermediate, 
moderately high, or high probability of pulmonary embo¬ 
lism. Pulmonary embolism was diagnosed or excluded by 
combining the pretest probability assessment with the results 
of perfusion lung scanning. Within these probability groups, 
the prevalence of pulmonary embolism was 3%, 24%, 86%, 
and 100%, respectively. 2 These data confirm the accuracy of 
this model when used by this group of clinicians. To date, no 
other group has tested this clinical prediction rule. Because
the structured model contains many more variables than 
other models, clinicians cannot apply the results directly 
without the use of a handheld calculator that contains the 
variables and their regression coefficients. Thus, the results 
are most useful for identifying the findings that are indepen¬ 
dently useful for diagnosis of pulmonary emboli. 
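
To illustrate why a calculator is needed, the sketch below sums the regression coefficients from Table 43-6 for a given set of findings. In a logistic model each coefficient is a change in the log-odds of disease, so the sum gives the shift in log-odds relative to a patient with none of the listed findings, and its exponential gives the corresponding odds multiplier. Table 43-6 does not report the model's intercept, so this sketch deliberately stops short of an absolute probability; the dictionary keys and function names are illustrative assumptions, not part of the published model.

```python
import math

# Regression coefficients as printed in Table 43-6 (PISA-PED structured model).
# Key names are illustrative labels chosen for this sketch.
COEFFICIENTS = {
    "male_sex": 0.81,
    "age_63_to_72": 0.59,
    "age_over_73": 0.92,
    "preexisting_cardiovascular_disease": -0.56,
    "preexisting_respiratory_disease": -0.97,
    "thrombophlebitis_ever": 0.69,
    "dyspnea_sudden_onset": 1.29,
    "chest_pain": 0.64,
    "hemoptysis": 0.89,
    "temperature_over_38C": -1.17,
    "ecg_acute_right_ventricular_overload": 1.53,
    "cxr_oligemia": 3.86,
    "cxr_amputation_of_hilar_artery": 3.92,
    "cxr_consolidation_infarction": 3.55,
    "cxr_consolidation_no_infarction": -1.23,
    "cxr_pulmonary_edema": -2.83,
}


def log_odds_shift(findings):
    """Sum the coefficients of the findings that are present."""
    unknown = [f for f in findings if f not in COEFFICIENTS]
    if unknown:
        raise KeyError(f"unknown findings: {unknown}")
    return sum(COEFFICIENTS[f] for f in findings)


def odds_multiplier(findings):
    """Factor by which the findings multiply the odds of pulmonary embolism."""
    return math.exp(log_odds_shift(findings))


# Example: a man with sudden-onset dyspnea and oligemia on the chest radiograph.
present = ["male_sex", "dyspnea_sudden_onset", "cxr_oligemia"]
print(f"log-odds shift {log_odds_shift(present):.2f}, "
      f"odds multiplier {odds_multiplier(present):.0f}")
```

The large chest radiograph coefficients dominate the sum, which is consistent with the observation that the model is most useful for identifying which findings carry independent diagnostic weight.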

The remaining 3 studies used the results of D-dimer testing 
combined with the clinical pretest probability assessment. In a 
study by Perrier et al, 1 consecutive patients presenting to the 
emergency department with suspected pulmonary embolism 
were evaluated with the “Geneva rule.” 11 Clinicians were 
allowed to override the pretest probability assessment if their 
clinical judgment disagreed with the prediction rule. The pre¬ 
diction rule and clinician override were done before any addi¬ 
tional tests were obtained, including the D-dimer. All patients 
had a D-dimer test performed (rapid ELISA Vidas DD; 
BioMerieux, Marcy l’Etoile, France) and, if the assay result
was negative, no further testing was performed, anticoagulant 
therapy was withheld, and patients were followed up for 3 
months. Patients with a positive D-dimer test result under¬ 
went a preestablished standardized sequence of tests to 
exclude or confirm the diagnosis. The Geneva pretest
probability score was available for only 771 patients of the
total cohort of 965 patients; for the remaining 194 patients,
clinicians used “implicit judgment” to assess pretest probability.
Of the 771 patients who were evaluated with the Geneva
rule, clinicians used their judgment to change the pretest 
probability in 179 (23%). The pretest probability was 
increased in 126 patients and decreased in 53 patients. Over¬ 
all, 7% (95% CI, 5%-9%) of patients with a low pretest prob¬ 
ability had objectively confirmed pulmonary embolism 
compared with 35% (95% CI, 29%-39%) in the moderate- 
and 85% (95% CI, 75%-92%) in the high-pretest-probability 
groups. Strictly speaking, this study does not validate the
accuracy of the Geneva rule. On the other hand, as the 
authors observe, allowing physicians to override the rule 
improves its acceptability to clinicians and makes clinical 
sense. The Geneva rule does not have a variable taking into 
account an alternative diagnosis, which might otherwise 
accommodate the “implicit override” feature. No patient with 
a moderate or low probability and a normal D-dimer result 
had a pulmonary embolus (LR, 0; 95% CI, 0-0.13). 

Leclerq et al 4 assessed 202 patients referred for clinically 
suspected pulmonary embolism. The clinical pretest proba¬ 
bility for pulmonary embolism was formally documented 
with the extended Wells model 12 ; subsequent investigations 
were based on the results of D-dimer testing (Tinaquant D- 
Dimer; Roche Diagnostics, Mannheim, Germany). Patients 
with a low or moderate pretest probability and a negative D- 
dimer test result were discharged without anticoagulant ther¬ 
apy and followed up for 3 months; none of these patients 
(0%; 95% CI, 0%-5.6%) had venous thromboembolism in 
follow-up. The remainder of patients underwent perfusion 
lung scanning, followed by bilateral compression ultrasonog¬ 
raphy of the legs if the lung scan result was nondiagnostic; 
when the ultrasonographic result was normal, pulmonary 
angiography was performed. The overall prevalence of 


Table 43-11 Likelihood Ratios of an Abnormal D-dimer Result for a
Pulmonary Embolus

Source, y                 LR (95% CI)

Leclerq et al, 4 2003     1.9 (1.6-2.2)
Perrier et al, 1 2004     1.7 (1.5-1.8)
Summary                   1.7 (1.6-1.8) a

Abbreviations: CI, confidence interval; LR, likelihood ratio.
a Summary LR is statistically homogeneous (P = .09).


pulmonary embolism was 29%; 25% (95% CI, 17%-35%) in 
patients with a low pretest probability, 26% (95% CI, 17%- 
39%) in the moderate-pretest-probability group, and 50% 
(95% CI, 32%-68%) in the high-pretest-probability group. 
These results show less discrimination than the original 
study by Wells et al 12 but are consistent with another Dutch 
study. 13 

Finally, in a multicenter study, 3 clinical gestalt or “informed 
intuition” was used to define the pretest probability of pulmo¬ 
nary embolism. This study group had previously used the Wells 
simple clinical prediction rule 14 for assessing pretest probability 
and found it to be no more discriminatory than the pretest 
probability determined by an overall assessment of the clinical 
signs and symptoms, along with the results of basic investiga¬ 
tion. 13 A total of 631 patients were assessed by study physicians 
in 3 trial centers and were assigned to one of the 4 pretest prob¬ 
abilities (0%-20%, 21%-50%, 51%-80%, >81%); patients also 
had blood drawn for a D-dimer assessment after the clinical 
probabilities were assigned (Tinaquant D-Dimer; Roche Diag¬ 
nostics). Patients with a low-pretest probability for pulmonary 
embolism and a negative D-dimer test result were discharged 
without anticoagulant therapy and were followed up for 3 
months. Clinicians were able to reliably discriminate between 
low-, intermediate-, moderate-, and high-pretest-probability 
groups, with the prevalence of pulmonary embolism in each of 
the 4 groups being 8% (95% CI, 5%-14%), 15% (95% CI, 
11%-19%), 29% (95% CI, 21%-37%), and 67% (95% CI, 
52%-81%). These data compare favorably with studies in 
which the pretest probability was assessed with a structured 
clinical model. However, these were experienced clinicians who 
had extensive and specific training in assessing patients with 
pulmonary embolism. No patient with a low-probability clini¬ 
cal assessment (0%-20%) and a normal D-dimer result had a 
pulmonary embolus (LR, 0; 95% CI, 0-0.32). 

One of the major criticisms of 2 of the structured models 
(extended 12 and simplified 14 Wells) is the need to specify or 
weight the likelihood of an alternative diagnosis apart from 
pulmonary embolism, introducing a global assessment of the 
probability of pulmonary embolism, not unlike the gestalt 
pretest assessment. This variable has been the most problem¬ 
atic in terms of its reliability. 5 Therefore, a third model, the 
Geneva rule, now encompasses this variable, in part by 
allowing physicians to upgrade or downgrade the pretest 
probability with an implicit override. Pragmatically, this 
makes sense and may improve the acceptance of this model 
among clinicians. 
There has been considerable controversy with respect to
the association between risk of venous thromboembolism
and airline travel. Well-designed case-control studies suggest
an odds ratio of 2 for the association. 6 Studies in
selected patients 6-9 suggest the absolute risk for venous
thromboembolism increases with duration of travel, the
presence of thrombophilia, and use of estrogen-containing
therapy. There are conflicting data on the true incidence of
venous thrombosis within the traveling public, with estimates
ranging from as low as 1.6 events per million passengers
to as high as 10% for asymptomatic, ultrasonographically
detected, calf vein thrombosis. 5-9

EVIDENCE FROM GUIDELINES 

Recent guidelines from the British Thoracic Society support 
assessing and formally documenting the pretest probability 
for pulmonary embolism, but they do not specifically advo¬ 
cate a structured model approach over the clinical gestalt. 15 
The guideline reiterates the importance of establishing the
pretest probability before reviewing the results of a ventila¬ 
tion perfusion lung scan or the results of a D-dimer test. This 
process can identify low-risk patients who do not need fur¬ 
ther testing. 


CLINICAL SCENARIO—RESOLUTION 


A 25-year-old woman presents to the emergency department 
with pleuritic chest pain, having just returned home after a 
12-hour plane flight. Your examination confirms the 
coryza and absence of tachypnea (16/min). The remain¬ 
der of the examination results, including that for the legs, 
are unremarkable. A pregnancy test result is negative, as is
a chest radiograph. Results of an arterial blood gas analy¬ 
sis do not show hypoxia, and an electrocardiogram result 
is normal. 

This woman poses a challenge to the assessment of the 
pretest probability for pulmonary embolism, largely because 
of the uncertainty of the magnitude of the risk for venous 
thrombosis after airplane travel. According to the simplified 
Wells model, her pretest probability for pulmonary embo¬ 
lism is low (absence of tachycardia, active cancer, and signs 
of deep vein thrombosis; no history of venous thrombosis; 
and no hemoptysis). The overall clinical assessment sug¬ 
gests that the young woman is more likely to have viral 
pleurisy. A low pretest probability (3%-5%), combined 
with a positive D-dimer result (MDA D-Dimer; BioMerieux, 
Inc, Durham, North Carolina) (positive LR, 1.7 16 ), places 
her posttest probability for pulmonary embolism at 8%. A 
ventilation perfusion lung scan result is normal (LR for 
pulmonary embolism with a normal lung scan result, 0.1). 17 
Her posttest probability of pulmonary embolism is less 
than 2%. If you had chosen a pretest “low” probability of as 
much as 15%, the posttest probability would still be low, at 
2.9%, after the positive D-dimer result and normal ventila¬ 
tion-perfusion scan result. 
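
Because the pretest estimate for this patient is uncertain, it is worth checking how much the conclusion depends on it. The sketch below repeats the odds-times-LR calculation across a range of "low" pretest probabilities, applying the positive D-dimer result (LR 1.7) and then the normal lung scan result (LR 0.1). It assumes the 2 test results are conditionally independent, and the function name `update` is an illustrative choice.

```python
def update(pretest, likelihood_ratios):
    """Apply a sequence of likelihood ratios to a pretest probability (0-1)."""
    odds = pretest / (1 - pretest)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)


LR_POSITIVE_DDIMER = 1.7    # positive D-dimer result, as quoted in the text
LR_NORMAL_LUNG_SCAN = 0.1   # normal ventilation-perfusion scan, as quoted in the text

for pretest in (0.03, 0.05, 0.15):
    after_ddimer = update(pretest, [LR_POSITIVE_DDIMER])
    after_scan = update(pretest, [LR_POSITIVE_DDIMER, LR_NORMAL_LUNG_SCAN])
    print(f"pretest {pretest:.0%}: {after_ddimer:.1%} after D-dimer, "
          f"{after_scan:.1%} after normal scan")
```

Across this whole range the probability after both tests stays below 3%, which is why the choice between 3% and 15% as the "low" pretest estimate does not change the conclusion.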

See next page for the “Make the Diagnosis” section. 
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CHAPTER 43 Pulmonary Embolus 


PULMONARY EMBOLUS— MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

Venous thrombosis occurs in 1 to 2 persons per 1000 per¬ 
son-years, with approximately one-half to one-third of these 
episodes from pulmonary embolism. 18 In published studies, 
the prevalence of pulmonary embolism in patients who 
present with a clinical suspicion ranges from 9% to more 
than 30%, 19 which undoubtedly relates to a combination of 
factors, including differences in referral patterns and health 
practices among countries, as well as differences in patient 
populations. The prior probability of a pulmonary embolus 
is determined from the clinical findings. Although studies 
vary in the prevalence of disease, a useful guideline would be 
to think of “low probability” as approximately less than 15% 
and “moderate probability” as 15% to 35%. 

POPULATION FOR WHOM PULMONARY EMBOLUS 
SHOULD BE CONSIDERED 

Patients who have had recent major surgery, major trauma, 
immobility, or active malignancy are some of the highest- 
risk groups within the general population, with relative risks 
varying from 5 to 200. 20 The most common presenting 
symptoms of pulmonary embolism are new or worsening 
dyspnea, acute chest pain, and, less frequently, cough, fainting, 
or hemoptysis. Tachypnea and tachycardia, the most 
common signs of pulmonary embolism, occur frequently 
with exacerbations of chronic obstructive lung disease, 
congestive cardiac failure, and pneumonia, which highlights the 
poor specificity of these signs. 21 

DETECTING THE LIKELIHOOD OF 
PULMONARY EMBOLUS 

Use a structured model to assess the pretest probability of 
pulmonary emboli. The simplified Wells scoring system may 
be the easiest to use in clinical practice, shows good reliability, 
and requires no laboratory tests or radiographs (see 
Table 43-12). 

Establishing the pretest probability before, and not after, 
reviewing the results of a sensitive D-dimer test will identify 
patients at very low risk for pulmonary emboli (see 
Table 43-13). 

When there is discordance between clinician gestalt and a 
clinical prediction rule, most experts would place the patient 
into the higher pretest probability group. Combining the 
pretest probability with the results of D-dimer testing 
reduces the need for further investigations in those patients 
with a low-moderate probability of pulmonary embolism 
and a negative D-dimer result because a number of management 
studies now confirm the safety of this approach. 


Table 43-12 Simplified Wells Scoring System 

Findings in the Simplified Wells Scoring System               Score(a) 

Clinical signs/symptoms of DVT of the leg (minimum of leg 
  swelling and pain with palpation of the deep veins)         3.0 
No alternate diagnosis that is as likely as or more likely 
  than a pulmonary embolus                                    3.0 
Heart rate > 100/min                                          1.5 
Immobilization or surgery in the last 4 weeks                 1.5 
History of DVT or PE                                          1.5 
Hemoptysis                                                    1.0 
Cancer actively treated in the past 6 mo                      1.0 

Abbreviations: DVT, deep vein thrombosis; PE, pulmonary embolism. 

(a) Category scores determined by the sum of the individual scores: low, <2; moderate, 
2-6; high, >6. Adapted from Chunilal et al. 19 


Table 43-13 The Likelihood Ratios for Pulmonary Embolus for the 
Combination of Clinical Probability Estimate With the D-dimer Result 

Clinical Probability                            D-dimer     LR (95% CI) 

Any probability (2 studies)                     Abnormal    1.7 (1.6-1.8) 
Low (<15%) to moderate (15%-35%) (3 studies)    Normal      0 (0-0.06) 

Abbreviations: CI, confidence interval; LR, likelihood ratio. 
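
The simplified Wells scoring system in Table 43-12 is a plain sum of points, so it lends itself to a short sketch. The code below is illustrative rather than a validated clinical tool: the dictionary keys and function names are hypothetical labels chosen for this example, while the point values and the category cutoffs (low, <2; moderate, 2-6; high, >6) are taken from the table and its footnote.

```python
# Point values transcribed from Table 43-12; the key names are hypothetical.
WELLS_POINTS = {
    "dvt_signs_or_symptoms": 3.0,
    "no_alternate_diagnosis_as_likely": 3.0,
    "heart_rate_over_100": 1.5,
    "immobilization_or_surgery_last_4_weeks": 1.5,
    "history_of_dvt_or_pe": 1.5,
    "hemoptysis": 1.0,
    "cancer_treated_past_6_months": 1.0,
}


def wells_score(findings):
    """Sum the points for the findings that are present."""
    unknown = set(findings) - set(WELLS_POINTS)
    if unknown:
        raise ValueError(f"unrecognized findings: {sorted(unknown)}")
    return sum(WELLS_POINTS[name] for name in set(findings))


def wells_category(score):
    """Map a total score to the categories in the table footnote."""
    if score < 2:
        return "low"
    if score <= 6:
        return "moderate"
    return "high"


# Example: tachycardia plus hemoptysis gives 2.5 points.
score = wells_score(["heart_rate_over_100", "hemoptysis"])
print(score, wells_category(score))  # prints: 2.5 moderate
```

A patient with none of the findings scores 0 and falls in the low category, which matches the assessment of the young woman in the clinical scenario.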

REFERENCE STANDARD 

The reference standard test for proving pulmonary emboli 
requires visualization of the embolus by arteriography or 
appropriate perfusion defects with nuclear studies. However, 
current approaches to the diagnosis of pulmonary 
emboli now recognize that patients with a low to moderate 
pretest probability and a negative high-sensitivity D-dimer 
result can be treated without anticoagulation, effectively 
ruling out the presence of pulmonary embolism. 
EVIDENCE TO SUPPORT THE UPDATE: 

Pulmonary Embolus 



TITLE Ruling Out Clinically Suspected Pulmonary 
Embolism by Assessment of Clinical Probability and 
D-dimer Levels: A Management Study. 

AUTHORS Leclerq MGL, Lutisan JG, van Marwijk 
Kooy, et al. 

CITATION Thromb Haemost. 2003;89(1):97-103. 

QUESTION What are the efficiency and safety of 
excluding pulmonary embolism based on a normal 
D-dimer combined with a low or moderate clinical probability 
as assessed by a clinical model? 

DESIGN Prospective cohort study. 

SETTING Single hospital, The Netherlands. 

PATIENTS Two hundred two patients (inpatients and 
outpatients) with suspected pulmonary embolism were 
enrolled from August 1999 to April 2001. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Consecutive patients with suspected pulmonary embolism 
were assessed and assigned a structured pretest probability. 1 
Patients with low or moderate pretest probabilities were 
treated subsequently according to D-dimer results. If the 
D-dimer result was negative, no further testing was performed, 
anticoagulant therapy was withheld, and these patients were 
followed up at 3 months. All other patients were investigated 
with an algorithm. 

Pulmonary emboli were confirmed by a high-probability 
perfusion lung scan and normal chest radiograph result, 
positive compression ultrasonography result, or pulmonary 
angiogram. Pulmonary emboli were excluded by a low or 
moderate clinical pretest probability with a normal 
D-dimer test result and a negative 3-month follow-up result, a 
normal lung scan result, or negative pulmonary angiogram 
result. 

Data collected included patient demographics, pretest 
probability assessments, and D-dimer test results, as well as the 
results of objective tests. 


MAIN OUTCOME MEASURES 

The proportion of patients in whom pulmonary embolism 
was safely excluded according to a low or moderate 
pretest probability and a negative D-dimer test result was 
compared to the rate of pulmonary emboli among all 
patients. 

MAIN RESULTS 

Twenty-nine percent of patients (59/202) had confirmed 
pulmonary embolism. The likelihood ratios for the clinical 
probability estimate alone (Table 43-14), the D-dimer result alone 
(Table 43-15), and the combination of the probability estimate 
with the D-dimer result (Table 43-16) can be calculated. 
Thirty-two percent of patients had pulmonary embolism safely 
excluded according to a low or moderate pretest probability 
and a negative D-dimer test result. In this group of patients, 
the subsequent 3-month risk of recurrent venous thromboembolism 
was 0% (95% confidence interval [CI], 0%-6%). The 
overall prevalence of pulmonary embolism in the low-, moderate-, 
and high-pretest-probability strata was 25% (95% CI, 
17%-35%), 26% (95% CI, 17%-39%), and 50% (95% CI, 
32%-68%), respectively, showing minimal discriminative 
value of the pretest probability model used. 


Table 43-14 Likelihood Ratio of the Clinical Probability Estimate for 
Pulmonary Emboli 

Probability, All Patients    LR (95% CI) 

High                         2.4 (1.3-4.5) 
Moderate                     0.87 (0.56-1.4) 
Low                          0.79 (0.56-1.1) 

Abbreviations: CI, confidence interval; LR, likelihood ratio. 


Table 43-15 Likelihood Ratio of the D-dimer Result for 
Pulmonary Emboli 

All Patients                 LR (95% CI) 

D-dimer positive             1.9 (1.6-2.2) 
D-dimer negative             0 (0-0.29) 

Abbreviations: CI, confidence interval; LR, likelihood ratio. 


Table 43-16 Likelihood Ratio of the D-dimer Result Among 
Patients With a Moderate or Low Clinical Probability Estimate 
for Pulmonary Emboli 

Moderate or Low Clinical Probability Patients 

D-dimer Result               LR (95% CI) 

Positive                     2.0 (1.7-2.4) 
Negative                     0 (0-0.36) 

Abbreviations: CI, confidence interval; LR, likelihood ratio. 


CONCLUSIONS 

LEVEL OF EVIDENCE Level 2. 

STRENGTHS A well-designed cohort study with objective 
confirmation or exclusion of pulmonary embolism, as well as 
appropriate follow-up. The clinical probability was determined 
before the D-dimer results were known. 

LIMITATIONS The focus of this study was on the ability of 
the D-dimer to rule out pulmonary emboli. In this sense, 
“rule out” should be interpreted in literal terms because the 
goal was to determine whether treatment could be withheld 
for patients with a moderate or low probability and a negative 
D-dimer result. The sample size was too small for a 
definitive conclusion. 

This study suggests that patients with a low or moderate 
pretest probability and a negative D-dimer result can be 
treated without anticoagulant therapy or additional objective 
testing. However, clinicians must decide whether to accept the 
conclusion in light of the CI around the LR because the upper 
limit of the CI may not be sufficiently low (upper limit, 0.36). 
Given the 29% prevalence of pulmonary emboli, a patient 
with a moderate to low clinical probability and a normal 
D-dimer result could have a probability as high as 13%. 

Because there were so few patients with a high clinical 
probability and normal D-dimer result, clinicians should continue 
to evaluate these patients further for pulmonary emboli and 
not assume that pulmonary emboli have been ruled out. 

Although it appears that the clinical probability alone 
works as well as the D-dimer alone, such a conclusion may 
lack validity if the patients were identified for enrollment 
according to the features in the Wells et al 1 prediction model. 

Reviewed by Sanjeev Chunilal, MB ChB, FRACP, FRCPA 

REFERENCE FOR THE EVIDENCE 

1. Wells PS, Ginsberg JS, Anderson DR, et al. Use of a clinical model for the 
safe management of patients with suspected pulmonary embolism. Ann 
Intern Med. 1998;129(12):997-1005. 


TITLE A Diagnostic Strategy for Pulmonary Embolism 
Based on Standardized Pretest Probability and Perfusion 
Lung Scanning: A Management Study. 

AUTHORS Miniati M, Simonetta M, Bauleo C, et al. 

CITATION Eur J Nucl Med Mol Imaging. 2003;30(11): 
1450-1456. 

QUESTION What are the efficacy and safety of making 
the diagnosis of pulmonary emboli according to a combination 
of standardized estimates of the pretest probability 
and perfusion lung scan results? 

DESIGN Prospective cohort study. 

SETTING One Italian hospital. 

PATIENTS Four hundred twenty-five patients enrolled 
from April 2000 to September 2001. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Patients with suspected pulmonary embolism were enrolled 
prospectively in this single-center study. All were examined 
by one of the 12 respiratory physicians who determined the 
clinical pretest probability by applying a standardized clinical 
scoring system without knowledge of whether the patient 
had a pulmonary embolus. 1 

Pulmonary embolism was confirmed or excluded on the 
basis of perfusion lung scanning or pulmonary angiography. 2 
The perfusion scan was interpreted independently of the 
clinical data. Pulmonary embolism was excluded if the perfusion 
lung scan result was normal or near normal or the pulmonary 
angiography result was normal; for those with a nondiagnostic 
perfusion scan result, pulmonary embolism was considered 
excluded in patients with a low clinical pretest probability. 
For all other patients, pulmonary embolism was diagnosed 
according to a high-probability perfusion scan (single or 
multiple wedge-shaped perfusion defects) or pulmonary 
angiography. 

MAIN OUTCOME MEASURES 

The proportion of patients with a definitive diagnosis 
obtained noninvasively was described. A regression model 
was derived to estimate the influence of clinical findings on 
the likelihood of a pulmonary embolus. 

MAIN RESULTS 

Pulmonary embolism was diagnosed noninvasively in 132 
(34%) patients and by angiography in an additional 28 (7%) 
and was excluded in 230 (noninvasively in 191 patients 
[49%] and by a negative pulmonary angiogram result in 39 
[10%]). Only 1 patient had a pulmonary embolus not 
detected during the initial evaluation. 

The regression model had a large number of demographics, 
risk factors, symptoms, signs, and electrocardiographic 
and radiographic findings (see Table 43-17). 


Table 43-17 Factors in the Regression Model That Alter the 
Likelihood of Pulmonary Emboli 

Increased the Likelihood of PE        Decreased the Likelihood of PE 

Male                                  Preexisting cardiovascular disease 
Age > 63 y                            Preexisting pulmonary disease 
Thrombophlebitis (ever)               Temperature > 38°C 
Dyspnea (sudden onset) 
Chest pain 
Hemoptysis 
ECG signs of acute right 
  ventricular overload 
Radiographic findings                 Radiographic findings 
  Oligemia                              Consolidation (no infarction) 
  Amputation of the hilar artery        Pulmonary edema 
  Consolidation (infarction) 

Probability From Model of PE 
(Calculated From the Regression 
Model)(a)                             LR (95% CI) 

High (>90%)                           297 (16-4746) 
Moderately high (50% to <90%)         8.6 (3.4-22) 
Intermediate (10% to <50%)            0.48 (0.31-0.74) 
Low (<10%)                            0.04 (0.02-0.11) 

Abbreviations: CI, confidence interval; ECG, electrocardiogram; LR, likelihood ratio; 
PE, pulmonary embolism. 

(a) Data kindly provided by Massimo Miniati, MD, PhD. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Strict adherence to the diagnostic criteria to 
objectively exclude or prove the diagnosis of pulmonary embolism, 
as well as a 1-year follow-up for those patients in whom the 
initial evaluation result was negative. The clinical model and the 
perfusion scans were interpreted independently. 

LIMITATIONS This is a single-center study in which the 
clinical prediction rule was applied by one of 12 highly 
specialized observers. Therefore, the generalizability of this 
model to other centers and observers remains to be proven. 
Methodologically, there was incorporation bias in which the 
results of the model were used as part of the reference standard. 
However, these criteria were specified in advance and 
patients who did not meet the criteria required angiography. 

Clinicians collect clinical data that, when incorporated into a 
prediction model, are useful in stratifying patients’ likelihood of a 
pulmonary embolus. The prediction model requires entry of the 
data into either a spreadsheet or handheld calculator to derive 
the probability. These results are most helpful for identifying the 
clinical findings that are independently useful. 

Reviewed by Sanjeev Chunilal, MB ChB, FRACP, FRCPA 

REFERENCES FOR THE EVIDENCE 

1. Miniati M, Monti S, Bottai M. A structured clinical model for predicting 
the probability of pulmonary embolism. Am J Med. 2003;114(3):173-179. 

2. Miniati M, Pistolesi M, Marini C, et al. Value of perfusion lung scan in 
the diagnosis of pulmonary embolism: results of the Prospective Investigative 
Study of Acute Pulmonary Embolism Diagnosis (PISA-PED). Am 
J Respir Crit Care Med. 1996;154(5):1387-1393. 


TITLE Diagnosing Pulmonary Embolism in Outpatients 
With Clinical Assessment, D-dimer Measurement, 
Venous Ultrasound, and Helical Computer Tomography: 
A Multicentre Management Study. 

AUTHORS Perrier A, Roy PM, Aujesky D, et al. 

CITATION Am J Med. 2004;116(5):291-299. 

QUESTION What is the efficiency of a diagnostic strategy 
for venous thromboembolism that combines clinical 
assessment, plasma D-dimer, lower limb venous 
ultrasonography, and helical computed tomography (CT)? 

DESIGN A prospective cohort study. 

SETTING Two Swiss hospitals and 1 French hospital. 

PATIENTS Patients presenting to the emergency department 
with suspected pulmonary embolism (PE) were 
prospectively enrolled, using predefined criteria: acute onset of 
new or worsening shortness of breath or chest pain without 
another obvious etiology. Nine hundred sixty-five patients 
were enrolled from October 1, 2000, to June 30, 2002. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Physicians assessed the clinical pretest probability for PE by 
collating the results of a standardized clinical scoring system. 1 The 
scoring system categorized patients as “low,” “intermediate,” or 
“high” probability. When physicians disagreed with the results of 
the prediction rule, they could “override” this by recording their 
implicit clinical judgment. The probability estimate was recorded 
with knowledge of the arterial blood gas and chest radiograph 
results but no other laboratory or radiographic study results. 

The diagnostic standard for excluding PE consisted of a negative 
D-dimer result (enzyme-linked immunosorbent assay), a 
negative helical CT scan result, and compression ultrasonography 
of the legs when combined with a low or moderate clinical 
probability or a negative pulmonary angiography result and 
no subsequent recurrent venous thrombosis during a 3-month 
follow-up. Pulmonary embolism was diagnosed according to a 
positive helical CT scan result, positive result for compression 
ultrasonography of the legs (proximal deep vein thrombosis), 
or a positive pulmonary angiogram result. 


MAIN OUTCOME MEASURES 

The proportion of patients in whom a definitive diagnosis of PE 
could be made without the need for pulmonary angiography 
and the risk of venous thromboembolism in patients who had 
anticoagulants withheld because the strategy excluded PEs. We 
calculated likelihood ratios from data provided in the tables (see 
Tables 43-18, 43-19, and 43-20). 


MAIN RESULTS 

Pulmonary embolism was diagnosed in 222 of 965 (23%) 
patients, with only 2.7% of patients requiring pulmonary angiography 
for a definitive diagnosis. A total of 194 (20%) patients 
did not have the standardized scoring system applied to assess 
pretest probability because of incomplete data. For these patients, 
physicians assigned an implicit pretest probability assessment. 
On the other hand, there was disagreement between the standardized 
pretest probability assessment and physicians’ implicit 
judgment in 179 patients (23%), with 70% of these instances 
requiring upgrading of the clinical score. The likelihood ratios for 
the clinical score alone (Table 43-18), the D-dimer result alone 
(Table 43-19), and the combination of the clinical score with the 
D-dimer result (Table 43-20) can be calculated. 


Table 43-18 Likelihood Ratios for the Probability of Pulmonary 
Emboli According to a Clinical Score 

Probability, All Patients    LR (95% CI) 

High                         19 (10-36) 
Moderate                     1.7 (1.5-2.0) 
Low                          0.23 (0.17-0.31) 

Abbreviations: CI, confidence interval; LR, likelihood ratio. 


Table 43-19 Likelihood Ratio of the D-dimer Result for 
Pulmonary Emboli 

All Patients                 LR (95% CI) 

D-dimer result positive      1.7 (1.5-1.8) 
D-dimer result negative      0 (0-0.08) 

Abbreviations: CI, confidence interval; LR, likelihood ratio. 


Table 43-20 Likelihood Ratio of the D-dimer Result Among Patients 
With a Moderate or Low Clinical Probability Estimate for Pulmonary Emboli 

Moderate or Low Clinical 
Probability                  LR (95% CI) 

D-dimer result positive      1.6 (1.5-1.7) 
D-dimer result negative      0 (0-0.13) 

Abbreviations: CI, confidence interval; LR, likelihood ratio. 


CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS A prospective, multicenter, multinational 
study with a large number of patients. 

LIMITATIONS Twenty percent of potentially eligible 
patients were excluded according to a number of predefined 
criteria, though the most frequent reason was a protocol 
violation. The large number of exclusions makes the patient 
population nonconsecutive, which is a potentially important 
limitation if the exclusions were among patients who had a 
normal D-dimer result. Nearly 40% of study patients did not 
have a standardized clinical assessment because of the 
absence of arterial blood gas results or because the standardized 
score was revised by implicit clinical judgment. According 
to the presenting feature of acute chest pain or dyspnea, 
patients with deep vein thrombi confirmed by ultrasonography 
were assumed to have PE without further studies. 

The clinical scoring system and diagnostic algorithm used 
in this study are primarily applicable to outpatients with 
recent onset of worsening or new symptoms. The standardized 
scoring system, occasionally overridden by clinical judgment, 
was good at identifying patients most likely to have a 
PE. However, the focus of this study was on identifying 
patients without PE so that additional studies and treatment 
could be avoided. A normal D-dimer result appears better 
than the scoring system and clinical judgment. However, 
because the scoring system and clinical judgment were 
applied first to identify the eligible patients, the D-dimer 
should be applied in light of the clinical findings. An intermediate 
or low probability of PE, combined with a normal 
D-dimer result, was efficient at identifying patients with a 
low likelihood of an embolus. Given the prior probability of 
22% in this study, taking the upper end of the 95% confidence 
interval (CI) for intermediate-low probability patients 
and a normal D-dimer result (upper 95% CI likelihood ratio, 
0.13) yields a maximum probability of 3.5%. 
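
The 3.5% figure follows from converting the prior probability to odds, multiplying by the likelihood ratio, and converting back to a probability. Worked through with the numbers quoted above, and rounding at each step, the calculation looks like this (shown as an illustration of the arithmetic, not as part of the original review):

```latex
\begin{align*}
\text{pretest odds} &= \frac{0.22}{1 - 0.22} \approx 0.282 \\
\text{posttest odds} &= 0.282 \times 0.13 \approx 0.0367 \\
\text{posttest probability} &= \frac{0.0367}{1 + 0.0367} \approx 0.035 = 3.5\%
\end{align*}
```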

Reviewed by Sanjeev Chunilal, MB ChB, FRACP, FRCPA 


REFERENCE FOR THE EVIDENCE 

1. Wicki J, Perneger TV, Junod A, Bounameaux H, Perrier A. Assessing the 
clinical probability of pulmonary embolism in the emergency ward with 
a simple score. Arch Intern Med. 2001;161(1):92-97. 
TITLE Non-invasive Diagnostic Work-up of Patients 
With Clinically Suspected Pulmonary Embolism: Results 
of a Management Study. 

AUTHORS Ten Wolde M, Hagen PJ, Macgillavry MR, 
et al; Advances in New Technologies Evaluating the Localization 
of Pulmonary Embolism Study Group. 

CITATION J Thromb Haemost. 2004;2(7):1110-1117. 

QUESTION Does a diagnostic algorithm safely reduce 
the need for ventilation-perfusion lung scintigraphy and 
pulmonary angiography in patients who have a low clinical 
pretest probability for pulmonary embolism and a 
negative D-dimer test result? 

DESIGN Prospective cohort study. 

SETTING Three teaching hospitals in The Netherlands. 

PATIENTS Six hundred thirty-one consecutive inpatients 
and outpatients enrolled from May 1999 to April 2001. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Patients were all assessed and assigned a clinical pretest 
probability (<20%, 21%-50%, 51%-80%, and >80%), which was 
determined by the responsible physician taking into account 
the patient’s medical history, findings on physical examination, 
and the results of routine investigations. All the clinicians 
were specifically trained to assess patients with suspected 
pulmonary embolism. All patients had a plasma D-dimer test 
(rapid immunoturbidimetric assay; Tina-quant, Roche Diagnostics, 
Mannheim, Germany) performed after the clinical 
probability was established. 

Pulmonary embolism was considered excluded in patients 
with a low clinical pretest probability combined with a negative 
D-dimer result, a normal ventilation perfusion lung scan result, 
or nondiagnostic lung scan with negative serial compression 
testing result of the lower limbs when these patients remained 
venous thrombosis free at a 3-month follow-up (not receiving 
anticoagulants). 

Patients were diagnosed as having pulmonary embolism 
according to a high-probability lung scan, positive pulmonary 
angiography result, or a positive result for compression 
ultrasonographic examination of the legs. 

Data collected included patient demographics, patient pretest 
probability assessment, and D-dimer results, as well as 
the results of objective tests (ventilation/perfusion lung scan, 
compression ultrasonographic examination of the legs, and 
pulmonary angiography). 

MAIN OUTCOME MEASURES 

The primary safety outcome was the incidence of confirmed 
venous thrombosis in patients who had venous thrombosis 
initially excluded. 


Table 43-21 Likelihood Ratio of the D-dimer Result Combined With 
the Clinical Probability Estimate for Pulmonary Emboli 

                                                  LR (95% CI) 

Probability > 20% or D-dimer result abnormal      1.3 (1.2-1.3) 
Probability < 20% and D-dimer result normal       0 (0-0.32) 

Abbreviations: CI, confidence interval; LR, likelihood ratio. 

MAIN RESULTS 

Of 466 patients in whom pulmonary embolism was considered 
excluded at presentation, 1.3% (95% confidence interval, 
0.5%-2.8%) had a subsequent venous thromboembolus. 
Among the low-pretest-probability group, 95 patients also 
had a negative D-dimer result, and none of these patients 
had confirmed recurrence during the subsequent 3 months. 
A low clinical probability and a normal D-dimer result 
appeared to rule out pulmonary emboli (Table 43-21). 
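
Observing no events in 95 patients does not mean the true event rate is zero. As a rough illustration of why sample size matters here, the sketch below computes the exact (Clopper-Pearson) upper confidence limit for a proportion when 0 of n patients have the outcome. The function name is hypothetical, and the result is a generic statistical calculation rather than a figure reported in the study summary above.

```python
def zero_event_upper_bound(n, confidence=0.95):
    """Exact two-sided upper confidence limit when 0 of n patients have the event.

    With zero observed events, the Clopper-Pearson upper limit reduces to
    1 - (alpha / 2) ** (1 / n), where alpha = 1 - confidence.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    alpha = 1 - confidence
    return 1 - (alpha / 2) ** (1 / n)


# 95 low-probability patients with a negative D-dimer result, none with recurrence.
upper = zero_event_upper_bound(95)
print(f"{upper:.1%}")  # prints: 3.8%
```

Under these assumptions, an observed rate of 0 in 95 patients is still compatible with a true 3-month event rate of a few percent, and the bound tightens only as more patients are followed.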

Within the entire cohort, 20% of patients had confirmed 
pulmonary embolism, with the prevalence of disease increasing 
along with increasing clinical pretest probability. The 
corresponding rates of pulmonary embolism for the low-, 
intermediate-, moderate-, and high-pretest groups were 
statistically different, at 8%, 15%, 29%, and 67%, respectively, 
confirming that these experienced clinicians using clinical 
gestalt were able to accurately categorize patients. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 2. 

STRENGTHS Prospective data collection with probability 
estimate established before the D-dimer. 

LIMITATIONS The focus of this study was on the ability of 
the D-dimer to rule out pulmonary emboli. In this sense, 
“rule out” should be interpreted literally because the goal was 
to determine whether treatment could be withheld for 
patients with a low probability (<20%) and a negative 
D-dimer result. The sample size was too small for a definitive 
conclusion. A D-dimer was obtained in 82% of patients, but 
the majority of those in whom it was not obtained (109/112) 
had a clinical probability greater than 20%. 

Clinicians who are specifically trained to identify pulmonary 
embolism can accurately identify low- and high-risk 
patients for pulmonary embolism. The data confirm the 
importance of the pretest probability in triaging patients and 
using additional tests on the higher-risk patients. As in other 
studies, the upper confidence interval for the low-probability 
patients with a normal D-dimer result may not convince some 
physicians that a pulmonary embolus has been ruled out. 

Reviewed by Sanjeev Chunilal, MB ChB, FRACP, FRCPA 
A 24-year-old man with a history of shoulder complaints 
presents to his primary care physician. At age 16 years, his 
shoulder was injured during karate. He recovered and did 
not notice recurrence of symptoms. At age 21 years, while 
throwing a baseball, he developed sudden sharp left 
shoulder pain, with a popping noise. He sensed that the 
arm stretched out of range. He experienced a short period 
with shoulder discomfort, followed by recovery. 

Recently, he has started playing tennis and experiences 
shoulder pain that requires cessation of play. On examination, 
the shoulder displays no swelling or atrophy. Internal 
and external rotation is somewhat painful but not limited. 
His neck moves normally, through the full range of 
motion, without pain. In considering the differential diagnosis, 
one might wonder whether the medical history suggests 
instability of the shoulder and which physical 
examination findings confirm the diagnosis. 


WHY IS THE DIAGNOSIS IMPORTANT? 


The shoulder’s wide range of motion gives great freedom of 
action because of the shallow structure of the glenoid fossa but 
lends minimal bony support for the large humeral head 
(Figure 44-1). The minimal bony support creates a delicate 
balance between muscular and ligamentous strength. 1 Each 
year, 30% to 40% of adults experience shoulder discomfort, 
causing 1% to 5% of them to visit a general practitioner. 2-8 
Although about half of the primary care patients with shoulder 
discomfort recover within a year, a substantial number 
experience continued discomfort or develop recurrent pain. 6,7,9 
Instability of the glenohumeral joint, frequently combined 
with tears of the labrum (the cartilage rim of the glenoid), 
creates continued problems for some of these patients. 

Instability occurs when the shoulder’s stabilizing structures 
provide too little control as the humerus moves on the 
glenoid. As a result, the upper arm fails to stay properly 
located in the glenoid fossa during normal motion. Dislocation 
occurs when the humeral head has no attachment to 
the glenoid fossa; thus, the articular surfaces separate 
completely. Subluxation is a symptomatic translation of the 
humeral head without complete separation. 1,10-12 The resultant 
symptoms and signs allow clinical classification according 
to the degree (dislocation or subluxation) and the 
direction (anterior, posterior, inferior, or multidirectional) 
of the observed defects. 1,10-12 The incidence of shoulder 
dislocation is about 1.7% of the general population. 13 Scientific 
literature shows no available data on the incidence or 
prevalence of subluxation. 

Treatment of instability depends on the type and severity of 
the luxation detected during the clinical examination and on 
the patient’s functional deficits. The primary option, in most 
cases, is conservative treatment 1,10,11 of strengthening the muscles 
of the shoulder and increasing the coordination of the 
shoulder girdle. The alternative is surgery, a useful treatment 
if the patient has recurrent dislocation without generalized 
ligamentous laxity or multidirectional instability. 1,10,11 

Figure 44-1 Anatomy of the Shoulder 

Labral lesions are associated with instability, although they 
can occur without instability because of injuries or degeneration 
of the shoulder joint. 14-16 Labral lesions are classified 
according to their anatomic location and type of tear. 14 A 
frequently described labral tear is the superior labrum 
anterior-posterior (SLAP) lesion. 14,15 The SLAP lesion is a tear located 
at the superior part of the labrum that runs from the anterior 
to the posterior part, with or without lesions at the attachment 
of the long head of the biceps muscle. Surgical repairs 
of labral tears require an open or arthroscopic procedure. 14,15 

Anatomy of the Shoulder 

The shoulder is suited for mobility. The motions of the 
upper arm are the result of simultaneous motions in the 
glenohumeral joint, the acromioclavicular joint, the 
sternoclavicular joint, and the scapulothoracic junction. 17 Shoulder 
instability and labral lesions affect the functioning of 
the glenohumeral joint. 

The glenohumeral joint is the articulation between the 
large humeral head and the small glenoid fossa of the scapula 
(Figure 44-1). The fossa is extended by the glenoid labrum (a 
cartilage rim) that increases the depth and surface area of the 
articulation. 1,14 The labrum cushions the apposition of the 
humeral head on the glenoid fossa, similar to the function of 
the menisci in the knee. A loose capsule surrounds the joint, 
strengthened by 3 thickenings called the anterior 
glenohumeral ligaments. 1 

Seventeen muscles create the movement of the shoulder. 17 
The movement is a complex and subtle interaction between 
the 4 articulations and contributing muscles. Although 
knowledge of the biomechanics of the shoulder is growing, 
knowledge about the relationship with clinical diagnosis is 
still limited. An important finding related to instability is the 
functioning of the 4 muscles of the rotator cuff (infraspinatus, 
supraspinatus, teres minor, and subscapularis). These 
muscles play the most important roles in stabilizing the 
glenohumeral joint, even when the arm is in a neutral or 
relaxed position. 17 


CHAPTER 44 Shoulder Instability 

Mechanism of Injuries Resulting in 
Instability or Labral Tears 

Instability has 3 causes. A generally known cause of anterior 
luxation includes a sudden traumatic fall with an outstretched 
arm (seen frequently in skiers) or blocked throwing 
movement of the arm. Usually, this luxation will be reduced 
in the field or the hospital emergency department. More 
typically, primary care physicians observe a second type of 
shoulder instability, created without obvious trauma and 
attributed to chronic gradual stretching during overhead 
activities in work or sport. 10 Finally, hyperlaxity of the 
glenohumeral capsule, a less common cause of instability and 
often without any trauma, 1,10-12 is caused by congenital excessive 
joint laxity that allows the shoulder to slip in different 
directions (multidirectional instability). Some patients with 
hyperlaxity of the glenohumeral capsule can dislocate their 
shoulder voluntarily. 

The mechanisms that create labral tears without dislocation 
are unclear. 16 The shoulder capsule and ligaments are 
attached to the labrum; thus, strong forces on these structures 
are also potentially harmful to the labrum. The occurrence 
of labral tears has been predominantly studied in 
patients with throwing injuries. 18 In this group, tears are 
associated with strong forces of strain on the anterior capsule, 
ligaments, and labrum generated during the throwing 
motion. Labral tears are distinct from rotator cuff tears. A 
labral tear involves a tear of cartilage, whereas a rotator cuff 
tear occurs in one of the tendons of the rotator cuff muscles. 
Instability of the joint or labral tears can occur with rotator 
cuff injuries. However, rotator cuff injuries do not normally 
create dislocations or labral tears. Their symptoms might 
differ, although the current evidence does not make this clear. 

CLINICAL PRESENTATION 

The diagnosis of an acute shoulder dislocation is easy to 
establish. It is a painful condition and the patient will hold 
the arm in a fixed position (Figure 44-2). 1,10-12 However, 
patients with shoulder instability without dislocation present 
in a more subtle way. Some patients may complain about a 
“dead arm” feeling. 1,10 Symptoms of pain and functional 
disability seem to be nonspecific for the presence of 
instability. 1,19 Instability of the shoulder should be considered when 
patients have shoulder discomfort without clear restriction 
of motion. A history of dislocation increases the likelihood of 
recurrent instability. Instability occurs more commonly in 
young people, although traumatic dislocation also occurs in 
older patients. 1,13 

The clinical examination of the shoulder for instability is 
performed to evoke recurrence of the symptoms (provocation 
tests) or to determine laxity of the glenohumeral joint (Table 
44-1). 1,10 In a provocation test, the humeral head is placed in a 
position of imminent subluxation or dislocation, which makes 
the patient recognize the pain-provoking movement and react 
with anticipated fear or pain (an apprehension test) (Figure 
44-3A, C). Laxity tests of the shoulder evaluate the amount of 
translation of the humeral head on the glenoid in different 
positions of the humerus in the anterior, posterior, and inferior 
directions. As opposed to apprehension tests, these laxity 
tests are not intended to evoke discomfort. 

Figure 44-2 Radiograph of Shoulder Luxation 

To assess the amount of translation, rehabilitation specialists 
and orthopedic surgeons use a classification system such 
as the Hawkins grading scheme. Grade 0 denotes little to no 
movement, grade 1 denotes the humeral head moves onto 
the glenoid rim, grade 2 indicates the humeral head can be 
dislocated but spontaneously relocates, and grade 3 indicates 
the humeral head does not relocate when the pressure is 
removed. 1,20 In the Hawkins scheme, grades 1 to 3 are 
considered positive outcomes on a laxity test. 
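
The grading rule just described can be written down as a small lookup. The following Python sketch is illustrative only: the names HAWKINS_GRADES and is_positive_laxity are hypothetical and do not come from the cited sources, and the grade descriptions simply restate the text.

```python
# Hawkins grading scheme as described in the text.
# The dictionary and function names are illustrative assumptions.
HAWKINS_GRADES = {
    0: "little to no movement",
    1: "humeral head moves onto the glenoid rim",
    2: "humeral head can be dislocated but spontaneously relocates",
    3: "humeral head does not relocate when the pressure is removed",
}


def is_positive_laxity(grade: int) -> bool:
    """Return True when a Hawkins grade counts as a positive laxity test."""
    if grade not in HAWKINS_GRADES:
        raise ValueError(f"unknown Hawkins grade: {grade}")
    # Grades 1 to 3 are considered positive; grade 0 is negative.
    return grade >= 1
```

A grade of 0 therefore yields a negative result, and any of grades 1 to 3 yields a positive one.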

When laxity is present in more than one direction, the 
diagnosis of multidirectional instability is considered and the 
patient should be examined for generalized ligamentous laxity 
(laxity in more joints of the body). 1,10-12 There are no 
uniformly accepted clinical criteria for generalized ligamentous 
laxity. One might suspect this type of laxity when finding 
positive laxity tests in both shoulders. Other examples of 
hyperlaxity include the ability to hyperextend the elbows and 
a positive thumb-to-forearm test, whereby the patient can 
pull his or her thumb back to the point of touching the 
forearm. Typically, such patients will know that they can 
demonstrate their loose joints. 

Patients with labral tears present with a variety of 
symptoms. 16 Snyder 14 suggested that the most common clinical 
symptoms are pain with overhead activities, deep shoulder 
pain, or painful catching, popping, or clicking. Stetson and 
Templin 21 suggested that these symptoms were not specific 
for labral tears because they mimic the presence of 
impingement disorders, rotator cuff tears, or other shoulder 
problems. Although an obvious clinical presentation for labral 
tears cannot be described, clinicians should consider the 
diagnosis when the shoulder pain is related to a traumatic 
injury that involves substantial forces on the glenohumeral 
joint (eg, falling while skiing). 

Table 44-1 Clinical Tests for Instability and Laxity 

Diagnostic Test | Provocation | Patient Positioning | Arm Positioning | Technique | Outcome 

Provocation/Relief Tests for Instability 
Relocation [a] | Pain and apprehension | Supine | Abducted to 90 degrees and externally rotated to 90 degrees | Humeral head pressed posteriorly while arm is externally rotated | Relieves pain and apprehension 
Anterior release [a] | Pain and apprehension | Supine | Abducted to 90 degrees and externally rotated to 90 degrees | Same as relocation test; then posterior pressure is suddenly released | Pain or apprehension 
Apprehension [a] | Pain and apprehension | Sitting or standing | 90-Degree abduction and full external rotation | Arm is externally rotated while pressure is applied anteriorly to humeral head | Pain or apprehension 
Clunk | Clunk or grinding | Supine | Full abduction | Arm is rotated in full external rotation, caput humeri is pushed slightly in anterior direction | Clunk or grinding 

Laxity Tests for Instability 
Load and shift anterior or posterior | Anterior or posterior laxity | Sitting, standing, or supine | Neutral position | Humeral head is fixed by clinician’s hand; clinician tries to shift humeral head in anterior (or posterior) direction | Does not evoke discomfort; degree of humeral head translation on the glenoid in different positions of the humerus is evaluated using the Hawkins grading scheme [b] 
Sulcus sign | Inferior laxity | Sitting or standing | Neutral position | Arm is pulled vertically downward | Positive when sulcus becomes visible between acromion and humeral head 

Provocation/Relief Tests for Labral Tears 
Biceps load I | Pain | Supine | Arm is abducted 90 degrees, elbow is flexed 90 degrees | Clinician applies flexion pressure as patient resists | Positive if pain occurs 
Biceps load II [c] | Pain | Supine | 120-Degree abduction | Clinician applies lateral force as patient resists | Positive if pain occurs 
Mimori [c] | Pain and apprehension | Sitting or standing | Arm is abducted 90 degrees, elbow is flexed 90 degrees, forearm is supine | Forearm is brought from maximum supination to maximum pronation | Positive if pain occurs 
Zaslav [c] | Compares strength in internal rotation to that of external rotation, excluding impingement from labral tears | Sitting or standing | Arm is in 90-degree abduction and 80-degree external rotation, elbow is flexed 90 degrees | Patient resists external rotation force applied by the clinician, followed by applied internal rotation force | Positive (labral tear present) when the patient has good strength against external rotation and apparent weakness against internal rotation 
Active compression (O'Brien) | Pain and relief | Sitting or standing | Arm is in 90-degree forward flexion, 10- to 15-degree abduction, and full internal rotation | Clinician stands in front of patient and arm is pushed down as patient resists; repeated with arm in external rotation | Positive if pain elicited with first maneuver is reduced or eliminated in the second 
Compression rotation | Pain or clicking | Supine | Arm at 90-degree abduction, elbow in 90-degree flexion | Axial load placed on shoulder while rotated and circumducted (note McMurray knee test) | Positive if pain or clicking occurs 
SLAP-prehension | Pain or clicking | Sitting or standing | Arm at 90-degree forward flexion | Arm is rotated internally in 90-degree flexion of the humerus | Positive if pain or clicking occurs 
Speed | Pain in the anterior shoulder | Sitting or standing | 90-Degree elevation | Downward force applied to forearm, full supination of forearm, and elbow is fully extended | Positive if pain occurs 
Tenderness of bicipital groove | Pain | Sitting | Neutral | Palpating the bicipital groove | Positive if pain occurs 
Yergason | Pain in the biceps tendon | Sitting with elbow at 90 degrees | Neutral | Patient supinates forearm against clinician’s resistance, who simultaneously palpates biceps tendon | Positive if pain occurs 

Abbreviation: SLAP, superior labrum anterior posterior. 
[a] Tests shown in Figure 44-3. 
[b] Hawkins grading scheme: grade 0 denotes little to no movement; grade 1 denotes the humeral head moves onto the glenoid rim; grade 2 indicates the humeral head can be dislocated but spontaneously relocates; grade 3 indicates the humeral head does not relocate when the pressure is removed. In the Hawkins scheme, grades 1 to 3 are seen as positive outcomes on a laxity test. 
[c] Tests shown in Figure 44-4. 

Figure 44-3 Clinical Tests to Evaluate Anterior Instability of the Shoulder 

Panel annotations: A, Apprehension Test (positive indication of instability: patient expresses apprehension and/or pain). B, Relocation Test (examiner applies pressure to posterior aspect of humerus; positive indication of instability: patient expresses relief). C, Anterior Release Test (examiner performs relocation test, then releases downward pressure; positive indication of instability: patient expresses apprehension and/or pain). 

A, Apprehension test, although of limited clinical value because of low specificity, is included as part of a sequence of tests for shoulder instability. It is conducted 
with the patient sitting or standing, with the arm placed in 90-degree abduction and 90-degree external rotation, and the elbow flexed 90 degrees. Pressure is 
applied to the posterior aspect of the humerus. B, Relocation test, performed to relieve symptoms (pain and apprehension) of instability, is conducted with the patient 
supine and the arm abducted to 90 degrees and externally rotated to 90 degrees. Downward (posterior) pressure is applied to the humeral head. C, The anterior 
release test is conducted in a similar manner as the relocation test, then the examiner’s hand is removed suddenly, releasing pressure on the humeral head. 

Clinical tests for detecting labral tears evoke symptoms by 
compressing the humerus into the glenoid in an attempt to 
catch the labral fragment between the bony structures 
(compression rotation test). 22 Other eponymous tests to evoke 
symptoms by rotating the humerus passively or actively, such 
as the pain provocation test of Mimori et al, 18 are shown in 
Figure 44-4. Alternative physical examination maneuvers 
reproduce shoulder symptoms by asking the patient to resist 
the force of the clinician while the arm is held in a fixed 
position, such as the biceps load II test 23 shown in Figure 44-4. 

Signs and symptoms for shoulder instability must be 
recorded accurately to add appropriate diagnostic 
information. We reviewed the literature on the accuracy of diagnostic 
studies for shoulder instability. 

METHODS 

This review is based on the guidelines for systematic 
reviews of studies evaluating the accuracy of diagnostic 
tests 24 identified through the MEDLINE (1966-2003), 
EMBASE (1980-2001), and CINAHL (1982-2001) 
databases. To retrieve all relevant publications related to 
diagnosing shoulder complaints in adults, the term “exp 
shoulder” was searched. In addition, text word searches 
were completed for “glenohumeral,” “scapula,” “clavicula,” 
“acromion,” “rotator cuff,” “supraspinatus,” “supra-spinatus,” 
“infraspinatus,” “infra-spinatus,” “serratus anterior,” 
and “subscapularis.” Diagnostic studies were retrieved by 
exploding the phrase “sensitivity and specificity,” with 
additional text word searches of “specificity,” “false negative,” 
“screening,” and “accuracy” based on the search strategy of 
Deville et al. 25 Bibliographies of known primary and review 
articles were examined. One reviewer (J.J.L.) screened abstracts 
of the retrieved citations on clinical tests, sensitivity and 
specificity figures, and shoulder pain. Relevant articles were 
researched and their reference lists were screened to find 
additional studies. 

Studies were screened by 2 reviewers (J.J.L., B.W.K.) and 
had to meet the following inclusion criteria: (1) description 
of clinical tests for instability or intra-articular pathology 
(IAP) of the shoulder, (2) use of a reference (gold) standard, 
(3) detailing of sensitivity and specificity, and (4) publication 
in English, Dutch, or German. Studies were excluded if the 
diagnoses included fibromyalgia or systemic disorders such 
as rheumatoid arthritis, fractures, tumors, or strokes. 

We selected studies that compared a clinical test to surgical 
or arthroscopic findings, rather than noninvasive imaging 
tests (eg, magnetic resonance imaging, ultrasonography, 
or computed tomography). Although these imaging tests 
may be useful in confirming the presence of instability or a 
labral tear, they have a sensitivity of only 60% to 90%, 
depending on the type of injury 26 and in comparison with 
surgery or arthroscopy. Approximately 10% to 20% of 
patients with a normal reading on shoulder magnetic 
resonance imaging or ultrasonography 26 may still have shoulder 
instability or labral tears. Thus, these noninvasive tests 
might ultimately prove useful as a pragmatic reference 
standard for some physicians, although the presence of 
verification bias (no surgery or arthroscopy implemented when the 
noninvasive study result is normal) and possible low 
sensitivity create uncertainty when the utility of the clinical 
examination is reviewed. 

Figure 44-4 Clinical Tests for Labral Tears 

Panel annotations: A, Biceps Load Test II (positive indication of labral tear: patient expresses increased pain). C, Examiner attempts external rotation and then internal rotation while the patient resists (positive indication of labral tear: patient shows normal strength when resisting external rotation and apparent weakness when resisting internal rotation). 

A, Biceps load test II is performed with the patient 
supine, the arm is placed in 120-degree abduction 
(90-degree abduction in biceps load test I), and the 
elbow is placed in 90-degree flexion. The patient is 
asked to resist the lateral force applied by the 
examiner. B, In the pain provocation test of Mimori, the arm 
is placed in 90-degree abduction, the elbow in 
90-degree flexion, and the forearm in maximum 
supination. To provoke symptoms, the examiner moves the 
forearm into maximum pronation. C, Internal rotation 
resistance strength test (test of Zaslav) is conducted 
with the patient standing or sitting, with the humerus 
in 90-degree abduction and 80-degree external 
rotation. The patient is asked to resist an external rotation 
force applied by the examiner and then to resist an 
applied internal rotation force. 

For each study, details were extracted on study population 
(setting, sampling, age, sex, and diagnosis), clinical tests, 
reference tests, and outcome (sensitivity and specificity). When 
raw data were available, the likelihood ratios (LRs) were 
calculated for individual findings, thereby describing the 
increase in odds that the patient had shoulder instability 
when a symptom or sign was present or the opposite effect 
when a sign or symptom was absent. 
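
As a worked illustration of how sensitivity, specificity, and LRs follow from raw 2 x 2 data, the Python sketch below uses the relocation test counts reported in Table 44-3 (22 of 26 unstable shoulders test positive; 40 of 46 stable shoulders test negative). The function name and the dictionary keys are illustrative assumptions, not part of the review's methods.

```python
def diagnostic_accuracy(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Sensitivity, specificity, and likelihood ratios from 2 x 2 counts.

    tp/fn: diseased patients with a positive/negative index test.
    fp/tn: nondiseased patients with a positive/negative index test.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ = sensitivity / (1 - specificity); unbounded when specificity is 1.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    # LR- = (1 - sensitivity) / specificity; unbounded when specificity is 0.
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# Relocation test counts as reported in Table 44-3 (T'Jonck et al).
relocation = diagnostic_accuracy(tp=22, fn=4, fp=6, tn=40)
for name, value in relocation.items():
    print(f"{name}: {value:.2f}")
```

With these counts the positive LR rounds to 6.5 and the negative LR to 0.18, the values listed for the relocation test in Table 44-3.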

The methodologic quality of the studies was evaluated by 2 
reviewers (A.P.V., J.J.L.) with the Quality Assessment of 
Diagnostic Accuracy Studies checklist. 27 This list includes 14 
questions about the spectrum of patients studied, selection criteria, 
test verification, test description, blinding, uninterpretable 
results, and study withdrawals. These questions could be scored 
as positive if the item was fulfilled, negative if the item was not 
fulfilled, or unclear if the item was not described. The 
limitations of each study were described. The studies were not 
allocated into arbitrary categories of low, medium, or high quality. 

RESULTS 

Our search strategy used a broad spectrum of terms for the 
shoulder, yielding about 21000 articles. Combined with the 
search strategy of Deville et al 25 on diagnosis, this resulted in 
1449 abstracts from the 3 databases. About 130 abstracts 
contained information on shoulder disorders and diagnostic 
outcome measurements. However, most of the articles evaluated 
sonography vs surgery, magnetic resonance imaging vs surgery, 
or one type of magnetic resonance imaging vs another type. 

Formal reviews were conducted for 35 articles that evaluated 
clinical tests. Seventeen of these studies 16,18,19,21-23,28-38 met the 
selection criteria for inclusion in this review (Table 44-2). 
Eighteen studies were excluded: 11 because no information on 
instability or IAP was presented, 39-49 4 because data were missing on 
sensitivity and specificity or clinical tests, 50-53 and 3 because they 
were published in French. 54-56 Of the 17 studies that were 
selected, 5 enrolled patients when the clinician suspected 
shoulder instability 19,33,35,37,38 and 12 enrolled patients when the 
clinician suspected labral tears or other IAP. 16,18,21-23,28-32,34,36 All the 
studies were conducted in orthopedic clinics. Each study 
evaluated a varying number of clinical tests but lacked data on patient 
medical history. Surgery was used as a reference test in 6 
studies, 19,29,30,33,35,37 and arthroscopy in 11. 16,18,21-23,28,31,32,34,36,38 The 
apprehension test, 19,38,39 relocation test, 19,38 active compression test, 21,29 
anterior slide test, 22,34 and the test of Speed 30,38 were evaluated in 
more than 1 study. Two studies reported the clinical 
examination of the shoulder under anesthesia with the same protocol. 33,37 
These studies were not pooled because of lack of clinical 
homogeneity in study populations. Although most studies had the 
same inclusion criterion for participant selection (having 
surgery or arthroscopy for shoulder complaints), the selection 
standards for undergoing surgery or arthroscopy were unclear. 
Hence, the constitution of the population might have differed. 
In addition, different end points of the diagnoses made it 
impossible to evaluate the influence of the diagnostic threshold 
for sensitivity and specificity. 

Accuracy of Signs and Symptoms Related 
to Instability and Labral Tears 

No diagnostic studies assessed the value of history taking in 
diagnosing instability. Four provocation tests for instability are 
presented in Table 44-3. The relocation test 38 and the anterior 
release test 35 have the best properties for increasing the 
likelihood of instability (relocation test 38 : positive LR, 6.5; 95% 
confidence interval [CI], 3.0-14; and negative LR, 0.18; 95% CI, 
0.07-0.45; anterior release test 35 : positive LR, 8.3; 95% CI, 3.6-19; and 
negative LR, 0.09; 95% CI, 0.03-0.27). The relocation test does 
not work as well in determining more subtle degrees of anterior 
instability as opposed to more obvious cases of instability, 
although we were unable to evaluate the CI around the LRs for 
detecting less significant instability. 19 The apprehension test and 
the clunk test were both of limited value because of low 
specificity and low sensitivity, respectively. 
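
The chapter does not state how the confidence intervals around the LRs were computed. As a sketch only, the Python below assumes the commonly used log-normal approximation for a ratio of two proportions and applies it to the relocation test counts from Table 44-3; the function name and argument names are illustrative.

```python
import math


def ratio_confidence_interval(events_1: int, n_1: int,
                              events_2: int, n_2: int,
                              z: float = 1.96) -> tuple:
    """Ratio of two proportions with an approximate 95% CI.

    Assumes the log-normal approximation:
    SE(ln ratio) = sqrt(1/e1 - 1/n1 + 1/e2 - 1/n2).
    No continuity correction is applied, so zero counts are not supported.
    """
    ratio = (events_1 / n_1) / (events_2 / n_2)
    se = math.sqrt(1 / events_1 - 1 / n_1 + 1 / events_2 - 1 / n_2)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


# Positive LR: positive tests among unstable (22/26) vs stable (6/46) shoulders.
lr_pos = ratio_confidence_interval(22, 26, 6, 46)
# Negative LR: negative tests among unstable (4/26) vs stable (40/46) shoulders.
lr_neg = ratio_confidence_interval(4, 26, 40, 46)
print("LR+ %.1f (%.1f-%.0f)" % lr_pos)
print("LR- %.2f (%.2f-%.2f)" % lr_neg)
```

Under this assumption the positive LR interval reproduces the reported 3.0-14, and the negative LR interval comes close to the reported 0.07-0.45 (the upper bound computes to about 0.44), which suggests a similar but not necessarily identical method was used.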

Instability could be neither confirmed nor ruled out 
with the sulcus sign 38 or the load and shift anterior and posterior 
laxity tests. 38 The likelihood of instability increased when laxity 
tests were performed under anesthesia (positive LR, 13; 95% 
CI, 3.9-43) 33 ; however, these tests cannot be performed in the 
general medical practice because of the use of anesthesia. 
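
To show how an LR changes the probability of instability for an individual patient, the Python sketch below converts a pretest probability to a posttest probability through odds. The 20% pretest probability is a hypothetical value chosen only for illustration (the review reports no primary care prevalence); the LRs are those reported for the relocation test.

```python
def post_test_probability(pretest_probability: float,
                          likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability using an LR."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical pretest probability; not a figure from the review.
PRETEST = 0.20

# Relocation test LRs reported in Table 44-3.
after_positive = post_test_probability(PRETEST, 6.5)
after_negative = post_test_probability(PRETEST, 0.18)
print(f"after a positive test: {after_positive:.2f}")
print(f"after a negative test: {after_negative:.2f}")
```

Under this assumed pretest probability, a positive relocation test raises the probability of instability to about 62%, and a negative test lowers it to about 4%.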

The possibility of detecting labral tears by arthroscopy has 
renewed interest in clinical tests for detecting affected patients. 
Thirteen studies 16,18,21-23,28-32,34-36 have evaluated 14 clinical signs, 
and 8 of these 18,21-23,28,29,32,34 allowed calculation of positive and 
negative LRs (Table 44-4). The anterior slide test, 22,34 the crank 
test, 16,21,28 and the active compression test 16,21,22,29 were promising 
when their designers evaluated them. However, the accuracy 
and LRs found by other researchers were far less hopeful. 
Therefore, optimism should be reserved for test results that have not 
been duplicated in subsequent studies. The biceps load I test 32 
(positive LR, 29; 95% CI, 7.3-115), the biceps load II test 23 
(positive LR, 26; 95% CI, 8.6-80), the pain provocation test of 
Mimori et al 18 (positive LR, 7.2; 95% CI, 1.6-32), and the 
internal rotation resistance strength test 31 (positive LR, 25; 95% CI, 
8.1-76) need confirmation before they become widely adopted. 
Conflicting evidence was found for the test of Speed. 16,30 

Limitation of the Literature 

The results of the presented studies have some limitations and 
should be interpreted with caution (Table 44-2). The 
diagnostic studies were all executed in specialized care; therefore, the 
optimal spectrum of disease was defined as patients visiting an 
orthopedics clinic with shoulder pain. However, in 15 
studies 16,19,21-23,28,30-38 patients were selected from waiting lists for 
shoulder surgery or shoulder arthroscopy. In these studies, 
spectrum bias cannot be excluded. Besides, this selection 
criterion resulted in a highly selected group of patients with severe 
shoulder disorders, which is also noticeable in the high 
prevalence values (15%-100%) of instability and labral lesions. A 
high prevalence among study subjects reduces the opportunity 
to detect both false-positive and true-negative results, which 
will overestimate the sensitivity and underestimate the 
specificity when the test is applied to patient populations with a 
lower prevalence of disease. It is likely that clinical findings in 
daily medical practice have lower sensitivity but higher 
specificity than suggested in the available literature. 

Other limitations of the existing literature include modest 
sample sizes and methodologic problems. Twelve of the 17 
studies did not describe the procedure for selecting 
patients. 18,19,21-23,29,30,32,33,35,36,38 The time between index and reference test was 
unknown in most studies. 18,23,28-38 The details of the reference 
test were missing or unclear in 9 studies. 16,19,23,28,29,32,33,35,37 
Furthermore, in 16 studies it was unclear whether the examiner of 
the reference test was blinded for the index test 16,18,19,21-23,28-36,38 ; in 1 
study it was evident that the examiner was not blinded. 37 These 
methodologic problems complicate reproduction of study 
results and may have biased the outcome. 

Table 44-2 Study Characteristics 

Source, y | Selection Criteria | Total No. of Participants (% of Women) | Mean Age, y | Index Test | Limitations [a] 

Reference Test Arthroscopy; Retrospective Design 
Berg and Ciullo, 36 1998 | Identified SLAP lesions during arthroscopy | 66 (NA) | NA | SLAP-prehension test | b, d, f, g 

Reference Test Arthroscopy; Prospective Design 
Guanche and Jones, 16 2003 | First arthroscopy for shoulder pain, complete range of motion under anesthesia | 61 (19) | 38 | Active compression test; anterior apprehension test; crank test; relocation test; test of Speed; test of Yergason; tenderness in bicipital groove | e 
Kibler, 34 1995 | Isolated glenoid labral tear, partial-thickness rotator cuff pathology, Bankart lesion, capsular deficiency, or 25-degree internal rotation deficit | 226 (33) | NA | Anterior slide test | a, b, c, d, f, g, h 
Kim et al, 32 1999 | Arthroscopy for unilateral recurrent anterior shoulder dislocation (based on physical examination, plain radiograph, and MRI) with a Bankart lesion. Exclusion: multidirectional instability | 75 (15) | 25 | Biceps load test I | a, b, e, f 
Kim et al, 23 2001 | Arthroscopy for shoulder problems. Exclusion: dislocation; stiff shoulder | 127 (30) | 31 | Biceps load test II | a, b, e 
Liu et al, 28 1996 | Shoulder surgery after failure of conservative treatment. Exclusion: traumatic dislocation; weakness of subscapularis | 62 (22) | 28 | Crank test | b, d, e 
McFarland et al, 22 2002 | Diagnostic arthroscopy for shoulder pain | 426 (NA) [b] | NA | Compression rotation test; anterior slide test; active compression test | a 
Mimori et al, 18 1999 | Shoulder pain during throwing motions. Exclusion: instability; indications of rotator cuff tears on MRI or arthrography | 32 (6) | 21 | Crank test; anterior apprehension test in external and internal rotation | a, b, c, f 
Stetson and Templin, 21 2002 | Diagnostic arthroscopy after failure of conservative treatment | 65 (31) | 46 | Crank test; active compression test | a, b, f, h 
T’Jonck et al, 38 2001 | Shoulder arthroscopy due to disabling shoulder pain. Exclusion: >65 y; previous surgery of shoulder; interaction with complaints in elbow or neck | 71 (45) | NA | Active compression test; apprehension test; clunk test; lift-off test; load-and-shift test; posterior stress test; release test; relocation test; resistance test external rotation; test of Speed; sulcus sign | a 
Zaslav, 31 2001 | Shoulder surgery after failure of conservative treatment; positive Neer overhead sign | 110 (41) [c] | 44 | Internal rotation resistance strength test | b 

Reference Test Surgery; Prospective Design 
Bennett, 30 1998 | Surgery for shoulder pain | 45 (31) | NA | Test of Speed | a, b 
Cofield et al, 33 1993 | Surgery after referral for suspected recurrent instability | 55 (27) | 29 | Laxity tests under anesthesia in anterior, posterior, inferior, anterior-inferior and posterior-inferior direction | a, b, e 
Gross and Distefano, 35 1997 | Subluxation or gross dislocation on examination under anesthesia; abnormal excursion during arthroscopic examination; Hill-Sachs lesion or Bankart lesion | 82 (38) [d] | 37 | Anterior release test | a, b, e, f 
O’Brien et al, 29 1998 | Shoulder pain | 268 (NA) | NA | Active compression test | a, b, c, d, e, f, g, h 
Oliashirazi et al, 37 1999 | Shoulder surgery for unilateral traumatic recurrent anterior instability | 30 (17) | 23 | Laxity tests under anesthesia in anterior, posterior, inferior, anterior-inferior and posterior-inferior direction | a, e, f 
Speer et al, 19 1994 | Shoulder surgery; subtle anterior instability. Exclusion: treatable/observable rotator cuff lesions; multidirectional instability | 100 (NA) | NA | Relocation test; apprehension test | a, e 

Abbreviations: MRI, magnetic resonance imaging; NA, not available; SLAP, superior labrum anterior posterior. 
[a] Limitations pertaining to all listed studies: spectrum bias possible, patient on the list for surgery or arthroscopy, and blinding unclear; the reference test might have been interpreted with knowledge of the index test or vice versa. Key to limitations: (a) Selection criteria for waiting list entry not described. (b) Disease progression bias possible; time between index and reference test not described. (c) Partial verification bias; part of the sample did not receive the reference test. (d) Incorporation bias; results of index test are used to establish the final diagnosis. (e) The execution of the reference test was not described, causing problems with study replication. (f) Unclear whether same clinical data (radiography, MRI, or other diagnostic tools) would be available in daily practice. (g) Unclear whether uninterpretable or intermediate test results were reported. (h) Unclear whether all patients entering study were accounted for (withdrawals). Limitations of the studies were determined with the Quality Assessment of Diagnostic Accuracy Studies standardized checklist. 27 
[b] An additional 178 patients retrospectively excluded for various reasons. 
[c] Five patients removed from cohort according to physical findings. 
[d] An additional 18 patients retrospectively excluded for dual diagnoses. 

Table 44-3 Diagnostic Accuracy of Physical Examination for Instability of the Shoulder 

Index Test and Source | Diagnosis | No. of Shoulders | Sensitivity [a] | Specificity [a] | LR+ (95% CI) | LR- (95% CI) 

Provocation Tests 
Apprehension test 
T’Jonck et al, 38 2001 | Instability | 72 | 0.88 (23/26) | 0.50 (23/46) | 1.8 (1.3-2.5) | 0.23 (0.08-0.69) 
Speer et al, 19 1994 | Subtle anterior instability: pain | 100 | 0.54 | 0.44 | ... [b] | ... 
Speer et al, 19 1994 | Subtle anterior instability: apprehension | 100 | 0.68 | 1.0 | ... | ... 
Relocation test 
T’Jonck et al, 38 2001 | Instability | 72 | 0.85 (22/26) | 0.87 (40/46) | 6.5 (3.0-14) | 0.18 (0.07-0.45) 
Speer et al, 19 1994 | Subtle anterior instability: pain | 100 | 0.30 | 0.58 | ... | ... 
Speer et al, 19 1994 | Subtle anterior instability: apprehension | 100 | 0.57 | 1.0 | ... | ... 
Clunk test 
T’Jonck et al, 38 2001 | Instability | 72 | 0.35 (9/26) | 0.98 (45/46) | 16 (2.1-119) | 0.67 (0.5-0.89) 
Anterior release test 
T’Jonck et al, 38 2001 | Instability | 72 | 0.85 | 0.87 | ... | ... 
Gross and Distefano, 35 1997 | Occult instability | 100 | 0.92 (34/37) | 0.89 (40/45) | 8.3 (3.6-19) | 0.09 (0.03-0.27) 

Laxity Tests 
Load and shift posterior test 
T’Jonck et al, 38 2001 | Instability | 72 | 0 (0/26) | 1.0 (46/46) | 1.7 (0-83) | 0.99 (0.93-1.1) 
Sulcus sign 
T’Jonck et al, 38 2001 | Instability | 72 | 0.31 (8/26) | 0.89 (41/46) | 2.8 (1.0-7.7) | 0.78 (0.59-1.0) 
Load and shift anterior test 
T’Jonck et al, 38 2001 | Instability | 72 | 0.54 (14/26) | 0.78 (36/46) | 2.5 (1.3-4.8) | 0.59 (0.38-0.92) 
Examination under anesthesia 
Cofield et al, 33 1993 | Instability | 55 | 1.0 (25/25) | 0.93 (28/30) [c] | 13 (3.9-43) | 0.02 (0-0.31) 
Oliashirazi et al, 37 1999 | Anterior instability | 60 | 0.83 (25/30) | 1.0 (30/30) | 51 (3.2-80) | 0.18 (0.08-0.38) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio. 
[a] If data of the 2 x 2 table were presented in the study, the sensitivity and specificity calculations are shown in parentheses. 
[b] Ellipses indicate data not available. 
[c] The healthy contralateral shoulders of the subjects (n = 30) were used as control. Hence, the specificity value and likelihood ratios have been presumably overestimated. 


CLINICAL SCENARIO—RESOLUTION 


Primary care physicians may consider the diagnosis of 
instability with or without a labral tear for this 24-year-old. The 
history of trauma at a young age and recurrent shoulder 
problems associated with a symptom that might have represented 
an acute dislocation (pop with an excessive stretch) mean that 
the attending physician may consider clinical tests to assess for 
instability and labral tears, but diagnostic accuracy would still 
be uncertain. Because the patient might opt for surgery to 
prevent recurrent dislocation, the primary care physician 
might consult an orthopedist to confirm the diagnosis and 
optimal management strategies for this patient’s case. 


THE BOTTOM LINE 

The available evidence suggests that the relocation test and
the anterior release test are best for establishing the diagnosis of
instability. For labral tears, the biceps load I and II tests, the
pain provocation test of Mimori, and the internal rotation
resistance strength test have the best diagnostic performance
characteristics (Figure 44-4). However, these results are
based on single studies done in groups of selected patients
who were evaluated by specialists. Despite the high prevalence
of shoulder disorders in the general population, we are
uncertain whether the diagnostic value of these tests or combinations
thereof will be similar when used in primary care.
Nonetheless, an understanding of the tests used in a specialist
practice gives primary care physicians the opportunity to
focus on physical examination maneuvers that might
improve diagnostic skills. Although we recommend that clinicians
take a careful history of the mechanism of shoulder
injury, the role of the patient’s medical history in diagnosing
the presence of instability or labral tears has not been studied.
A comparison of relevant historical characteristics of
patients with shoulder complaints, physical examination
findings, and noninvasive images (eg, magnetic resonance
imaging), along with arthroscopy or surgical results, would
greatly enhance the knowledge base of primary care physicians
who are first to evaluate shoulder conditions.


Table 44-4 Diagnostic Accuracy of Physical Examination for Labral Tears

Index Test and Study | Diagnosis | No. of Shoulders | Sensitivity^a | Specificity^a,b | LR+ (95% CI)^b | LR- (95% CI)^b

Anterior Apprehension
Guanche and Jones, 16 2003 | Labral tears (including SLAP) | 60 | 0.40 | 0.87 | ... | ...
Guanche and Jones, 16 2003 | SLAP lesions | 60 | 0.30 | 0.63 | ... | ...

Active Compression (O’Brien Test)
Stetson and Templin, 21 2002 | Labral tears | 65 | 0.54 (14/26) | 0.31 (12/39) | 0.8 (0.5-1.2) | 1.5 (0.8-2.8)
O’Brien et al, 29 1998 | Labral tears | 206 | 1.0 (53/53) | 0.98 (150/153) | 21 (10-42) | 0.01 (0-0.16)
O’Brien et al, 29 1998 | Acromial joint pathology | 212 | 1.0 (55/55) | 0.96 (150/157) | 44 (16-123) | 0.01 (0-0.16)
McFarland et al, 22 2002 | SLAP lesions | 409^c | 0.47 (18/38) | 0.55 (203/371) | 1.0 (0.7-1.4) | 0.96 (0.70-1.3)
Guanche and Jones, 16 2003 | SLAP lesions | 60 | 0.54 | 0.47 | ... | ...
Guanche and Jones, 16 2003 | Labral tears (including SLAP) | 60 | 0.63 | 0.73 | ... | ...

Anterior Slide
Kibler, 34 1995 | Superior glenoid labral tear | 226 | 0.78 (69/88) | 0.92^d (125/138) | 8.3 (4.9-14) | 0.24 (0.16-0.36)
McFarland et al, 22 2002 | SLAP lesions | 419^c | 0.07 (3/38) | 0.83 (62/381) | 0.5 (0.2-1.5) | 0.99 (1.1-1.2)

Biceps Load I
Kim et al, 32 1999 | SLAP lesions | 74 | 0.83 (10/12) | 0.98 (62/63) | 29 (7.3-115) | 0.09 (0.01-0.58)

Biceps Load II
Kim et al, 23 2001 | SLAP lesions | 127 | 0.90 (35/38) | 0.96 (85/89) | 26 (8.6-80) | 0.11 (0.04-0.28)

Compression Rotation
McFarland et al, 22 2002 | SLAP lesions | 303^c | 0.24 (7/29) | 0.76 (207/274) | 1.0 (0.5-2.0) | 1.0 (0.81-2.1)

Crank
Liu et al, 28 1996 | Labral tears | 62 | 0.91 (29/32) | 0.93 (28/30) | 14 (3.5-52) | 0.10 (0.03-0.29)
Stetson and Templin, 21 2002 | Labral tears | 65 | 0.46 (12/26) | 0.56 (22/39) | 1.1 (0.6-1.9) | 0.95 (0.61-1.5)
Guanche and Jones, 16 2003 | Labral tears (including SLAP) | 60 | 0.40 | 0.73 | ... | ...
Guanche and Jones, 16 2003 | SLAP lesions | 60 | 0.39 | 0.67 | ... | ...

Internal Rotation Resistance Strength
Zaslav, 31 2001 | Internal articular derangement | 110 | 0.88 (23/26) | 0.96 (81/84) | 25 (8.1-76) | 0.12 (0.04-0.35)

Pain Provocation Test of Mimori
Mimori et al, 18 1999 | Superior labral tears | 32 | 1.0 (22/22) | 0.90 (9/10) | 7.2 (1.6-32) | 0.03 (0-0.47)

Relocation
Guanche and Jones, 16 2003 | Labral tears (including SLAP) | 60 | 0.44 | 0.87 | ... | ...
Guanche and Jones, 16 2003 | SLAP lesions | 60 | 0.36 | 0.63 | ... | ...

SLAP-Prehension
Berg and Ciullo, 36 1998 | SLAP lesions | 66 | 0.82 (54/66) | ... | ... | ...

Tenderness of Bicipital Groove
Guanche and Jones, 16 2003 | Labral tears (including SLAP) | 60 | 0.44 | 0.40 | ... | ...
Guanche and Jones, 16 2003 | SLAP lesions | 60 | 0.48 | 0.52 | ... | ...

Test of Speed
Bennett, 30 1998 | Biceps pathology (including labral tears) | 46 | 0.90 (9/10) | 0.14 (5/36) | 1.1 (0.8-1.3) | 0.72 (0.10-5.5)
Guanche and Jones, 16 2003 | Labral tears (including SLAP) | 60 | 0.18 | 0.87 | ... | ...
Guanche and Jones, 16 2003 | SLAP lesions | 60 | 0.09 | 0.74 | ... | ...

Test of Yergason
Guanche and Jones, 16 2003 | Labral tears (including SLAP) | 60 | 0.09 | 0.93 | ... | ...
Guanche and Jones, 16 2003 | SLAP lesions | 60 | 0.12 | 0.96 | ... | ...

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; SLAP, superior labrum anterior posterior.

a If data of the 2 × 2 table were presented in the study, the sensitivity and specificity calculations are shown in parentheses.

b Ellipses indicate data not available.

c The authors stated in their article that patient numbers for each test were not equal because tests were published at different times (namely, the compression rotation test, 1990; the anterior slide test, 1995; and the active compression test, 1998).

d The healthy contralateral shoulders of the subjects were used as control. Hence, the specificity value and LRs have been presumably overestimated.
UPDATE: Shoulder Instability



Prepared by Catherine P. Kaminetzky, MD, MPH 

Reviewed by David L. Simel, MD, MHS, 
and Jolanda J. Luime, PhD 


CLINICAL SCENARIO 


A 24-year-old man with shoulder pain had a shoulder 
injury when he was 16 years old. For the last 3 years, he 
experienced sudden right shoulder discomfort and felt a 
pop every time he tried to throw a baseball with excessive 
force. However, the discomfort always resolved on its own. 
He has started to play tennis, and shoulder pain is affecting
his performance.

Inspection and palpation of the shoulder reveal no
abnormalities. He has no neck discomfort or limitation in
neck range of motion. Although he has full range of external
and internal rotation of the shoulders, the right shoulder
causes some discomfort throughout the arc of motion.
You decide to assess for instability of the shoulder. 


UPDATED SUMMARY ON SHOULDER INSTABILITY 
AND LABRAL TEARS 

Original Review 

Luime JJ, Verhagen AP, Miedema HS, et al. Does this patient
have an instability of the shoulder or a labrum lesion? JAMA.
2004;292(16):1989-1999.

UPDATED LITERATURE SEARCH 

Our literature search replicated that of the original article, 
confined to 2004 to April 2006. We identified 89 potential 
articles and reviewed the abstracts to find articles that 
included consecutive, prospectively identified patients 
whose shoulder problems were suspicious for instability or 
a labral tear and who were assessed by arthroscopy or surgery.
No new studies describe the sensitivity and specificity
of findings for instability or labral tear symptoms and signs. 
One study described the precision of various maneuvers for 
anterior instability. 

NEW FINDINGS 

• The recommended tests for shoulder instability, the relocation
and anterior release tests, may also be the most reproducible.

• The reliability improves when apprehension during the
maneuver, rather than pain, is used to judge the results as positive
vs negative.

Details of the Update 

Four members of an orthopedic shoulder clinic team prospectively
examined patients referred with shoulder symptoms and a
medical history suggestive of instability. 1 Each patient had to be
able to endure examinations by each member of the team,
resulting in 13 of 25 potentially eligible patients undergoing the
complete examinations. The final diagnoses were not reported,
but the intraclass correlations were reported for 2 laxity tests
(load and shift) and 4 provocation tests (apprehension, relocation,
augmentation, and release tests). For the laxity tests, the
results were reported on an ordinal scale, and for the provocation
tests the results were considered “positive” or “negative”
according to a response of patient apprehension or pain. The
load and shift tests had good reproducibility for motions in the
anterior and inferior direction but not the posterior direction.
For provocation tests, the assessment of an apprehensive
response to each maneuver was more reproducible than the
assessment of a response of pain. Among the 4 tests, the relocation
test to assess apprehension (intraclass correlation, 0.71) and
the release test to assess apprehension (intraclass correlation,
0.63) were the most reproducible.

A study by Holtby and Razmjou evaluated a large number of
patients referred for shoulder problems (n = 152), of whom 50
patients had their disease status confirmed by arthroscopy. 2 The
2 tests of interest were the Speed test and the Yergason test, both
initially described as tests for bicipital tendonitis. The verification bias
and the categorization of disease (any biceps tendon lesion or a
superior labral anterior posterior lesion) prohibited assessment
of isolated labral tears, but the positive likelihood ratio (LR+)
and negative likelihood ratio (LR-) for the Speed and Yergason tests
had confidence intervals (CIs) that crossed 1. If the data had
been corrected for verification bias, the likelihood ratio for a
positive Yergason test result (LR+, 2.0; 95% CI, 0.86-4.7) might
have appeared more promising.

A systematic review of the incidence and prevalence of shoulder
discomfort in the general population provides a context for
assessing the likelihood that a patient will have shoulder instability
or a labral lesion. 3 The annual incidence of shoulder discomfort
is 0.9% to 2.5%. However, shoulder discomfort does not
immediately resolve, so prevalence rates are much higher. At any
given time, shoulder discomfort is present in 6.9% to 26% of the
general population.


EVIDENCE FROM GUIDELINES 

No governmental guidelines address the evaluation of 
patients for shoulder instability. 


IMPROVEMENTS IN THE DATA PRESENTED IN THE 
ORIGINAL PUBLICATION 

None. 

CHANGES IN THE REFERENCE STANDARD 

None. 

RESULTS OF LITERATURE REVIEW 

Precision of Tests for Instability or Labral Tears 

The sulcus sign and load and shift laxity tests have similar
reproducibility (Table 44-5). Assessing a patient’s apprehension
to maneuvers has greater reliability than assessing his or
her pain (Table 44-6).


Table 44-5 Laxity Maneuvers

Tests | Intraclass Correlation Coefficient
Sulcus sign | 0.60
Load and shift (at 0-, 20-, and 90-degree arm positions)
  Anterior direction | 0.53-0.72
  Posterior direction | 0.42-0.68
  Inferior direction | 0.65-0.79


Table 44-6 Provocation Maneuvers

Response to Maneuvers | Intraclass Correlation Coefficient
Apprehensive Response
  Apprehension test | 0.47
  Relocation | 0.71
  Augmentation | 0.48
  Release | 0.63
Pain Response
  Apprehension test | 0.31
  Relocation | 0.31
  Augmentation | 0.09
  Release | 0.31
Pain or Apprehensive Response
  Apprehension test | 0.44
  Relocation | 0.44
  Augmentation | 0.33


CLINICAL SCENARIO—RESOLUTION 


Shoulder instability, with or without a labral tear, is a
diagnostic consideration for this patient with a history of
a shoulder injury. The popping sensation is suggestive of
instability, but the physical examination maneuvers are
more important. The apprehension maneuver should be
performed, followed by the relocation test and anterior
release tests. The assessment of an apprehensive response
to the relocation and anterior release tests is the most reliable
provocation test. A positive response increases the
odds of instability approximately 6- to 8-fold,
whereas a negative response reduces the odds to
approximately 0.1 to 0.2 times their pretest value. Labral tears are assessed
through the biceps load tests I and II. These tests differ
only by the position of the arm (abduction at 90 degrees
for biceps load I and at 120 degrees for biceps load II). An
increase in pain on the biceps load tests increases the odds
of a labral tear 26- to 29-fold, whereas the lack
of increased pain reduces the odds to 0.09 to 0.11 times
their pretest value.
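
To make the arithmetic behind these statements concrete, the following minimal Python sketch converts a pretest probability to a posttest probability through the odds form of Bayes theorem. The function name is illustrative, and the pretest probability of 20% is purely hypothetical, because no adequate prevalence data exist for these conditions; the likelihood ratios are those reported for the biceps load II test.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability using an LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Hypothetical pretest probability of a labral tear (not from the source).
pretest = 0.20

# Biceps load II likelihood ratios reported in this chapter.
positive = posttest_probability(pretest, 26)
negative = posttest_probability(pretest, 0.11)

print(f"Posttest probability after a positive test: {positive:.0%}")
print(f"Posttest probability after a negative test: {negative:.0%}")
```

With this assumed pretest probability, a positive test raises the probability to about 87% and a negative test lowers it to about 3%. The likelihood ratio multiplies the odds, not the probability, so the change in probability depends on the starting point.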


REFERENCES FOR THE UPDATE 
SHOULDER INSTABILITY—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

There are no adequate data for assessing the prevalence of these 
conditions among patients with shoulder discomfort because 
the existing data come only from patients undergoing surgery 
or arthroscopy. The incidence of shoulder discomfort is 0.9% 
to 2.5%. However, because shoulder pain can be chronic, the 
prevalence at a single point in time is 6.9% to 26%. 

POPULATION FOR WHOM SHOULDER INSTABILITY OR 
LABRAL TEARS SHOULD BE CONSIDERED 

Patients with shoulder pain should be screened for shoulder 
instability and labral tears. The annual incidence of shoulder 
dislocation in the general population may be as high as 
1.7%. There are no data for the incidence or prevalence of 
labral tears. 

DETECTING THE LIKELIHOOD OF SHOULDER 
INSTABILITY OR A LABRAL TEAR 

The anterior release and relocation tests have the best measurement
properties for shoulder instability (Table 44-7 and
Figure 44-3). The assessment of apprehension will be more 
reliable than the assessment of pain for these maneuvers. 
The biceps load tests should be performed to assess for 
labral tears (Table 44-7 and Figure 44-3). 


Table 44-7 Likelihood Ratios for Tests of Shoulder Instability or a Labral Tear^a

Test | LR+ (95% CI) | LR- (95% CI)
Shoulder Instability
  Anterior release test | 8.3 (3.6-19) | 0.09 (0.03-0.27)
  Relocation test | 6.5 (3.0-14) | 0.18 (0.07-0.45)
Labral Tear
  Biceps load I | 29 (7.3-115) | 0.09 (0.01-0.58)
  Biceps load II | 26 (8.6-80) | 0.11 (0.04-0.28)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

a These data come from examinations done by orthopedists and not generalist physicians.


REFERENCE STANDARD TESTS 

Arthroscopy or surgery. 
CHAPTER


Does This Patient Have Sinusitis?

John W. Williams, Jr, MD, MHS
David L. Simel, MD, MHS


CLINICAL SCENARIO


A patient presents to your office with a “bad cold.” Her 
symptoms began 5 days ago, when a runny nose, a 
scratchy throat, generalized malaise, and a nonproductive 
cough developed. Her symptoms are gradually improving 
with an over-the-counter cough medicine, but during the 
past 24 hours a “sinus headache” has developed. The 
patient is concerned that she may have “sinus.” It is the 
middle of cold and flu season, and this is the fifth patient 
you have treated today who has upper respiratory tract 
symptoms. 


WHY IS THIS AN IMPORTANT QUESTION TO 
ANSWER WITH A CLINICAL EXAMINATION? 


The patient’s story is familiar to primary care clinicians.
Among the most frequent diagnoses made by primary care
practitioners are nasal problems such as allergic and infectious
rhinitis, vasomotor rhinitis, and bacterial sinusitis. 1
Given the constant assault of allergens, environmental pollutants,
respiratory viruses, and rapid temperature changes,
it is not surprising that nasal complaints are so common.
However, not all “sinus” is sinusitis. Sinusitis can be defined
simply as inflammation of one or more paranasal sinuses but
usually refers to infection of the sinuses. In recent years,
many new medications have become available that allow
effective medical treatment of sinus problems, so it is
important to diagnose nasal complaints accurately to deliver
appropriate treatment. 2 When this can be accomplished by
the clinical examination, it obviates the need for more expensive
testing such as radiography.

The list of differential diagnoses for patients with nasal
congestion or discharge is long (Table 45-1), but a handful of
conditions encompass the majority of cases. 3 These conditions
can be divided into those causing inflammation of the
nose (rhinitis) and those causing inflammation of the sinuses
(sinusitis). Rhinitis is most frequently due to viral infection,
allergens (seasonal or perennial), or vasomotor instability
(eg, caused by extreme temperature change or excessive use
of vasoconstrictive medications). When these conditions are
severe, the sinus ostia may become blocked and the sinuses
infected secondarily. However, the implications of diagnosing
rhinitis are different from diagnosing sinusitis. Rhinitis
may respond to antihistamines, nasal decongestants, nasal
steroids, or cromolyn sodium, but randomized trials have
shown that sinusitis requires antibiotics for rapid resolution. 4,5
Sinusitis also occurs as an occult illness that may be
associated with asthmatic exacerbations or chronic headache.
This overview will focus on the medical history and
physical examination findings that distinguish bacterial
sinusitis from rhinitis and other conditions.
Table 45-1 Differential Diagnosis of Nasal Congestion/Rhinorrhea

Allergic
  Seasonal allergic rhinitis (pollens)^a
  Perennial allergic rhinitis (dusts, molds)^a
Vasomotor
  Idiopathic (vasomotor rhinitis)^a
  Abuse of nose drops (rhinitis medicamentosa)^a
  Drugs (reserpine, guanethidine, prazosin, cocaine abuse)
  Psychological stimulation (anger, sexual arousal)
Mechanical
  Polyps
  Tumor
  Deviated septum
  Crusting (as in atrophic rhinitis)
  Hypertrophied turbinates (chronic vasomotor rhinitis)
  Foreign body
  Central nervous system fluid leak
Chronic inflammatory
  Sarcoidosis
  Wegener granulomatosis
  Midline granuloma
Infectious
  Acute viral infection^a
  Acute or chronic bacterial infection of paranasal sinuses^a
  Atrophic rhinitis (secondary infection)
Hormonal
  Pregnancy
  Hypothyroidism

a Most common causes of nasal symptoms.
Figure 45-1 Sagittal View of Paranasal Sinuses


SINUSITIS REQUIRES ANTIBIOTICS FOR RAPID CURE 

Reference Standard for Diagnosing Sinusitis 

The reference (or gold) standard for diagnosing infectious
sinusitis is sinus aspiration and culture. Its use is particularly
appropriate for guiding antibiotic choice in patients with
complicated or refractory sinusitis. However, in general practice,
sinus radiographs are readily obtained and can be considered
a pragmatic reference standard. A 4-view sinus series
is highly concordant with a single Waters view, 6,7 and when it
reveals sinus opacity, an air-fluid level, or 6 mm or more of
mucosal thickening, a 4-view sinus series is 72% to 96% as
accurate for maxillary sinusitis as aspiration and culture. 8,9
The chief limitations of sinus radiographs are
poor visualization of the ethmoid air cells and difficulty distinguishing
between infection, tumor, and polyp in the completely
opacified sinus. Other potentially useful diagnostic
tests are ultrasonography and computed tomography. Ultrasonography
is nonionizing but correlates only moderately
well with sinus radiographs or sinus aspiration. 10-12 Computed
tomography of the sinuses is superior to sinus radiography
for visualizing the ethmoid air cells, for evaluating
opacified sinuses or mucoceles, and for differentiating the
bony changes of chronic inflammation from osteomyelitis. 13
Sinus computed tomography may become the diagnostic test
of choice but is not as readily available as radiographs and
has not been evaluated against sinus puncture. This caveat is
important because computed tomography may be highly
sensitive, yet lack specificity. 14

Normal Anatomy and Pathophysiology of Sinusitis 

The nose humidifies, warms, and filters inspired air as it passes
through the nasal vestibule and over the nasal turbinates. 15 The
nasal turbinates promote turbulent air flow that causes particulate
matter to fall on the nasal mucosa, where it is swept by
ciliated pseudostratified columnar cells to the nasopharynx.
Respiratory epithelium also lines the paranasal sinuses and
creates drainage into the nasal cavity via the superior meatus
(sphenoid and posterior ethmoid) and middle meatus (maxillary
and anterior ethmoids) (Figure 45-1). 16 Properly functioning
ciliated cells are critical because maxillary sinus drainage is
uphill (Figure 45-2). Patients predisposed to infectious sinusitis
may have mucosal edema (eg, allergic rhinitis, viral rhinitis),
mechanical obstruction of the meatus (eg, polyps,
deviated nasal septum), or impaired ciliary activity (eg, Kartagener
syndrome). 3,17 Under these conditions, viruses and
bacteria proliferate in the poorly draining sinus and provoke
acute sinusitis.

How to Elicit the Relevant Symptoms and Signs 

Although patients may give a simple description, such as 
“sinus trouble,” the examiner should seek a more complete 
medical history. Symptoms that may increase the likelihood 
of sinusitis include fever, malaise, cough, nasal congestion, 
maxillary toothache, purulent nasal discharge, little improvement
with nasal decongestants, and headache or facial pain
exacerbated by bending forward.

Examination of the nostrils can be performed with a short,
wide speculum mounted on a handheld otoscope. The
speculum should be directed posterolaterally, avoiding the
sensitive nasal septum. The nasal mucosa should be inspected
for color, edema, character of nasal secretions, polyps, and
structure of the nasal septum (Figure 45-3). Purulent secretion
from the middle meatus is reported to be highly predictive
of maxillary sinusitis but may be difficult to see unless
the examiner shrinks the nasal mucosa with a topical vasoconstrictive
agent (eg, oxymetazoline hydrochloride) and
uses a nasal speculum to enhance visualization. 18 Septal deviation
or nasal polyps are important findings because they
may contribute to nasal obstruction and promote recurrent
sinusitis.

Palpation for sinus tenderness should be performed over
the maxillary and frontal sinuses (Figure 45-4). In addition,
checking for tenderness by tapping the maxillary teeth with a
tongue blade may be valuable because 5% to 10% of maxillary
sinusitis is a result of dental root infection. 19 The ethmoid
and sphenoid sinuses cannot be adequately evaluated
during the routine physical examination.

Transillumination of the maxillary sinuses may be performed
by 2 methods. The best-studied method is performed
by placing a Welch-Allyn-Finnoff transilluminator (Welch-Allyn
Inc, Skaneateles Falls, New York) over the infraorbital
rim, shielding the light source from the observer’s eyes, and
judging light transmission between sides through the hard
palate (Figure 45-5). The examination must be performed in
a completely darkened room after allowing the observer’s
vision to adapt fully to darkness. Obviously, the patient’s
dentures should be removed. Most experts report the transillumination
results as opaque (no light transmission), dull
(reduced light transmission), or normal (light transmission
typical of a normal subject). An alternative method is to
place a light source in the patient’s mouth and have the
patient make a tight seal around the transilluminator; the
observer judges light transmitted through the maxillary
sinuses. This technique has the advantage of being able to
simultaneously compare sides but requires sterilization of the
instrument between patient examinations.

The frontal sinuses can be examined by placing a light
source below the supraorbital rim, but interpretation is difficult
because the frontal sinuses naturally develop asymmetrically.
This normal variation may falsely suggest sinusitis but
is resolved by routine radiography. 

Precision of Symptoms and Signs 

A total of 111 patients with nasal complaints were examined
by a general internist and a second examiner who was a physician
assistant, internal medicine resident, or attending
internist. 20 Agreement was high between examiners for 11 of
the 15 historical items, including headache (κ, 0.78); subjective
fever, chills, or sweats (κ, 0.71); cough (κ, 0.68); colored
nasal discharge (κ, 0.68); facial pain (κ, 0.65); and maxillary
toothache (κ, 0.60). (Sackett 21 gives a further explanation of
the κ statistic and the other special terms and ideas used in
this overview.) On physical examination, agreement was high
only for sinus tenderness (κ, 0.59) and was fair for maxillary
sinus transillumination (simple agreement, 61%; κ, 0.22). In
the only other study of observer variability for transillumination,
otolaryngologists also had modest agreement between
examiners for the maxillary sinuses (simple agreement,
62%), but agreement was good for the frontal sinuses (simple
agreement, 95%). 22


Figure 45-2 Coronal View of Paranasal Sinuses


Figure 45-3 Examination of the Nose Through an Otoscope With a
Disposable Speculum

The middle meatus is usually not visible behind the turbinates.


Figure 45-4 Surface Landmarks for Palpation of Frontal Sinuses
(Left) and Maxillary Sinuses (Right)

Some experts recommend palpating the frontal sinuses by placing the fingers
on the orbital roof below the eyebrow.

Observer agreement is high for most patient symptoms, 
but for the physical examination agreement is high only for 
sinus tenderness. 
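
For readers unfamiliar with the κ statistic, the following minimal Python sketch shows how chance-corrected agreement between 2 examiners can be computed from a 2 x 2 agreement table. The function name and the counts are hypothetical and chosen only for illustration; they are not data from the studies cited above.

```python
def cohen_kappa(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Return (simple agreement, kappa) for 2 examiners rating a binary finding."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg

    # Observed (simple) agreement: proportion of patients rated the same.
    observed = (both_pos + both_neg) / n

    # Agreement expected by chance from each examiner's rate of positives.
    a_pos_rate = (both_pos + a_pos_b_neg) / n
    b_pos_rate = (both_pos + a_neg_b_pos) / n
    expected = a_pos_rate * b_pos_rate + (1 - a_pos_rate) * (1 - b_pos_rate)

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts for 100 patients examined by both clinicians.
observed, kappa = cohen_kappa(both_pos=40, a_pos_b_neg=10,
                              a_neg_b_pos=10, both_neg=40)
print(f"Simple agreement {observed:.0%}, kappa {kappa:.2f}")
```

In this hypothetical example, simple agreement of 80% corresponds to a κ of 0.60 because half of the agreement would be expected by chance alone. This is why a finding such as maxillary sinus transillumination can show simple agreement above 60% yet have a low κ.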


Accuracy of Symptoms and Signs of Sinusitis 

There have been few attempts to systematically evaluate the
accuracy of the clinical examination for sinusitis. Three studies
assessed the discriminatory ability of sinusitis symptoms
and signs in adults. One evaluated 69 historical items among
164 consecutive patients with sinusitis suspected by the
patient or otolaryngologist. 23 These symptoms were compared
to a reference standard of 4-view radiography (Caldwell,
Waters, lateral, and submental vertex projections). Six
symptoms (preceding upper respiratory infection, any nasal
discharge or purulent nasal discharge, painful mastication,
malaise, cough, and hyposmia) were significantly (P < .01)
more common in patients with abnormal radiographs, but
no single finding was highly accurate.

We compared symptoms to radiographs in 247 consecutive
male patients who had rhinorrhea or facial pain unrelated to
trauma or who suspected they might have sinusitis. 20 Colored
nasal discharge, cough, and sneezing were the most sensitive
symptoms (72%, 70%, and 70%, respectively) but were not
specific (52%, 44%, and 34%, respectively). One symptom,
maxillary toothache, was highly specific (93%), but only 11%
of patients reported this symptom. Historical items thought
to make sinusitis less likely, such as sore throat (sensitivity,
52%; specificity, 56%), itchy eyes (sensitivity, 52%; specificity,
43%), and constitutional symptoms (sensitivity, 56%;
specificity, 47%), were not useful. 
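
The limited value of these individual symptoms becomes clearer when sensitivity and specificity are converted to likelihood ratios. The following minimal Python sketch performs that conversion for the 3 most sensitive symptoms reported above; the function and variable names are illustrative only, and the resulting values are computed here and were not reported in this form in the text.

```python
def likelihood_ratios_from_accuracy(sensitivity, specificity):
    """Return (LR+, LR-) for a finding with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity and specificity as reported in the paragraph above.
symptoms = {
    "Colored nasal discharge": (0.72, 0.52),
    "Cough": (0.70, 0.44),
    "Sneezing": (0.70, 0.34),
}

for name, (sensitivity, specificity) in symptoms.items():
    lr_pos, lr_neg = likelihood_ratios_from_accuracy(sensitivity, specificity)
    print(f"{name}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")
```

All 3 positive likelihood ratios fall between approximately 1.1 and 1.5, which illustrates why a sensitive but nonspecific symptom changes the probability of sinusitis very little.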

A third study compared symptoms to ultrasonographic findings
in 400 general practice patients selected for study because
their physician intended to test or treat for sinusitis. 24 Results
from this study should be interpreted with caution because
the reference standard (ultrasonography) was not interpreted
independent of the clinical findings and is less accurate than
radiography. 11,12 In the study by van Duijn et al, 24 preceding
common cold (sensitivity, 85%; specificity, 28%), pain at bending
forward (sensitivity, 65%; specificity, 59%), and purulent
rhinorrhea (sensitivity, 62%; specificity, 67%) were the most
useful findings. Toothache was found to be highly specific
(specificity, 83%).

Studies in children are limited to sensitivities for a few clinical
findings. Clear or purulent discharge (sensitivity, 76%-84%)
and cough (sensitivity, 48%-80%) are the most sensitive
findings (Table 45-2), but the discriminating power of
these findings is not known. 25-28

The most studied but least understood physical examination
maneuver is paranasal sinus transillumination. 5 ' 8120 ' 22 ' 25 ' 27,29 ' 32
Since the technique was first described in 1889 by Voltolini, 33
its value as a diagnostic test has been hotly debated. Several
authors have described transillumination as “highly predictive 
of disease,” whereas another author has described the use of 
transillumination as an act of criminal negligence. 34 Most 
studies of transillumination have methodologic limitations, 
and 2 of the more complete studies had differing results. 20,30 

Our own study compared the results of transillumination 
to paranasal sinus radiographs in 247 consecutive patients 
with nasal symptoms who were treated in general medicine 
clinics at a Veterans Affairs medical center. 20 Transillumination,
using a Welch-Allyn-Finnoff transilluminator or Mini
MagLite (Mag Instrument Inc, Ontario, California) placed 
over the infraorbital rim, did little to change the posttest 
probability of sinusitis. It generated a likelihood ratio (LR) of 
only 1.6 if either maxillary sinus was dull or opaque and 0.5 if 
both maxillary sinuses transilluminated normally. Clearly, as 
a single finding, transillumination could not be relied on to 
rule in or rule out sinusitis. 

The second study included 113 patients with nasal symptoms
and abnormal sinus radiographs and found different
results. 30 In the subset of these patients who were examined
by an otolaryngologist (using the same transillumination
technique as our study), transillumination was highly useful
when the sinus was either completely opaque (LR, ∞) or
completely normal (LR, 0.04) but less useful when the finding
was dull transillumination (LR, 0.41). In contrast to the
previous study, opaque transillumination ruled in sinusitis
and normal transillumination ruled out sinusitis.

Why did these 2 studies yield such disparate results? First,
the study populations were different (a primary care walk-in
clinic vs an otolaryngology clinic) and may have created different
degrees of expectation bias. Second, the examiners’
training was different; otolaryngologists may be better transilluminators
than general internists. These 2 studies suggest
that transillumination may be more useful for diagnosing
sinusitis when performed by otolaryngologists.

Because the paranasal sinuses develop at different rates
among children, transillumination may be less reliable than
in adult patients. Three studies have examined the value of
transillumination in children. In one, the examination could
not be performed in 24% of the children because of poor
patient cooperation. 5 For the remaining children, there was
agreement between transillumination and radiographic findings
in 53% and disagreement in 27%, and transillumination
was nondiagnostic in 20%. 5 The other 2 studies reported sensitivities
of only 76% (19/25) in one 27 and 48% (23/48) in the
other, which was performed in children with opaque maxillary
sinuses on radiographs who were undergoing sinus
drainage for chronic purulent sinusitis. 32 The sensitivity of
transillumination should have been maximal in this latter
patient group with severe disease but nevertheless performed
poorly.

Information is limited for other commonly assessed physi¬ 
cal examination components. In adults, sinus tenderness was 
found to have poor sensitivity and specificity (48% to 50% 
and 62% to 65%, respectively), 20,24 but other findings (tem¬ 
perature, nasal mucosal color, and percussion tenderness of 
the maxillary teeth) have not been well studied. In children, 


Table 45-2 Sensitivities (%) for Signs and Symptoms of Acute
Sinusitis in Children

                           Source
                           Swischuk    Wald        McClean 27  Kogutt and
                           et al 25    et al 26    (n = 25)    Swischuk 28
Sign or Symptom            (n = 63)    (n = 30)                (n = 96)

Nasal discharge            76          77          84          77
Cough                      60          80          60          48
Headache                   48          33          ... a       ...
Fever                      46 b        63 b        12 c        21 c
Facial pain or swelling    ...         30          8 d         ...
Fetor oris                 ...         50          ...         ...

a Ellipses indicate information not available.
b Fever not defined.
c Temperature > 38.3°C (101°F).
d Pain or swelling detected by examination.


tympanic membrane changes from otitis media (sensitivity, 
68%) is the most common physical examination finding 
associated with sinusitis, whereas a documented temperature 
higher than 38.3°C (101°F) (sensitivity, 12% to 21%) is 
uncommon. 27,28 

Accuracy of Combinations of Symptoms and Signs 

Despite the poor accuracy of the individual symptoms and
signs, these findings used in combination can be diagnostic for
sinusitis. We used logistic regression modeling to identify signs
and symptoms that best predict sinusitis. This statistical
procedure selects findings that independently contribute toward
making the diagnosis of sinusitis. Three symptoms (maxillary
toothache, poor response to nasal decongestants, and history
of colored nasal discharge) and 2 signs (purulent nasal
secretion and abnormal transillumination) were the best predictors
of sinusitis (Table 45-3). 20 When none of these findings were
present, sinusitis could be ruled out (LR, 0.1), and when 4 or
more were present, the LR was 6.4 (Table 45-4). One study
compared 11 clinical findings elicited by experienced
otolaryngologists with radiograph and maxillary sinus aspiration in
155 patients presenting to an emergency department with


Table 45-3 Independent Predictors of Sinusitis a

Symptom or Sign                      LR+ (95% CI)     LR- (95% CI)

Maxillary toothache                  2.5 (1.2-5.0)    0.9 (0.8-1.0)
Purulent secretion                   2.1 (1.5-3.0)    0.7 (0.5-0.8)
Poor response to decongestants       2.1 (1.4-3.1)    0.7 (0.6-0.9)
Abnormal transillumination           1.6 (1.3-2.0)    0.5 (0.4-0.7)
History of colored nasal discharge   1.5 (1.2-1.9)    0.5 (0.4-0.8)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a Data from Williams et al. 20
Table 45-4 Likelihood Ratios by Number of Signs and
Symptoms Present a

No. of Symptoms and Signs   Sinusitis Present   Sinusitis Absent   LR

≥4                          16                  4                  6.4
3                           29                  18                 2.6
2                           27                  39                 1.1
1                           14                  48                 0.5
0                           2                   32                 0.1
Total                       88                  141                ... b

Abbreviation: LR, likelihood ratio.

a Symptoms and signs comprise maxillary toothache, purulent nasal secretion, poor
response to decongestants, transillumination (normal bilaterally vs any abnormality),
and colored nasal discharge by medical history. Data from Williams et al. 20
b Ellipses indicate not applicable.


suspected sinusitis. 35 With similar statistical techniques, a
history of purulent rhinorrhea or unilateral sinus pain and the
presence of pus in the nasal cavity on examination were
highly predictive of sinusitis. Maxillary toothache, response
to decongestants, and transillumination were not studied.
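
The likelihood ratios in Table 45-4 can be reproduced from the patient counts. The following Python sketch is illustrative only; the function and variable names are hypothetical and do not come from the original study. For each number of findings, it divides the proportion of patients with sinusitis at that level by the proportion of patients without sinusitis at that level.

```python
# Counts from Table 45-4: number of findings -> (sinusitis present, sinusitis absent).
COUNTS = {
    ">=4": (16, 4),
    "3": (29, 18),
    "2": (27, 39),
    "1": (14, 48),
    "0": (2, 32),
}


def level_likelihood_ratios(counts):
    """Return the likelihood ratio for each level of a multilevel finding."""
    total_present = sum(present for present, _ in counts.values())
    total_absent = sum(absent for _, absent in counts.values())
    return {
        level: (present / total_present) / (absent / total_absent)
        for level, (present, absent) in counts.items()
    }


lrs = level_likelihood_ratios(COUNTS)
for level, lr in lrs.items():
    print(f"{level:>3} findings: LR = {lr:.1f}")
```

Rounded to 1 decimal place, the results agree with the LR column of Table 45-4.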

Physicians appear able to integrate individual signs and
symptoms into an overall assessment that accurately
diagnoses sinusitis. In our study, an overall impression that
sinusitis was “definitely or most likely present” generated an
LR of 4.7, and an overall impression that sinusitis was
“unlikely or definitely absent” generated a rather low LR of
0.4. When the impression was intermediate, the LR was
1.4. 20,36 These findings are in agreement with a study that
investigated otolaryngologists’ ability to diagnose purulent
sinusitis in patients with chronic symptoms. In the study by
Berg et al, 37 the overall clinical evaluation was compared with
sinus aspiration, with the following results: definitely
sinusitis, LR = 19; probably sinusitis, LR = 4; probably not
sinusitis, LR = 0.14; definitely not sinusitis, LR = 0.19. The general
internist’s overall assessment of the likelihood of sinusitis
performs well compared with radiograph or sinus aspiration.

To summarize, primary care practitioners frequently evaluate
patients with nasal symptoms, and in many instances,
sinusitis can be confidently ruled in or ruled out according to
the clinical examination. Further studies are needed to examine
clinical findings that have not been studied (such as headache
when leaning forward) and to test whether the 5 clinical
findings found to be useful for adult men can be exported to
other patient populations.

THE BOTTOM LINE 

1. Sinusitis is insidious in children. Concurrent otitis media 
is common. 

2. Considered in combination, maxillary toothache, poor
response to nasal decongestants, abnormal transillumination,
and colored nasal discharge by medical history or
examination are the most useful clinical findings in primary
care populations. When 4 or more of these 5 features are
present, the likelihood of sinusitis increases sharply (LR, 6.4),
and when none are present, sinusitis is effectively ruled out
(LR, 0.1).

3. Transillumination requires a completely darkened room, 
adequate time for dark adaptation, and practice. 

4. The overall medical history and physical examination in 
symptomatic adult patients is accurate. 
CLINICAL SCENARIO


A 36-year-old woman reports that she has “sinus headaches”
about once every 2 to 3 months. On many days, she
thinks she is about to get a sinus headache, but the symptoms
resolve. On the day of her visit, she reports pressure
in the sinuses, a headache, and nasal congestion that
occurred when she woke up. There is no fever, cough, or
nasal discharge. She requests an antibiotic. Your examination
does not reveal pus in the nares or nasal polyps,
though you find she does have some discomfort when you
apply pressure to the sinuses. Before you turn off the light
to transilluminate the sinuses, what additional lines of
inquiry could be explored?

UPDATED SUMMARY ON SINUSITIS 

Original Review 

Williams JW Jr, Simel DL. Does this patient have sinusitis?
diagnosing acute sinusitis by history and physical examination.
JAMA. 1993;270(10):1242-1246.

UPDATED LITERATURE SEARCH 

Our literature search used the parent search strategy for
The Rational Clinical Examination series, combined with
the subject “exp sinusitis,” published in English from 1992
to June 2004. The results yielded 191 titles, for which we
reviewed the titles and abstracts; 22 were selected for additional
review. These articles were reviewed to identify studies
that assessed the sensitivity and specificity of medical
history or physical examination features for sinusitis. We
required that the studies be conducted with outpatients,
involve prospectively collected data, and use radiologic
imaging, endoscopy, or sinus puncture as a criterion standard
for acute sinusitis. We excluded studies that had
major design biases such as a sample confined to patients
with a clinical diagnosis of sinusitis. No new original studies
were identified. Two meta-analyses were identified, so
the update focuses on those systematic reviews rather than
individual studies.


NEW FINDINGS 

• Among patients with a suspicion of sinusitis in general
medical practice, the prevalence of disease from sinus aspirates
is about 50%.

• The radiograph serves as a pragmatic reference standard 
for primary care practice, correctly diagnosing about 4 of 5 
patients. 

Details of the Update 

Two meta-analyses of essentially the same original studies led
their respective authors to distinctly different interpretations
about the outcomes, though both reported that
radiographs appeared better than ultrasonography. Engels et al 1
took a pragmatic approach to the reference standard for sinusitis
and compared radiographs with sinus puncture and clinical
examination with both sinus puncture and radiographs. In
addition, they evaluated varying thresholds for sinus
radiograph positivity (opacity, air-fluid level, or mucosal
thickening) and risk score for the clinical examination. Not
surprisingly, the radiograph had a slightly better summary receiver
operating characteristic curve area than the clinical examination
(0.83 vs 0.74, respectively), with the authors concluding
that evaluating combinations of individual findings as
reported in the original Rational Clinical Examination article
may perform better than the overall clinical impressions. A
reappraisal of the studies reported by Engels et al 1 shows a
summary positive likelihood ratio (LR) for radiographs of 4.2
(95% confidence interval [CI], 2.6-6.7) and negative LR of 0.25
(95% CI, 0.17-0.37). From a pragmatic standpoint, using the
radiograph as a reference standard will result in the correct
classification of 4 of 5 patients compared with sinus puncture.
In the 2 original articles comparing radiographs with sinus
puncture for general medical patients with a suspicion of
sinusitis, the prevalence of sinusitis was 49% and 51%. 2,3

Varonen et al 4 evaluated essentially the same studies,
although they counted one study as 2 separate studies, which
produced slightly different results. However, the authors did not
evaluate varying thresholds of positivity for the radiographs
and did not include risk scores for the clinical examination
because the scores have not been compared with puncture. 2 In
addition, the authors could not evaluate varying levels of the
overall clinical examination (eg, high, intermediate, or low
probability) because they have not been compared with sinus
puncture. Despite results that appear clinically similar, the
authors concluded that the clinical examination was not
reliable and that ultrasonography or radiography should be used
“if a correct diagnosis is considered important.”

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

The original data for LRs, based on the number of signs and
symptoms, were given without their CIs. The most important
findings were maxillary toothache, purulent nasal secretion,
poor response to decongestant, abnormal transillumination
result, and patient report of colored nasal discharge. We
recalculated the LRs for greater than or equal to 4, 3, 2, 1, and 0
findings present. Patients with greater than or equal to 4
findings have an LR of 6.4 (95% CI, 2.2-19), whereas those with 0
findings have an LR of 0.1 (95% CI, 0.02-0.41).
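
The text does not state how these CIs were computed. As a hedged illustration, the Python sketch below assumes the commonly used log-transformation method for a ratio of 2 proportions and applies it to the counts in Table 45-4 (88 patients with sinusitis, 141 without). The function name and arguments are hypothetical. Under that assumption, the intervals for 4 or more findings and for 0 findings come out close to the published values.

```python
import math


def lr_with_ci(present, absent, total_present, total_absent, z=1.96):
    """Likelihood ratio with an approximate 95% CI (log method, assumed here)."""
    lr = (present / total_present) / (absent / total_absent)
    # Standard error of ln(LR) for a ratio of two independent proportions.
    se_log = math.sqrt(
        1 / present - 1 / total_present + 1 / absent - 1 / total_absent
    )
    lower = math.exp(math.log(lr) - z * se_log)
    upper = math.exp(math.log(lr) + z * se_log)
    return lr, lower, upper


# Counts taken from Table 45-4.
four_or_more = lr_with_ci(16, 4, 88, 141)
no_findings = lr_with_ci(2, 32, 88, 141)

for label, (lr, lower, upper) in [(">=4", four_or_more), ("0", no_findings)]:
    print(f"{label}: LR {lr:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```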

CHANGES IN THE REFERENCE STANDARD 

For clinical research, the reference standard is sinus puncture.
However, for clinical care, the radiograph will correctly
classify 4 of 5 patients and may serve as a pragmatic standard 
for evaluating the clinical examination. 

RESULTS OF LITERATURE REVIEW 

Univariate Findings for Sinusitis 

Radiographs perform well compared to the reference standard
of sinus puncture (Table 45-5).


Table 45-5 Likelihood Ratios of Radiographs Compared
to Sinus Puncture

Finding                                            LR+ (95% CI)     LR- (95% CI)

Radiographs vs sinus puncture a (n = 6 studies)    4.2 (2.6-6.7)    0.26 (0.17-0.37)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a Calculated from the data in the studies as summarized by Engels et al. 1

EVIDENCE FROM GUIDELINES 

A panel of experts met to discuss the definition of sinusitis for
clinical research and clinical care. 5 A useful clinical concept was
preference for the word rhinosinusitis over sinusitis because
sinusitis is usually associated with nasal inflammation and
rhinitis. The experts did not perform a structured systematic
literature review but used consensus-building strategies to derive
recommendations. For diagnosing acute bacterial rhinosinusitis,
the panel’s expert opinion was based on a combination of 3
“major” findings and 9 “minor” symptoms. The panel accepted
a previously proposed case definition for acute rhinosinusitis
(that has not been validated), requiring the presence of 2 or
more major symptoms (purulent anterior nasal drainage,
purulent posterior nasal drainage, or cough) or 1 major and at least 2
minor symptoms (headache, facial pain, periorbital edema,
earache, halitosis, tooth pain, sore throat, increased wheeze, fever).
The objective documentation of acute bacterial rhinosinusitis
requires either visualization of purulent drainage by the
clinician or radiographic evidence.
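
The panel's case definition is a simple counting rule, which the Python sketch below restates. It is an illustration only: the symptom strings and the function name are hypothetical labels chosen here, and, as noted above, the definition itself has not been validated.

```python
# Symptom lists as given in the panel's case definition.
MAJOR = {
    "purulent anterior nasal drainage",
    "purulent posterior nasal drainage",
    "cough",
}
MINOR = {
    "headache",
    "facial pain",
    "periorbital edema",
    "earache",
    "halitosis",
    "tooth pain",
    "sore throat",
    "increased wheeze",
    "fever",
}


def meets_case_definition(symptoms):
    """True if >=2 major symptoms, or 1 major plus >=2 minor symptoms."""
    present = set(symptoms)
    n_major = len(present & MAJOR)
    n_minor = len(present & MINOR)
    return n_major >= 2 or (n_major >= 1 and n_minor >= 2)


# A patient with headache and facial pain but no major symptom.
print(meets_case_definition({"headache", "facial pain"}))
# A patient with cough plus headache and fever.
print(meets_case_definition({"cough", "headache", "fever"}))
```

Meeting the symptom rule is not the same as objective documentation, which still requires visualized purulent drainage or radiographic evidence.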


CLINICAL SCENARIO—RESOLUTION 


This patient presents with a common set of symptoms. 
Patients frequently self-diagnose sinusitis, and many will 
self-medicate or present to their primary care provider with 
a request for antibiotics. The prevalence of acute bacterial 
sinusitis among patients who the physician suspects may 
have the disease is about 50%. However, unless this patient 
proves to have abnormal maxillary sinus transillumination 
results, she has none of the symptoms commonly associated
with radiographically proven sinusitis. The probability of
sinusitis with none of the 5 findings is about 9%. 

The keys to additional lines of inquiry are recognizing
that acute bacterial sinusitis is not something that comes
and goes within a given day but is more persistent.
Migraine headaches frequently begin on awakening and
are associated with nasal stuffiness, leading patients to
“misdiagnose” themselves. The absence of frank nasal discharge
from the medical history and your examination,
along with the abrupt onset of symptoms associated with
the headache, supports an alternative diagnosis such as
vascular headaches. The decision to obtain a sinus radiograph
depends on whether you would treat with decongestants,
antibiotics, or steroid inhalers for a positive
result. If she has an abnormal radiographic result (LR,
4.2), the probability of acute bacterial sinusitis is about
30%, given the absence of clinical findings. You might
value a normal radiographic result if it would help in persuading
the patient that she does not likely have acute
bacterial sinusitis. The probability of acute bacterial
sinusitis is less than 3% for a patient with none of the clinical
findings and a normal radiographic result.
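
The probabilities quoted in this resolution follow from converting the pretest probability to odds, multiplying by each LR, and converting back. The Python sketch below is a hedged illustration with a hypothetical function name. Multiplying the clinical and radiographic LRs in sequence assumes the 2 results are conditionally independent, an assumption the scenario makes implicitly.

```python
def post_test_probability(pretest_probability, *likelihood_ratios):
    """Apply one or more likelihood ratios to a pretest probability."""
    odds = pretest_probability / (1 - pretest_probability)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)


PRETEST = 0.50          # prevalence among patients with suspected sinusitis
LR_NO_FINDINGS = 0.1    # none of the 5 clinical findings
LR_XRAY_ABNORMAL = 4.2  # abnormal radiograph
LR_XRAY_NORMAL = 0.26   # normal radiograph

p_clinical = post_test_probability(PRETEST, LR_NO_FINDINGS)
p_abnormal = post_test_probability(PRETEST, LR_NO_FINDINGS, LR_XRAY_ABNORMAL)
p_normal = post_test_probability(PRETEST, LR_NO_FINDINGS, LR_XRAY_NORMAL)

print(f"No clinical findings:      {p_clinical:.1%}")
print(f"...and abnormal radiograph: {p_abnormal:.1%}")
print(f"...and normal radiograph:   {p_normal:.1%}")
```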
SINUSITIS—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

Among general medical patients with suspected sinusitis, 
the prevalence of disease as determined by sinus puncture 
and culture is 50%. 

POPULATION FOR WHOM SINUSITIS 
SHOULD BE CONSIDERED 

Sinusitis may be thought of as “rhinosinusitis” to emphasize
the role of nasal symptoms but requires additional clinical 
research to determine whether the change in terminology 
requires a change in management approaches. Sinusitis 
should be considered in patients with nasal stuffiness, nasal 
discharge, or maxillary facial pain. Many patients will 
present with a self-suspicion of sinusitis. 

DETECTING THE LIKELIHOOD OF 
SINUSITIS IN ADULTS 

The presence of 4 or more findings (maxillary toothache,
purulent nasal secretion, poor response to decongestant,
abnormal transillumination result, patient report of colored
nasal discharge) makes sinusitis much more likely,
whereas the absence of any of the findings makes sinusitis
unlikely (Table 45-6).


Table 45-6 Likelihood Ratios for Radiographs and the Clinical
Findings for Sinusitis

                                             LR+ (95% CI)        LR- (95% CI)

Radiographs vs sinus puncture (6 studies)    4.2 (2.6-6.7)       0.26 (0.17-0.37)

Clinical findings compared with
sinus radiographs (1 study) a
  ≥4                                         6.4 (2.2-19)
  3                                          2.6 (1.5-4.4)
  2                                          1.1 (0.73-1.7)
  1                                          0.47 (0.27-0.80)
  0                                          0.1 (0.02-0.4)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

a Maxillary toothache, purulent nasal secretion, poor response to decongestant,
abnormal transillumination result, patient report of colored nasal discharge.

REFERENCE STANDARD TESTS 

Sinus puncture with culture serves as the reference standard
for research. Clinicians will prefer to use sinus radiographs,
although some patients (approximately 20%) will be
misclassified. A recent panel of experts accepts an abnormal
radiographic result as evidence of acute bacterial
rhinosinusitis for patients with appropriate symptoms. 5
2. van Buchem FL, Knottnerus JA, Schrijnemaekers VJ, Peeters MF.
Primary-care-based randomised placebo-controlled trial of antibiotic
treatment in acute maxillary sinusitis. Lancet. 1997;349(9053):683-687.

3. Laine K, Maatta T, Varonen H, Makela M. Diagnosing acute maxillary
sinusitis in primary care: a comparison of ultrasound, clinical
examination and radiography. Rhinology. 1998;36(1):2-6.

4. Varonen H, Makela M, Savolainen S, Laara E, Hilden J. Comparison of
ultrasound, radiography, and clinical examination in the diagnosis of
acute maxillary sinusitis: a systematic review. J Clin Epidemiol.
2000;53(9):940-948. a

5. Meltzer EO, Hamilos DL, Hadley JA, et al. Rhinosinusitis: establishing def¬
a For the Evidence to Support the Update for this topic,
see http://www.JAMAevidence.com.
EVIDENCE TO SUPPORT THE UPDATE


Sinusitis 



TITLE Meta-analysis of Diagnostic Tests for Acute Sinusitis. 

AUTHORS Engels EA, Terrin N, Barza M, Lau J. 

CITATION J Clin Epidemiol. 2000;53(8):852-862.

QUESTION With a hierarchy of accuracy based on the
reference standard, how well do radiography, ultrasonography,
and the clinical examination perform in identifying
patients with sinusitis?

DESIGN Systematic review and meta-analysis.

DATA SOURCES Original articles were identified 
through MEDLINE, along with a review of the reference 
lists and review articles. 

STUDY SELECTION AND ASSESSMENT English-language
articles from 1996 to 1998 that met prespecified
criteria. Studies had to be among patients with symptoms 
consistent with sinusitis, and all patients had to undergo 
evaluation so that verification bias was avoided. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The relevant tests were radiography, ultrasonography, and
the clinical examination. Each test was compared with sinus
puncture, when studies were available. Following this “ideal”
reference standard study, the authors included studies in
which ultrasonography or the clinical examination was compared
to radiography as a pragmatic reference standard. No
adequate studies of computed tomography or magnetic resonance
imaging were identified.

MAIN OUTCOME MEASURES 

The studies were assessed for the country, setting, patient
characteristics, adequacy of blinding, definition of tests, and
number of cut points assessed. A summary receiver operating
characteristic (ROC) curve was generated for each comparison,
along with summary sensitivity and specificity estimates.
The likelihood ratio (LR) was estimated from
the summary sensitivity and specificity, rather than calculated
from the original data.
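
Estimating LRs from a summary sensitivity and specificity uses the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The Python sketch below is an illustration with a hypothetical function name; it applies those formulas to 2 of the radiograph thresholds reported in Table 45-7.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from a sensitivity and specificity pair."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# "Fluid or opacity" threshold: sensitivity 0.73, specificity 0.80.
fluid_or_opacity = likelihood_ratios(0.73, 0.80)

# "Fluid, opacity, or mucous membrane thickening": sensitivity 0.90, specificity 0.61.
any_abnormality = likelihood_ratios(0.90, 0.61)

print(f"Fluid or opacity: LR+ {fluid_or_opacity[0]:.2f}, LR- {fluid_or_opacity[1]:.2f}")
print(f"Any abnormality:  LR+ {any_abnormality[0]:.2f}, LR- {any_abnormality[1]:.2f}")
```

These reproduce, to rounding, the LR+ of about 3.7 and LR- of 0.34 for "fluid or opacity" and the LR- of 0.16 for a completely normal radiograph.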

MAIN RESULTS 

The authors identified 4070 potential articles. From these
articles, they found the following that met their inclusion
criteria: studies comparing radiology with puncture (n = 6),
ultrasonography with puncture (n = 5), clinical examination
with puncture (n = 1), ultrasonography with radiology (n =
3), clinical examination with radiology (n = 3) (Table 45-7).
All studies were done in Europe, except for an ultrasonography
study and a study that compared clinical examination to
radiographs, which were done in the United States. Of the 4 clinical
studies, only 1 study restricted to children was not included in the
original Rational Clinical Examination article.

In the 2 puncture studies of adults in a generalist clinical 
practice, the prevalence of sinusitis was 49% and 51%. 

The diagnostic odds ratio for radiographs is 18 (95%
confidence interval [CI], 12-27; P = .09 for heterogeneity, with


Table 45-7 Sensitivity and Specificity of Radiographs and
the Clinical Examination

                                           Summary            Summary            Summary
                                           Sensitivity        Specificity        ROC Curve
Test (No.)             Result              (95% CI)           (95% CI)           Area

Radiographs vs                                                                   0.83
puncture (6)
                       Opacity             0.41 (0.33-0.49)   0.85 (0.76-0.91)
                       Fluid or opacity    0.73 (0.60-0.83)   0.80 (0.71-0.87)
                       Fluid, opacity, or  0.90 (0.68-0.97)   0.61 (0.20-0.91)
                       mucous membrane
                       thickening
Clinical examination                                                             0.74
vs radiography (3) a

Abbreviations: CI, confidence interval; ROC, receiver operating characteristic.

a One study compared the overall clinical impression to sinus radiography, one
evaluated a risk score for children, and one evaluated a risk score for adults. The 2 risk
score studies show similar points on the ROC curve.
I² = 48%). Sinus radiographs vs sinus puncture showed an
accuracy of 81%, with reasonably narrow CIs, despite
statistical heterogeneity (95% CI, 74%-87%). We calculated
these results for the odds ratio and accuracy from data in
the original reports (see Table 45-7).

Although the comparison of a clinical risk score with
puncture had a summary ROC area of 0.91, the authors
identified potential problems with internal validity. The studies
comparing ultrasonography with puncture had too much
variability for adequate ROC curve assessment, whereas
those compared with radiography were so close together that
a curve could not describe the points.

CONCLUSIONS

LEVEL OF EVIDENCE Systematic review.

STRENGTHS The inclusion criteria are well specified and
inclusive.

LIMITATIONS A quality score was not assigned, and some
studies were included that lacked appropriate blinding or
description of the factors that defined a positive result. Tests
for homogeneity were not done, nor were summary
estimates of the LRs given with their CIs.

Some readers will be uncomfortable with the decision to
pool studies of varying quality and that were potentially
biased by the lack of blinding and case definitions. The
authors used the summary sensitivity and specificity to
estimate the positive and negative LRs (LR+ and LR-) of
radiographs (LR+, 3.7; LR-, 0.34) for “fluid or opacity” compared
with sinus puncture. The absence of fluid, opacity, or mucosal
thickening decreased the LR- to 0.16, but the specificity
had broad CIs. On the other hand, the CIs suggest that the
LR- could be much lower and that, even with a prevalence of
50%, a completely normal radiograph result would greatly
decrease the probability of sinusitis. From a pragmatic
standpoint, using radiographs as the reference standard will result
in the misclassification of about 20% of patients.

Sinus ultrasonography is a test primarily used in Europe.
Because of the substantial variability in results, the authors
infer that ultrasonography may require more extensive
experience before clinicians can rely on it. No studies of
computed tomography were of sufficient quality to meet
their inclusion criteria.

The authors conclude that the clinical examination does
have “moderate” ability to identify patients with sinusitis.
They recommend further evaluation of risk scores for
children and adults because they are less reliant on the
experience of the examining clinician.

Reviewed by David L. Simel, MD, MHS


TITLE Comparison of Ultrasound, Radiography, and
Clinical Examination in the Diagnosis of Acute Maxillary
Sinusitis: A Systematic Review.

AUTHORS Varonen H, Makela M, Savolainen S, Laara E,
Hilden J.

CITATION J Clin Epidemiol. 2000;53(9):940-948.

QUESTION Compared with sinus puncture or computed
tomography (CT), how well do radiographs,
ultrasonography, and the overall clinical examination perform
in diagnosing acute maxillary sinusitis?

DESIGN Systematic review and meta-analysis.

DATA SOURCES Original articles were identified
through MEDLINE (1996 to April 1999) and a Finnish
database (Medic), along with a review of the reference
lists, review articles, and hand searching of 4 relevant
journals.

STUDY SELECTION AND ASSESSMENT Included
studies compared the tests of interest with sinus puncture
or CT for adults with suspected acute maxillary sinusitis
and symptoms of fewer than 3 months’ duration.


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Radiograph results were considered abnormal when the
patient had at least 6-mm mucosal thickening, an air-fluid
level, or a complete opacity. Ultrasonographic results were
considered abnormal according to previously published
criteria. 1 The overall clinical impression was evaluated as
positive or negative for sinusitis.

MAIN OUTCOME MEASURES 

The studies were assessed for validity with standard criteria from
the Cochrane Collaboration. A summary receiver operating
characteristic (ROC) curve was generated for each comparison,
along with summary sensitivity and specificity estimates.
Fixed-effects summary likelihood ratios (LRs), without confidence
intervals (CIs), were provided. The authors did not provide
quantitative estimates of heterogeneity.

MAIN RESULTS 

The authors identified 1054 potential articles. From these
articles, they found 11 articles that met their inclusion criteria; all
were studies compared with sinus puncture, except for 1
study that used CT. The LRs were similar for radiographs,
ultrasonography, and clinical examination (Table 45-8).

Despite clinically similar LRs, the authors observed that
ultrasonographic findings were heterogeneous. They also
report that, as the prevalence of disease decreased in studies,
the sensitivity decreased.
Table 45-8 Likelihood Ratio for Sinusitis for Radiographs,
Ultrasonography, and the Clinical Examination

                            Summary             Summary
                            Sensitivity         Specificity         Summary   Summary
Test (No.)                  (95% CI)            (95% CI)            LR+       LR-

Radiographs (7)             0.87 (0.85-0.88)    0.89 (0.88-0.91)    3.4       0.26
Ultrasonography (7)         0.85 (0.84-0.87)    0.82 (0.80-0.83)    2.8       0.30
Clinical examination (2)    0.69 (0.65-0.73)    0.79 (0.75-0.82)    3.2       0.40

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.


CONCLUSIONS 

LEVEL OF EVIDENCE Systematic review. 

STRENGTHS The inclusion criteria are well specified and 
inclusive. 

LIMITATIONS A quality score was not assigned, and some
studies were included that lacked appropriate blinding or
description of the factors that defined a positive result. Tests
for homogeneity were not reported. For the summary ROC
curve, the authors do not allow “gradations” of positivity for
the tests of interest. Thus, information may have been lost in
studies that dichotomize the overall clinical impression into
“positive” or “negative.” In addition, the authors used
fixed-effects measures without CIs for the summary LR.

The included studies are almost identical to the meta-analysis
published by Engels et al. 2 Not surprisingly, the results
are similar, with differences explained more by the methods
used for summarizing results than the results themselves. For
radiographs, the only difference in the studies included is
that the data from one study were broken out in this
meta-analysis into 2 separate studies with different results for
radiographs (likewise, they broke out the ultrasonographic
data from this single study as if they were 3 separate studies). 1
This likely created some bias in the outcomes.

These authors concluded, as did Engels et al, 2 that there 
was too much heterogeneity for ultrasonography and that 
radiographs may perform better. 

Despite LRs that are not appreciably different from
radiographs, the authors conclude that the clinical examination is
not reliable and that radiographs should be used when a
“correct diagnosis is required.” Given that the clinical
examination was used to select the patients for radiographs and
that no CIs were provided for the LR, it is difficult to
conclude that the clinical examination is useless. Furthermore,
analyzing the clinical examination as either positive or
negative may dilute the efficiency of the clinical examination
when patients with an estimated intermediate probability of
disease are forced into the positive or negative categories.

REFERENCES FOR THE EVIDENCE 

1. Revonta M. Ultrasound in the diagnosis of maxillary and frontal
sinusitis. Acta Otolaryngol (Stockh). 1980;370(suppl):1-55.

2. Engels EA, Terrin N, Barza M, Lau J. Meta-analysis of diagnostic tests for
acute sinusitis. J Clin Epidemiol. 2000;53(8):852-862.

Reviewed by David L. Simel, MD, MHS 
CHAPTER


Does This Patient Have Splenomegaly?

Steven A. Grover, MD, MPA, FRCPC
Alan N. Barkun, MD, FRCPC
David L. Sackett, MD, FRSC, FRCPC


CLINICAL SCENARIO

Among the patients you are seeing today are the following 3: 

The first is an elderly woman who complains of easy 
fatigability, and her conjunctivae and nail beds are pale. 
You suspect that she is anemic because of gastrointestinal 
blood loss, but among your differential diagnoses you 
consider a lymphoproliferative disorder and decide to 
examine her for splenomegaly. 

The second is a college student with failing appetite,
ability to concentrate, energy, and grades. You think that
he is depressed but want to rule out infectious mononucleosis
and decide to examine him for splenomegaly.

The third is an otherwise healthy man with well- 
controlled hypertension and a normal cardiovascular 
examination result. As he lies on the examining table, 
stripped to his waist, you wonder whether you should take 
the time to examine him for splenomegaly. 


WHY EXAMINE THE SPLEEN? 


We examine the spleen to see whether it is palpable. Most 
palpable spleens are enlarged, and splenomegaly in an adult 
requires an explanation, for it may be a manifestation of
disease. Despite many important causes of splenomegaly,
including cancers, infections, and connective tissue diseases, 
many of these diagnoses are relatively uncommon such that 
isolated splenomegaly in an otherwise healthy adult is most 
often associated with nonspecific infections or no obvious 
cause. 1 


ANATOMIC LANDMARKS AND SPLENIC SIZE 

The normal spleen is a curved wedge that follows the course 
of the bony portion of the left 10th rib (Figure 46-1A). Its
narrow posterior pole points back and to the right, toward 
the spine. Its outer surface is convex and lies just beneath 
the left side of the diaphragm, and its blunt anterior pole 
approaches the midaxillary line, pointing toward the left 
side of the colic flexure. Its inner concave surface bears a
large impression from the posterior wall of the stomach, 
and its inferior edge bears impressions from the upper 
pole of the left kidney and, occasionally, the tail of the 
pancreas. 

HOW LARGE IS THE NORMAL SPLEEN?

Autopsies after sudden traumatic death in individuals free 
of disorders likely to lead to splenomegaly have provided 
information on the usual weight of the spleen. In Philadelphia, 
Figure 46-1 The Normal Sized Spleen Rests Hidden Under the Rib Cage

A, The normal spleen is a curved wedge that follows the course of the bony portion of the left 10th rib. Its narrow posterior pole points back and to the right, 
toward the spine. Its outer surface is convex and lies just beneath the left side of the diaphragm, and its blunt anterior pole approaches the midaxillary line, 
pointing toward the left side of the colic flexure. Its inner convex surface bears a large impression from the posterior wall of the stomach, and its inferior edge 
bears impressions from the upper pole of the left kidney and occasionally the tail of the pancreas. B, As the spleen enlarges, its anterior pole continues to follow 
the left 10th rib as the spleen descends below the rib cage and across the abdomen toward the right iliac fossa. 


Pennsylvania, such spleens exhibited median weights from 
90 g (among black women) to 170 g (among young white 
men), with intermediate values for black men (100 g), 
white women (115 g), and elderly white men (130 g). The 
pathologists who conducted these studies stated that the 
“best rule of thumb is to regard any spleen under 250 g as 
normal.” 2 This biologic variation in average spleen size 
underscores the need for a criterion standard definition of 
splenic enlargement that is acceptable to patients (ie, hav¬ 
ing one's spleen weighed is painful) and reproducible for 
clinicians. 

One such standard is the radioisotopic scintiscan, presented 
(with the most commonly used normal values in 
parentheses) as maximum values for length (12 cm) and 
width (7 cm), 3 surface area (80 cm²), 4 or volume (250 cm³). 5 
Most recently, an ultrasonographic criterion standard has 
been suggested, with splenomegaly defined as a cephalocaudad 
diameter of 13 cm or more. 6,7 


THE CONSEQUENCES OF SPLENOMEGALY FOR THE 
CLINICAL EXAMINATION 

Because the normal-sized spleen almost always lies entirely 
within the rib cage, it usually cannot be palpated. However, 
as it enlarges it displaces the stomach but cannot displace the 
spine, diaphragm, or kidney. Therefore, its anterior pole con¬ 
tinues to follow the projection of the bony portion of the left 
10th rib, descending below the rib cage and across the abdomen 
toward the right iliac fossa (Figure 46-1B). 


HOW TO EXAMINE FOR SPLENOMEGALY 

Inspection 

Inspection of the left upper quadrant might reveal a bulging mass 
emerging from under the left costal margin and descending on 
inspiration. There are no published assessments of the accuracy 
of clinical inspection. Nonetheless, this sign would be expected to 
have low sensitivity because only massive spleens will distort the 
abdominal wall sufficiently to be seen. Moreover, because other 
large masses (a polycystic kidney or gastric or colon cancer) also 
can distort the abdominal wall and may descend on inspiration, 
this sign probably does not have perfect specificity either. In the 
absence of previous documentation or suspicion of massive sple¬ 
nomegaly, this is unlikely to be a useful sign. 

Percussion 

Percussion seeks to identify the loss of tympany as the enlarg¬ 
ing spleen impinges on the adjacent air-filled lung, stomach, 
and colon. 

Percussion is often claimed to be more sensitive than pal¬ 
pation for lesser degrees of splenomegaly, although evidence 
to support this claim (described herein) is scant. 

Three percussion methods have been validated against 
ultrasonography or scintigraphy: 

1. Percussion by Nixon Method 
(as Modified by Sullivan and Williams) 

The patient is placed in the right lateral decubitus position. 
Percussion is initiated midway along the left costal margin 
and continued upward along a line perpendicular to the 
costal margin (Figure 46-2). In a normal examination, a full 
stomach can result in initial percussion dullness, but as percussion 
continues along the perpendicular line tympany then 
becomes present because of the overlying lung. Splenomegaly 
is diagnosed when the dullness is present more than 8 cm 
above the costal margin. 8,9 

2. Percussion by Castell Method 

The patient is placed in the supine position. Percussion is 
carried out in the lowest intercostal space in the left anterior 
axillary line in both expiration and full inspiration (Figure 
46-3). In a normal examination result, the percussion note 
remains resonant throughout this maneuver. Splenomegaly 
is diagnosed when the percussion note is dull or becomes 
dull on full inspiration. 10 

3. Percussion of Traube Space 

The patient is supine, with the left arm slightly abducted for 
access to the entire Traube space (after its description by 
Ludwig Traube, who ascribed its disappearance to pleural 
effusion, not an enlarged spleen), 11 defined by the sixth rib 
superiorly, the midaxillary line laterally, and the left costal 
margin inferiorly (Figure 46-3). With the patient breathing 
normally, this triangle is percussed across 1 or more levels 
from its medial to lateral margins. Normal percussion yields 
a resonant or tympanitic note. Splenomegaly is diagnosed 
when the percussion note is dull. 12 

Palpation 

Although many methods for palpation of the spleen have 
been reported in clinical texts and journals, only 3 have had 
their precision or accuracy documented in the clinical litera¬ 
ture and will be described herein. Relaxation of the abdomi¬ 
nal wall is a prerequisite for successful palpation and can be 
assisted by both the examiner (friendly, gentle, and warm 
hands) and the patient (flexed, supported knees). 

Two-Handed Palpation With Patient in 
Right Lateral Decubitus 

With the patient in the right lateral decubitus position, the 
examiner's left hand is slipped from front to back around the 
left lower thorax, gently lifting the left lowermost rib cage 
anteriorly and medially. The tips of the fingers of the exam¬ 
iner's right hand are pressed gently just beneath the left costal 
margin, and the patient is asked to take a long, deep breath as 
the palpation of a descending spleen is sought. If none is felt, 
the procedure is repeated, lowering the right hand 2 cm 
toward the umbilicus each cycle, until the examiner is confi¬ 
dent that a massive spleen has not been missed. (Some 
authorities suggest starting palpation over the lower abdo¬ 
men and moving up toward the costal margin.) The same 
procedure can be carried out with the patient supine. 

One-Handed Palpation With Patient Supine 

This method is identical to the former one, except that no 
counterpressure is applied by the left hand to the rib cage. 
With the patient supine, the tips of the fingers of the exam¬ 
iner's right hand are pressed gently just beneath the left costal 
margin, and the patient is asked to take a long, deep breath as 
the palpation of a descending spleen is sought. If none is felt, 
the procedure is repeated, lowering the right hand 2 cm 
toward the umbilicus each cycle, until the examiner is confident 
that a massive spleen has not been missed. Some examiners 
like to apply counterpressure to the patient's flank with 
the left hand while palpating with the right. 

Figure 46-2 Nixon Method to Detect Splenomegaly 

Nixon method of percussion requires that the patient be placed in the right 
lateral decubitus position. Percussion is started at the midpoint of the left 
costal margin and proceeds perpendicularly. Splenomegaly is diagnosed 
when the dullness is present more than 8 cm above the costal margin. 

[Figure labels: right lateral decubitus position; the examiner begins percussion 
at the midpoint of the left costal margin in a perpendicular direction toward the 
midaxillary line; positive indication: dullness is present more than 8 cm above 
the costal margin.] 

Figure 46-3 Percussion in Traube Space and at Castell Spot to 
Detect Splenomegaly 

Traube space is defined by the sixth rib superiorly, the left anterior axillary 
line laterally, and the costal margin inferiorly. Castell spot is located at the 
junction of the lowest intercostal space and the left anterior axillary line. 

[Figure labels: supine position; Castell spot; anterior axillary line; diaphragm 
and spleen positions during expiration and inspiration. Using Traube’s space, the 
examiner percusses across the space at one or more levels from medial to lateral 
margins while the patient breathes normally; positive indication: percussion is 
dull. Using Castell’s method, the examiner percusses at the level of Castell’s 
spot in both expiration and full inspiration; positive indication: percussion is 
or becomes dull on full inspiration.] 

Hooking Maneuver of Middleton With Patient Supine 

The patient is asked to lie flat with his or her left fist under 
the left costovertebral angle. The examiner is positioned to 
the patient's left, facing the patient's feet. The fingers of both 
the examiner's hands are curled under the left costal margin, 
and the patient is asked to take a long, deep breath as the pal¬ 
pation of a descending spleen is sought. 13 

Additional Features of the Palpable Spleen 

Given its origin within the rib cage, most texts state that it is 
never possible to palpate (get above) the upper border of the 
spleen, helping distinguish it from other abdominal masses 
that may present an upper border. If a spleen is greatly 
enlarged, it may be possible to feel a hilar notch along its 
medial border. 


PRECISION OF THE SIGNS FOR SPLENOMEGALY 

When groups of inpatients with and without splenomegaly 
had their Traube spaces percussed by 3 internists, the interexaminer 
agreement (κ values) ranged from 0.19 to 0.41, which 
is modest at best. 12 However, recent food intake reduced the 
accuracy of Traube space percussion in this study and probably 
decreased the test precision when different physicians 
examined the same patient at various times after meals. 
Among the same patients, a second study 14 showed that the 
interexaminer agreement for palpation ranged from 0.56 to 
0.70, demonstrating that palpation was more reproducible 
between examiners than percussion. 

When tested among 50 patients with alcoholism, agreement 
among different examiners (using 2-handed palpation 
with the patient in the right lateral decubitus and 1-handed 
palpation with the patient supine) demonstrated an intraclass 
correlation coefficient of 0.75 and was as good as that 
for ascites (and marginally better than that for jaundice, 
Dupuytren contracture, vascular spiders, gynecomastia, palmar 
erythema, asterixis, or clubbing). 15 Senior gastroenterologists 
exhibited marginally better agreement than more 
junior physicians (intraclass correlation coefficients of 0.81 
and 0.73, respectively). When different examiners were asked 
to report the extent to which the spleen tip extended below a 
specific bony landmark (eg, the xiphisternal-sternal junction), 
their estimates varied on average by 6 cm. 16 


Table 46-1 Studies of the Accuracy of Percussion 

No. of     Criterion         Maneuver                             Sensitivity, %   Specificity, % 
Patients   Standard                                               (No.)            (No.) 

118ᵃ       Ultrasonography   Traube space percussion 
                               All patients¹²                     62 (58/94)       72 (109/151) 
                               Nonobese patients who have not 
                               eaten recently¹²                   78 (29/37)       82 (54/66) 
65         Scintigraphy      Nixon method⁹                        59 (10/17)       94 (45/48) 
                             Castell method⁹                      82 (14/17)       83 (40/48) 

ᵃEach patient was examined by 1 to 3 examiners, for a total of 245 examinations. 

ACCURACY OF THE SIGNS FOR SPLENOMEGALY 

Table 46-1 summarizes studies on the accuracy of percussion. 
Using ultrasonographic results as the criterion standard, percussion 
of Traube space had a sensitivity of 62% (95% confidence 
interval [CI], 51%-72%) and a specificity of 72% (95% CI, 
65%-80%). 12 Percussion sensitivity was reduced by the presence 
of obesity (more false-negative results), and its specificity was 
decreased by recent food intake (more false-positive results). 
Accordingly, among leaner patients who had not eaten in the 
previous 2 hours, percussion sensitivity was 78% (95% CI, 62%-90%), 
and its specificity was 82% (95% CI, 70%-90%). 

A second study 9 examined the sensitivity and specificity, indi¬ 
vidually and in combination, of the Nixon and Castell methods 
of percussion (as well as 2-handed palpation in the supine and 
right lateral decubitus positions). In comparing the Nixon to the 
Castell method of percussion, the Castell method exhibited a 
higher sensitivity (82% vs 59%) but lower specificity (83% vs 
94%) (Table 46-1). 
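
The percentages in Table 46-1 follow directly from the counts shown in parentheses. As a minimal sketch (the `Counts` class and the function names are illustrative and do not come from the source), sensitivity is the fraction of enlarged spleens that the sign detects, and specificity is the fraction of normal-sized spleens that the sign correctly calls normal:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Counts for one bedside sign (illustrative container, not from the source)."""
    true_pos: int   # enlarged spleen and sign positive
    diseased: int   # all patients with an enlarged spleen
    true_neg: int   # normal-sized spleen and sign negative
    healthy: int    # all patients with a normal-sized spleen


def sensitivity(c: Counts) -> float:
    return c.true_pos / c.diseased


def specificity(c: Counts) -> float:
    return c.true_neg / c.healthy


# Counts as printed in Table 46-1 (scintigraphy as the criterion standard).
NIXON = Counts(true_pos=10, diseased=17, true_neg=45, healthy=48)
CASTELL = Counts(true_pos=14, diseased=17, true_neg=40, healthy=48)

for name, c in (("Nixon", NIXON), ("Castell", CASTELL)):
    print(f"{name}: sensitivity {sensitivity(c):.0%}, specificity {specificity(c):.0%}")
```

With only 17 enlarged spleens in this study, a single additional detected case shifts sensitivity by about 6 percentage points, so the difference between the two methods should be read cautiously.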

Table 46-2 summarizes 7 studies of the accuracy of palpa¬ 
tion. The first 2 studies 17,18 assessed the accuracy of the routine 
examination for splenomegaly by abstracting the clinical 
examinations (performed by a large number and range of cli¬ 
nicians) from routine clinical charts. Both studies found low 
sensitivity (20%-28%) but high specificity (98%-100%). Most 
enlarged spleens were missed (a high rate of false-negative 
results, leading to low sensitivity), but few examiners reported 
palpating spleens that were not there (a low rate of false¬ 
positive results, leading to high specificity). When the results 
of these 2 studies were combined, the routine examination for 
splenomegaly had a sensitivity of 27% (95% CI, 19%-36%) 
and a specificity of 98% (95% CI, 96%-100%). 
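
The combined figures can be reproduced by adding the numerators and denominators of the 2 chart-based studies in Table 46-2. The sketch below assumes simple pooling of raw counts (the function name is illustrative); this ignores differences between studies, so it is an arithmetic check and not a formal meta-analysis:

```python
def pooled_proportion(pairs):
    """Sum (numerator, denominator) pairs across studies: simple pooling."""
    numerator = sum(n for n, _ in pairs)
    denominator = sum(d for _, d in pairs)
    return numerator, denominator, numerator / denominator


# (detected, all enlarged spleens) for the autopsy and scintigraphy studies
SENSITIVITY_COUNTS = [(3, 15), (26, 92)]
# (correctly called normal, all normal spleens) for the same 2 studies
SPECIFICITY_COUNTS = [(32, 32), (122, 125)]

for label, pairs in (("sensitivity", SENSITIVITY_COUNTS),
                     ("specificity", SPECIFICITY_COUNTS)):
    num, den, rate = pooled_proportion(pairs)
    print(f"pooled {label}: {num}/{den} = {rate:.0%}")
```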

In the other 5 palpation studies 4,5,9,14,19 (Table 46-2), the 
examination for splenomegaly was performed as part of the 
study. Because the examiners knew that they were under 
scrutiny, it is not surprising that both their true-positive 
reports and false-positive reports of splenomegaly increased; 
that is, the overall sensitivity of palpation was higher and the 
specificity lower than in the 2 previously described studies 
that assessed the routine examination as recorded in clinical 
notes. 

One study 9 compared percussion methods and palpation 
and demonstrated that the Castell method of percussion may 
be somewhat more sensitive than palpation (82% vs 71%) 
(Tables 46-1 and 46-2). Finally, if splenomegaly was declared 
when any of the 4 signs (2 for percussion and 2 for palpation) 
was positive, true-positive and false-positive declarations of 
splenomegaly increased because the increase in sensitivity to 
88% (fewer large spleens missed) was accompanied by a 
decrease in specificity to 83% (more normal-sized spleens 
mistakenly called large). 

Table 46-2 Studies of the Accuracy of Palpation 

No. of     Criterion            Maneuver                          Sensitivity, %   Specificity, % 
Patients   Standard                                               (No.)            (No.) 

Based on Routine Examinations Recorded in Clinical Charts 
47         Autopsy¹⁷            Physical examination              20 (3/15)        100 (32/32) 
217        Scintigraphy¹⁸       Clinical impressions              28 (26/92)       98 (122/125) 
                                Overall                           27 (29/107)      98 (154/157) 

Based on Specific Examinations Done as Part of the Study 
99         Scintigraphy⁴        Palpation                         57 (31/54)       100 (45/45) 
32         Operation¹⁹          Palpability                       59 (16/27)       100 (5/5) 
100        Scintigraphy⁵        Supine 2-handed palpation         56 (47/84)       69 (11/16) 
65         Scintigraphy⁹        Supine and right lateral          71 (12/17)       90 (43/48) 
                                decubitus palpation 
118ᵃ       Ultrasonography¹⁴    Supine palpation or Middleton     56 (53/94)       93 (140/151) 
                                maneuver 
                                Overall                           58 (159/276)     92 (244/265) 

ᵃEach patient was examined by 1 to 3 examiners, for a total of 245 examinations. 

The final study 14 evaluated the accuracy of bedside diag¬ 
nostic maneuvers, using receiver operating characteristic 
curve analysis. This analytic technique evaluates the discrim¬ 
inating ability of different tests by comparing the true-positive 
rate (sensitivity) and false-positive rate (1 - specificity) of 
each test using different definitions of a positive test result 
(test thresholds). The discriminating ability refers to the 
probability of correctly selecting the patient with splenomeg¬ 
aly between 2 patients: one with an enlarged spleen and one 
with a normal spleen. A test with a discriminating ability of 
50% performs no better than chance alone, whereas a perfect 
test has a discriminating ability of 100%. 
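
The pairwise definition above can be computed directly. In the sketch below, every patient with an enlarged spleen is compared with every patient with a normal spleen, and the function returns the fraction of pairs in which the enlarged spleen received the higher rating, with ties counted as half. The 5-point ratings are hypothetical values chosen only to illustrate the calculation; they are not data from the study, and the function name is an assumption.

```python
from itertools import product


def discriminating_ability(enlarged_scores, normal_scores):
    """Fraction of (enlarged, normal) pairs ranked correctly; ties count 0.5.

    This pairwise fraction equals the area under the ROC curve.
    """
    pairs = list(product(enlarged_scores, normal_scores))
    credit = sum(1.0 if e > n else 0.5 if e == n else 0.0 for e, n in pairs)
    return credit / len(pairs)


# Hypothetical examiner ratings (1 = definitely not palpable ... 5 = definitely palpable)
enlarged = [5, 4, 4, 2]
normal = [1, 2, 3, 1]

print(f"discriminating ability: {discriminating_ability(enlarged, normal):.1%}")
```

In this pairwise formulation, a sign that assigns the same rating to every patient scores 0.5, and a sign that always rates enlarged spleens higher scores 1.0.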

In this study, supine palpation, right lateral decubitus pal¬ 
pation, and Middleton maneuver all demonstrated similar 
discriminating abilities (73%-79%). The discriminating abil¬ 
ity of palpation and percussion was similar, although the test 
specificity of palpation appeared to be generally superior to 
that of percussion. 

The most important finding of this study was that palpa¬ 
tion was a better discriminator among patients in whom per¬ 
cussion result was positive. (As might be expected, these 
patients have the largest spleens.) When percussion dullness 
was present, palpation discriminated correctly 87% of the 
time. However, if percussion was not dull, palpation was a 
poor discriminator (55%) or only slightly better than chance. 
This confirms that percussion and palpation should be used 
together because percussion dullness identifies a subset of 
patients in whom palpation is a useful test. If percussion 
dullness is absent, there is no need to palpate, because palpa¬ 
tion is a poor test among such patients. 

Finally, this study also demonstrated that, given a clinical 
suspicion (the prior probability or disease prevalence) of 
splenomegaly of 10% to 90% before examining the patient, it is 
difficult to substantially decrease the likelihood of an enlarged 
spleen, even if percussion and palpation results were both 
negative, because the false-negative rate of bedside diagnosis 
was 28%. On the other hand, when a positive bedside examination 
result was defined as both percussion and palpation results being 
positive, the high test specificity of 97% significantly increased 
the likelihood of splenic enlargement to 60% or more. 

IS SPLENOMEGALY EVER NORMAL? 

About 3% of otherwise healthy students entering a US col¬ 
lege were found to have unexplained palpable spleens 1 and, 
on incomplete follow-up, appeared to fare none the worse 20 ; 
similarly, 12% of otherwise healthy postpartum women at a 
Canadian hospital had palpable spleens. 21 

THE BOTTOM LINE 

Guidelines for examining for splenic enlargement are sum¬ 
marized in Table 46-3. 

1. Splenomegaly is uncommon but occurs in a wide variety of 
conditions. Given the low sensitivity of the clinical exami¬ 
nation, it can be argued that the routine examination for 
splenomegaly cannot definitively rule in or rule out spleno¬ 
megaly in normal, asymptomatic patients when the preva¬ 
lence is less than 10% and additional imaging tests will be 
required. Rather, the examination for splenomegaly is most 
useful to rule in the diagnosis of splenomegaly among 
patients in whom there is a clinical suspicion of at least 10%. 

2. The bedside examination of the spleen should start with 
percussion. If percussion is not dull, there is no need to palpate 
because the results of palpation will not effectively rule 
in or rule out splenic enlargement. If the possibility of missing 
splenic enlargement remains an important clinical concern, 
then ultrasonography or scintigraphy is indicated. In 
the presence of percussion dullness, palpation should follow. 
If both test results are positive, the diagnosis of splenomegaly 
is established (providing that the clinical suspicion 
of splenomegaly was at least 10% before examination). If 
palpation result is negative, diagnostic imaging will be 
required to confidently rule in or rule out splenomegaly. 

Table 46-3 Guidelines for Examining for Splenic Enlargement 

Recommendations and Rationale by Clinical Suspicion (Prior Probability) 
of Splenic Enlargement 

Less than 10% 

• Percussion or palpation for splenomegaly of limited usefulness 
• Maneuvers are not sufficiently sensitive to rule out splenomegaly 
• Given the low pretest probability of splenomegaly, test specificity 
of clinical examinations is not sufficiently high to rule in splenic 
enlargement, even if both test results are positive 

10% or more 

• Percussion and palpation can be used to rule in splenomegaly if both 
results are positive 
• Percuss first, and if result is positive, then palpate 
• If percussion result is negative but your clinical suspicion remains 
high, order ultrasonography because palpation in the presence of 
abdominal tympany is not specific enough to rule in splenomegaly 
• If percussion result is positive but palpation result is negative, then 
ultrasonography is also needed to confidently evaluate spleen size 
• To confidently rule out splenomegaly, a radiologic procedure is necessary 
because of the limited sensitivity of bedside examination 
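
The sequence recommended above (percuss first, palpate only if percussion is dull, and image when the bedside result cannot settle the question) can be written as a small decision function. This is a teaching sketch under stated assumptions: the `Advice` labels and the `next_step` function are invented for illustration, and only the 10% threshold and the ordering of steps come from the text.

```python
from enum import Enum
from typing import Optional


class Advice(Enum):
    """Illustrative labels for the steps described in the text."""
    LIMITED_USEFULNESS = "prior below 10%: bedside examination of limited usefulness"
    PERCUSS_FIRST = "start with percussion"
    IMAGE_IF_CONCERNED = "percussion not dull: skip palpation, image if concern remains"
    PALPATE_NEXT = "percussion dull: palpate next"
    RULED_IN = "percussion dull and spleen palpable: splenomegaly ruled in"
    IMAGE_TO_RESOLVE = "percussion dull but spleen not palpable: imaging needed"


def next_step(prior_probability: float,
              percussion_dull: Optional[bool] = None,
              spleen_palpable: Optional[bool] = None) -> Advice:
    """Return the next step; None means that maneuver has not been done yet."""
    if not 0.0 <= prior_probability <= 1.0:
        raise ValueError("prior_probability must be between 0 and 1")
    if prior_probability < 0.10:
        return Advice.LIMITED_USEFULNESS
    if percussion_dull is None:
        return Advice.PERCUSS_FIRST
    if not percussion_dull:
        return Advice.IMAGE_IF_CONCERNED
    if spleen_palpable is None:
        return Advice.PALPATE_NEXT
    return Advice.RULED_IN if spleen_palpable else Advice.IMAGE_TO_RESOLVE


print(next_step(0.30, percussion_dull=True, spleen_palpable=False).value)
```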


CLINICAL SCENARIO—RESOLUTION 


Returning to the 3 patients originally described at the 
beginning of this article, you may be able to confidently 
rule in splenic enlargement in the pale elderly woman complaining 
of fatigue if your preexamination clinical suspicion 
of splenomegaly is at least 10% and if both percussion and 
palpation results are positive. Abdominal examination is 
not sufficiently sensitive to rule out splenic enlargement in 
the college student with symptoms of depression. Finally, 
you may choose to examine for splenic enlargement in the 
asymptomatic man with hypertension, but a negative 
examination result may be a false negative, and a positive 
examination result will require radiologic confirmation to 
rule in splenomegaly. 
UPDATE 


CLINICAL SCENARIO 


A 34-year-old man has complained of fatigue and abdominal 
pain. He presents to the emergency department with vague 
abdominal pain and fever. The medical history is also that of 
intermittent sweats and some weight loss. Your examination 
reveals diffuse adenopathy. Traube space is dull to percussion. 
You decide to try to palpate the spleen edge but, despite 
spending a few minutes examining the patient while he is 
supine and then while he is on his side, you decide that you 
cannot feel the spleen. According to your findings, how confi¬ 
dent should you be that the spleen is not enlarged? 

Original Review 

Grover SA, Barkun AN, Sackett DL. The rational clinical 
examination: does this patient have splenomegaly? JAMA. 
1993;270(18):2218-2221. 

UPDATED LITERATURE SEARCH 

Our literature search used the parent search strategy for The 
Rational Clinical Examination series, combined with the sub¬ 
ject “exp splenomegaly,” published in English from 1991 to 
2004, and articles that referred to the original review. The 
results yielded 136 articles, for which we reviewed the titles 
and abstracts. We found 5 articles suitable for review, 
although 1 was a duplicate publication. Of the remaining 4 
studies, 3 were selected because they had prospective evalua¬ 
tion of patients for splenomegaly, with both sensitivity and 
specificity data collected independent of an ultrasonograph 
used as the reference standard test. One of the studies had 
information on the interobserver variability of examination 
techniques. We also identified 1 study of the sensitivity of the 
examination for splenomegaly in athletes. 

SUMMARY OF NEW FINDINGS 

• Palpation might have greater accuracy than percussion, 
especially in lean patients. However, assessment of palpation 
performance may be biased because in a number of 
studies, palpation followed percussion maneuvers. 

• Examiners should become proficient in 1 palpation 
method and 1 percussion method because the combination 
of both results may be better than either alone. 

Prepared by Alan N. Barkun, MD, and Steven A. Grover, MD 

Reviewed by Andrew Muir, MD 

Details of the Update 

A study from Brazil suggested that combining the results of 2 
examiners’ palpation findings (presence of a palpable spleen, 
presence of a spleen felt more than 4 cm below the costal 
margin) 1 gave good results. When both physicians palpated 
the spleen, the likelihood ratio (LR) of splenomegaly 
increased (LR, 7.6; 95% confidence interval [CI], 4.5-12); 
when neither physician palpated the spleen, the likelihood of 
splenomegaly decreased (LR, 0.31; 95% CI, 0.14-0.56). The 
inference is that having a colleague confirm your findings for 
splenomegaly might be useful. 
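
A likelihood ratio converts a pretest probability into a posttest probability by way of odds. The sketch below applies the 2 LRs reported for the Brazilian study to a pretest probability of 30%; that pretest value is a hypothetical chosen for illustration, and the function name is an assumption.

```python
def posttest_probability(pretest: float, likelihood_ratio: float) -> float:
    """Convert probability to odds, multiply by the LR, convert back."""
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


PRETEST = 0.30  # hypothetical clinical suspicion, not a value from the study

both_palpate = posttest_probability(PRETEST, 7.6)       # both examiners feel the spleen
neither_palpates = posttest_probability(PRETEST, 0.31)  # neither examiner feels it

print(f"both examiners palpate the spleen: {both_palpate:.0%}")
print(f"neither examiner palpates the spleen: {neither_palpates:.0%}")
```

Under that assumed starting point, agreement that the spleen is palpable raises the probability to roughly 77%, whereas agreement that it is not palpable lowers it only to roughly 12%, which illustrates why a negative examination does not rule out splenomegaly.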

Two studies from India suggested that palpation maneu¬ 
vers may have better accuracy for diagnosing splenomegaly 
than percussion techniques. 2,3 However, the order of the 
maneuvers was not stated, and by western standards the 
patients were of small stature and size (as measured by body 
mass index). Furthermore, in one of the studies, false-negative 
results for Traube space percussion were significantly higher 
in smaller patients. 3 Therefore, although palpation may per¬ 
form better than percussion in lean patients, we do not know 
whether the same test characteristics apply to patients with 
larger body mass. 4 The clinical utility appears enhanced when 
the results of both percussion and palpation are considered 
but should be confirmed in other studies in which the order 
of the examination is specified. 3 

A study performed in a convenience sample of patients 
with confirmed or suspected human immunodeficiency virus 
(HIV) suggested that 3 palpation maneuvers and 3 percus¬ 
sion maneuvers were relatively insensitive but had better 
specificity, 5 supporting the findings of the original Rational 
Clinical Examination article. Although the study sample was 
small (27 patients), a unique feature of the evaluation was 
that there were 8 observers, allowing a comparison between 
observers and an assessment to see whether the various 
maneuvers performed similarly. However, they also noted 
significant interobserver variability that did not depend on 
the years of medical practice. The poor reliability was evident 
in the broad range of individual assessors’ sensitivity and 
specificity values. The sensitivity of each of the tests seemed 
to improve with the length of the trial, but the overall accuracy 
of all the findings was low. Because the evaluation of 
such a large number of individual findings may have lacked 
independence, and because the total number of patients was 
so small, we did not combine these results with other studies. 

The presence of splenomegaly in athletes (often caused by 
mononucleosis) creates a diagnostic dilemma for clinicians 
who must decide when the splenomegaly has resolved so that 
the athlete can return to sports participation. A study of 29 
athletes with splenomegaly (length, 12.5-15.5 cm) docu¬ 
mented by ultrasonography (normal length <12 cm), showed 
that the clinician could detect the spleen in only 17%. 6 Many 
athletes have well-developed abdominal musculature, which 
makes palpation for splenomegaly even more difficult. 


IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

A reappraisal of the original publication showed that CIs 
around the signs would help understanding of their potential 
importance. 7 We used the original data, in addition to data 
from newer articles, to create random-effects summary esti¬ 
mates for the LRs. In addition, we used the diagnostic odds 
ratios to assess whether the overall accuracy for some maneu¬ 
vers might be better than others. We used only data from stud¬ 
ies that used ultrasonography as the reference standard test. 

CHANGES IN THE REFERENCE STANDARD 

Although radiologic studies have suggested the possible use of 
competing technologies, such as nuclear scan and specialized 
computed tomography (CT) examinations, the most widely 
recognized and available gold standard remains ultrasonogra¬ 
phy. All articles assessing the utility of clinical examination 
maneuvers in the detection of splenomegaly published in the 
past 13 years used ultrasonography as the reference standard 
for the diagnosis of splenomegaly (a length of 12 or 13 cm). 

RESULTS OF LITERATURE REVIEW 

Percussion using the Nixon method (Figure 46-2) or Traube’s 
space (Figure 46-3) works best for detecting splenomegaly. 
Supine one-handed palpation has been the most widely studied 
palpation maneuver, which increases the confidence in the 
results (Table 46-4). 

Table 46-4 Likelihood Ratios of Percussion and Palpation Maneuvers 
for Splenomegaly 

Maneuver (No. of Combined Studies)    LR+ (95% CI)     LR- (95% CI)        DOR (95% CI) 

Percussion Maneuvers 
Nixon sign (1)                        3.6 (1.8-7.3)    0.41 (0.26-0.64)    8.9 (3.1-25) 
Percussion of Traube space (3)        2.3 (1.8-2.9)    0.48 (0.39-0.60)    4.8 (3.2-7.3) 
Castell sign (1)                      1.2 (0.98-1.6)   0.45 (0.19-1.1)     2.8 (0.92-8.3) 

Palpation Maneuvers 
Supine, 1-handed palpation (4)        8.2 (5.8-12)     0.41 (0.30-0.57)    22 (13-38) 
Middleton hooking maneuver (1)        6.5 (3.1-15)     0.16 (0.08-0.32)    40 (11-138) 

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive likelihood 
ratio; LR-, negative likelihood ratio. 
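
The diagnostic odds ratio summarizes a maneuver in a single number and, for one study, equals LR+ divided by LR-. The sketch below recomputes that ratio from the point estimates in Table 46-4 and ranks the maneuvers. The dictionary and function names are illustrative. The recomputed values are expected to differ somewhat from the published DORs, which come from the studies' underlying data and, for the multi-study rows, from random-effects pooling.

```python
def diagnostic_odds_ratio(lr_positive: float, lr_negative: float) -> float:
    """DOR = LR+ / LR-; larger values indicate better overall discrimination."""
    return lr_positive / lr_negative


# (LR+, LR-) point estimates as printed in Table 46-4
TABLE_46_4 = {
    "Nixon sign": (3.6, 0.41),
    "Percussion of Traube space": (2.3, 0.48),
    "Castell sign": (1.2, 0.45),
    "Supine, 1-handed palpation": (8.2, 0.41),
    "Middleton hooking maneuver": (6.5, 0.16),
}

ranking = sorted(TABLE_46_4,
                 key=lambda name: diagnostic_odds_ratio(*TABLE_46_4[name]),
                 reverse=True)

for name in ranking:
    print(f"{name}: DOR about {diagnostic_odds_ratio(*TABLE_46_4[name]):.1f}")
```

On these point estimates the 2 palpation maneuvers rank above all 3 percussion maneuvers, although the wide confidence intervals in the table mean the ordering within each group is uncertain.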

EVIDENCE FROM GUIDELINES 

No federal guidelines discuss the assessment of splenomegaly 
by using physical examination. 


CLINICAL SCENARIO—RESOLUTION 


This patient may have a viral or myeloproliferative syn¬ 
drome, so you have a good reason to assess for splenomeg¬ 
aly. The physical examination results seem contradictory. 
You have percussed dullness (which increases the likelihood 
of splenomegaly), but you cannot palpate the splenic tip 
(which decreases the likelihood of splenomegaly). The per¬ 
cussion findings have a lower accuracy than the palpation 
signs (as suggested by the diagnostic odds ratios). You 
decide you need to know whether the patient has spleno¬ 
megaly, so you must proceed to additional testing with 
ultrasonography or a CT scan. 
SPLENOMEGALY—MAKE THE DIAGNOSIS 


During the general physical examination, patients should 
not be evaluated for splenomegaly. 

PRIOR PROBABILITY 

The prevalence of palpable splenomegaly in an otherwise 
healthy student population is low, approximating 3% 8 ; 12% 
of normal postpartum women had palpable spleens. 9 The 
prevalence of splenomegaly increases significantly among 
other selected populations, such as HIV patients (up to 
66% 10 ), or in areas in which schistosomiasis is prevalent. 11 

POPULATION FOR WHOM THE PHYSICAL EXAMINATION 
OF SPLENOMEGALY SHOULD BE SOUGHT 

• Suspected or proven viral illness, lymphoproliferative dis¬ 
order, or malignancy 

• Cirrhosis 

• Suspected portal hypertension 

• Suspected or proven malaria 

• Connective tissue disorders associated with splenomegaly 

DETECTING SPLENOMEGALY 

In cases in which splenomegaly is questioned, the clinical 
examination is more specific than sensitive and is best used 
when ruling in the diagnosis among patients for whom the 
suspicion is at least 10%. Moreover, the examination should 
start with Traube space percussion, followed, if dull, by 
supine 1-handed palpation (Table 46-5). These maneuvers 
have received more extensive evaluation than other maneuvers, 
allowing us greater confidence in the findings. Middleton 
maneuver, in which the physician stands to the left of the 
patient and hooks the examining hand under the ribs, may 
work as well. 

Palpation may be superior to percussion, especially in lean 
patients. When it remains important not to miss splenomeg¬ 
aly, imaging will be necessary because the clinical examina¬ 
tion does not provide sufficient clinical certainty. 


Table 46-5 Summary Likelihood Ratios for Palpation to Detect 
Splenomegaly and Percussion of Traube Space 

Test (No.)                                LR+ (95% CI)     LR- (95% CI)        DOR (95% CI) 

Supine 1-handed palpation (4 studies)     8.2 (5.8-12)     0.41 (0.30-0.57)    22 (13-38) 
Percussion of Traube space (3 studies)    2.3 (1.8-2.9)    0.48 (0.39-0.60)    4.8 (3.2-7.3) 

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive likelihood 
ratio; LR-, negative likelihood ratio. 


REFERENCE STANDARD TESTS 

Ultrasonography, CT, nuclear liver-spleen imaging. 

EVIDENCE TO SUPPORT THE UPDATE 


Splenomegaly 



TITLE Accuracy of Palpation and Percussion Maneuvers 
in the Diagnosis of Splenomegaly. 

AUTHORS Chongtham DS, Singh MM, Kalantri SP, 
Pathak S. 

CITATION Indian J Med Sci. 1997;51(11):409-416.

QUESTION What are the sensitivity and specificity of
palpation and percussion maneuvers in diagnosing
splenomegaly?

DESIGN Prospective, independent comparison of
nonconsecutive cases.

SETTING Medical ward at Kasturba Hospital, India.

PATIENTS Eighty hospitalized patients (37 female
patients) in a general medical ward. Exclusions were patients
with left-sided pleural effusion, history of ascites, or
splenomegaly. Mean age was 31.5 years, and weight was 45 ± 8 kg.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The test performance characteristics of Traube space percussion
(Barkun et al 1) and the percussion maneuvers of Castell 2 and
Nixon 3 were evaluated at various percussion note thresholds (1
= definitely tympanitic, 2 = probably tympanitic, 3 = uncertain,
4 = probably dull, 5 = definitely dull 1). Supine palpation and
the Middleton palpation maneuver 4 were also assessed on a 5-point
scale (1 = spleen definitely not palpable, 2 = spleen probably not
palpable, 3 = uncertain, 4 = spleen probably palpable, and 5 =
spleen definitely palpable, as previously suggested 5). The
assessments were carried out by a physician blinded to the patient’s
clinical history and laboratory results. The examination was
carried out before or at least 2 hours after the patient had eaten.

Ultrasonography was performed by an independent operator
within 24 hours of the clinical examination.

MAIN OUTCOME MEASURES 

The sensitivity and specificity of the various maneuvers 
were described. A spleen was considered enlarged if 
greater than 13 cm on ultrasonography. 


MAIN RESULTS 

The prevalence of splenomegaly was 52% (42/80). Mean
splenic size was 15 cm among those with splenomegaly and
9.9 cm among those without enlargement. The likelihood
ratios for the maneuvers are shown in Table 46-6.

Receiver operating characteristic (ROC) curve analyses showed
a progressive decline in sensitivity from 98% to 50% as the
palpation threshold progressed from 1 to 4 (increasing certainty
of feeling a spleen), whereas specificity increased from 58% to 95%.

Nixon percussion maneuver was correlated with splenic 
size. The ROC area under the curve for varying thresholds on 
Traube space percussion was 0.74. 
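
The trade-off between sensitivity and specificity as the threshold
on a 5-point scale moves can be sketched as follows. The counts
below are hypothetical and chosen only for illustration; they are
not the study data, and the names used are assumptions.

```python
# Hypothetical counts of patients at each palpation score (1-5).
# These are illustrative numbers, NOT data from the study.
WITH_SPLENOMEGALY = {1: 2, 2: 5, 3: 8, 4: 15, 5: 20}
WITHOUT_SPLENOMEGALY = {1: 25, 2: 12, 3: 8, 4: 4, 5: 1}


def sens_spec(diseased, healthy, threshold):
    """Sensitivity and specificity when a score >= threshold is positive."""
    true_pos = sum(n for score, n in diseased.items() if score >= threshold)
    true_neg = sum(n for score, n in healthy.items() if score < threshold)
    return true_pos / sum(diseased.values()), true_neg / sum(healthy.values())


# One (threshold, sensitivity, specificity) row per candidate cutoff.
curve = [(t, *sens_spec(WITH_SPLENOMEGALY, WITHOUT_SPLENOMEGALY, t))
         for t in range(2, 6)]
for threshold, sensitivity, specificity in curve:
    print(f"score >= {threshold}: sensitivity {sensitivity:.0%}, "
          f"specificity {specificity:.0%}")
```

Demanding more certainty before calling the spleen palpable lowers
sensitivity and raises specificity, which is the pattern the ROC
analysis describes.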

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Independent assessment of well-defined physical
examination maneuvers.


Table 46-6 Likelihood Ratios of Palpation and Percussion Maneuvers
for Splenomegaly

Test | Sensitivity, % | Specificity, % | LR+ (95% CI) | LR- (95% CI) | DOR (95% CI) ᵃ
--- | --- | --- | --- | --- | ---
Palpation Maneuvers | | | | |
Supine 1-handed palpation | 79 | 92 | 10 (3.7-29) | 0.23 (0.13-0.40) | 43 (11-163)
Middleton hooking palpation maneuver | 86 | 87 | 6.5 (3.1-15) | 0.16 (0.08-0.32) | 40 (11-138)
Percussion Maneuvers | | | | |
Nixon percussion | 67 | 82 | 3.6 (1.8-7.3) | 0.41 (0.26-0.64) | 8.9 (3.1-25)
Traube space percussion | 76 | 63 | 2.1 (1.4-3.3) | 0.38 (0.20-0.66) | 5.5 (2.1-14)
Castell percussion | 86 | 32 | 1.2 (0.98-1.6) | 0.45 (0.19-1.1) | 2.8 (0.92-8.30)

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive
likelihood ratio; LR-, negative likelihood ratio.

ᵃ DOR calculated from data provided in the article.
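
The diagnostic odds ratio is the ratio of LR+ to LR-. A minimal
sketch of that calculation follows, using the rounded sensitivity
and specificity values from Table 46-6. Because the inputs are
rounded percentages, the results only approximate the tabulated
DORs, which were presumably derived from the raw counts; the
function name is an assumption.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = LR+ / LR-, with sensitivity and specificity as proportions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos / lr_neg


# Rounded sensitivity and specificity from Table 46-6.
MANEUVERS = {
    "Supine 1-handed palpation": (0.79, 0.92),
    "Middleton hooking palpation": (0.86, 0.87),
    "Nixon percussion": (0.67, 0.82),
    "Traube space percussion": (0.76, 0.63),
    "Castell percussion": (0.86, 0.32),
}

dors = {name: diagnostic_odds_ratio(se, sp)
        for name, (se, sp) in MANEUVERS.items()}
for name, dor in dors.items():
    print(f"{name}: DOR about {dor:.1f}")
```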

LIMITATIONS The order in which the examinations were performed
is not described. Furthermore, the generalizability may
be questioned, considering the patient population characteristics
(the mean Quetelet index of the studied patients was low by
Western standards, at 17.8 ± 2.6 kg/m²). The overall prevalence of
splenomegaly suggests that this population may differ considerably
from others or that there may have been some selection bias.

The palpation maneuvers appeared to perform appreciably
better than percussion methods, as evidenced by the high
diagnostic odds ratios. We do not know whether “leanness” as
evidenced by a low body mass index creates a bias that favors
palpation or percussion.

REFERENCES FOR THE EVIDENCE 

1. Barkun AN, Camus M, Meagher TW, et al. How useful is Traube’s space
percussion in assessing splenic enlargement? Am J Med. 1989;87(5):562-566.

2. Castell DO. The spleen percussion sign: a useful diagnostic technique.
Ann Intern Med. 1967;67(6):1265-1267.

3. Nixon RK Jr. The detection of splenomegaly by percussion. N Engl J
Med. 1954;250(4):166-167.

4. Shaw MT, Dvorak V. Palpation of slightly enlarged spleens. Lancet.
1973;1(7798):317.

5. Barkun AN, Camus M, Green L, et al. The bedside assessment of splenic
enlargement. Am J Med. 1991;91(5):512-518.

Reviewed by Alan N. Barkun, MD 


TITLE Splenic Palpation for the Evaluation of Morbidity
Due to Schistosomiasis Mansoni.

AUTHORS Gerspacher-Lara R, Pinto-Silva RA, Serufo 
JC, Rayes AAM, Drummond SC, Lambertucci JR. 

CITATION Mem Inst Oswaldo Cruz (Rio de Janeiro). 
1998;93(suppl I):245-248. 

QUESTION What are the reliability and validity of 2 
methods of palpation in detecting ultrasonographically 
identified splenomegaly? 

DESIGN Prospective assessment of 2 near-complete
communities with an independent assessment by
ultrasonography.

SETTING Two Brazilian rural communities. 

PATIENTS The study population was recruited from
551 individuals (92% of the local population) from
Queixadinha, in the district of Carai, located in the northeast
of the State of Minas Gerais, Brazil, an area known to
be highly endemic for schistosomiasis. An additional 517
individuals (89% of the total population) were recruited
from Capao, a rural community in the district of Presidente
Juscelino in the center of the state, where, for
unknown reasons, transmission of schistosomiasis probably
does not occur and in which other tropical diseases
that can cause splenomegaly have never been identified.


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Abdominal palpation was performed with patients in the
decubitus position during deep inspiration, by 2 independent
physicians in a blinded fashion. The greatest distance
between the splenic border and the costal margin was also
independently measured by the examiners.

MAIN OUTCOME MEASURES 

The 2 examination maneuvers were considered positive for 
splenomegaly when 

1. the spleen was palpable by both examiners; and 

2. the distance between the splenic border and the costal margin
was greater than 4 cm, as measured by both examiners.

Splenomegaly was defined as a splenic length greater than
120 mm by ultrasonography. Only patients aged 18 years or
older were included in the categorization of splenic enlargement
because of the lack of widely accepted quantitative criteria
in children.

MAIN RESULTS 

The prevalence of splenomegaly in this patient population
was 7%. A spleen was palpated by both physicians in 37 cases
(discordance between examiners occurred in 5 cases). Mean
splenic lengths in patients with and without palpable spleen
were 10.4 cm and 7.1 cm, respectively (P < .001). Table 46-7
shows the likelihood ratios for the results where a positive
test required agreement between the examining physicians.

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1. 

STRENGTHS The study used a sound design and an 
accepted gold standard. 

LIMITATIONS The methods of palpation are not adequately 
described. 

The results are interesting in that a “positive” result
required 2 examiners’ agreement. This suggests that clinicians
might have better accuracy when they ask for a second opinion
about the palpation findings.

Table 46-7 Likelihood Ratio for the Presence of Palpable
Splenomegaly vs the Presence of a Large Palpable Spleen ᵃ

Test | Sensitivity | Specificity | LR+ (95% CI) | LR- (95% CI) | DOR (95% CI)
--- | --- | --- | --- | --- | ---
Palpable spleen | 0.72 | 0.91 | 7.6 (4.5-12) | 0.31 (0.14-0.56) | 25 (8.5-73)
Distance between splenic border and costal margin > 4 cm | 0.28 | 0.98 | 14 (4.6-41) | 0.74 (0.50-0.89) | 19 (5.2-69)

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive
likelihood ratio; LR-, negative likelihood ratio.

ᵃ The test was considered positive only when both examiners agreed on the finding.

Reviewed by Alan N. Barkun, MD 


TITLE Percussion of Traube’s Space—A Useful Index of 
Splenic Enlargement. 

AUTHORS Dubey S, Swaroop A, Jain R, Verma K, Garg P,
Agarwal S.

CITATION J Assoc Physicians India. 2000;48(3):326-328. 

QUESTION Is Traube space percussion useful in assessing
splenic enlargement?

DESIGN Prospective, nonconsecutive patients, with an 
independent assessment by ultrasonography. 

SETTING An Indian University hospital. 

PATIENTS One hundred medical inpatients.


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD

After Traube space percussion, 1 the findings were labeled as 
tympanitic (resonant) or dull. Dullness to percussion is an 
abnormal finding that suggests splenomegaly. In addition, 
the spleen was also palpated with the patient positioned in 
the supine and right lateral decubitus positions. The clinician 
assessed the spleen as palpable or not palpable. Each patient 
was subsequently sent for ultrasonography. Splenomegaly 
was defined as a splenic longitudinal measurement of 12 cm 
or more on ultrasonography. 

MAIN OUTCOME MEASURES 

Sensitivity and specificity. 

MAIN RESULTS 

The prevalence of splenomegaly in this patient population
was 36%. The splenic lengths among patients with
ultrasonographically diagnosed splenomegaly were 13.1 ± 0.96
cm vs 9.42 ± 1.06 cm for those without splenic enlargement.
The results of Traube space percussion are shown in
Table 46-8. The Quetelet index (a measure of body size) was higher
among patients who had false-negative findings.

The diagnostic odds ratios (DORs), calculated from data
provided in the article, suggest that palpation (DOR, 25; 95%
confidence interval [CI], 5.2-117) might be more accurate
than percussion (DOR, 6.0; 95% CI, 2.4-15).


Table 46-8 Likelihood Ratios for Percussion, Palpation, and a
Combination of the 2 Findings for Splenomegaly

Test | Sensitivity, % | Specificity, % | LR+ (95% CI) | LR- (95% CI)
--- | --- | --- | --- | ---
Traube space percussion | 67 | 75 | 2.7 (1.7-4.3) | 0.44 (0.27-0.68)
Palpation | 44 | 97 | 14 (3.5-58) | 0.57 (0.43-0.77)
Palpation and percussion | | | |
Both positive | | | 14 (0.85-245) |
One positive | | | 2.9 (2.0-4.4) |
Both negative | | | | 0.18 (0.10-0.33)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS The study uses, overall, a sound design and 
an accepted gold standard. 

LIMITATIONS The exact sequence of physical examination
maneuvers is unclear. Therefore, a comparison of the performance
of one technique to another cannot be done precisely,
because we do not know the order in which the maneuvers were
done. However, the maximal clinical utility appeared to be
achieved when both percussion and palpation were considered.

These results in a different patient population appear to
confirm the findings that Traube space percussion and palpation
are useful for identifying splenomegaly. The authors did
not report a standardized order of percussion and then palpation
or vice versa. However, in considering the results of
the findings, clinicians can conclude that both maneuvers
should be performed. The presence of dullness to percussion
confirms the importance of a palpable spleen. The absence of
dullness to percussion modulates the importance of a palpable
spleen and suggests that for some patients either the
spleen is palpable, though not enlarged, or the examiner’s
findings of a palpable spleen are in error. If neither finding is
present, the likelihood of splenomegaly is decreased.
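
To make the combined findings concrete, the sketch below applies
the three likelihood ratios from Table 46-8 to the 36% prevalence
observed in this study. It is an illustration only: the names are
assumptions, and the wide confidence interval around the
"both positive" estimate (0.85-245) means the first result in
particular should be read loosely.

```python
PREVALENCE = 0.36  # prevalence of splenomegaly in this study

# Likelihood ratios for the combined examination (Table 46-8).
COMBINED_LR = {
    "both positive": 14.0,
    "one positive": 2.9,
    "both negative": 0.18,
}


def to_posttest(pretest: float, lr: float) -> float:
    """Posttest probability from pretest probability and a likelihood ratio."""
    odds = pretest / (1.0 - pretest) * lr
    return odds / (1.0 + odds)


posttest = {finding: to_posttest(PREVALENCE, lr)
            for finding, lr in COMBINED_LR.items()}
for finding, probability in posttest.items():
    print(f"{finding}: {probability:.0%}")
```

In a population like this one, two negative maneuvers would lower
the probability of splenomegaly from 36% to roughly 9%, whereas a
single positive maneuver would raise it to about 62%.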

REFERENCE FOR THE EVIDENCE 

1. Barkun AN, Camus M, Meagher TW, et al. How useful is Traube’s space
percussion in assessing splenic enlargement? Am J Med. 1989;87(5):562-566.

Reviewed by Alan N. Barkun, MD 
CHAPTER 47

Does This Patient Have Strep Throat?

Mark H. Ebell, MD
Mindy A. Smith, MD
Henry C. Barry, MD
Kathy Ives, BS
Mark Carey, BS

CLINICAL SCENARIOS


In each of the following cases, the physician must decide
whether the patient has group A β-hemolytic streptococcal
pharyngitis (strep throat). In case 1, a 7-year-old boy
presents in March without a cough but with 1 day of sore
throat accompanied by fever, headache, moderate cervical
adenopathy, and a markedly exudative and erythematous
pharynx. His brother was recently diagnosed as having
strep throat. In case 2, a 16-year-old presents with severe
sore throat and anterior adenopathy for 3 days but no
tonsillar enlargement, exudate, fever, or cough. In case 3, a
42-year-old woman presents with 5 days of sore throat
and cough but no adenopathy, tonsillar enlargement,
recent exposure to strep, or exudate.


WHY IS THE DIAGNOSIS IMPORTANT? 


The 1995 National Ambulatory Medical Care Survey 1 found
that sore throat is the third most common presenting complaint
in office-based practice, accounting for 4.3% of visits.
Sore throat is usually caused by direct infection of the
pharyngeal tissue (pharyngitis). The differential diagnosis of
pharyngitis is summarized in Table 47-1. Sore throat can also
be caused by conditions such as gastroesophageal reflux disease,
acute thyroiditis, persistent cough, and postnasal drainage
because of allergic rhinitis or sinusitis. However, reliable
estimates for the likelihood of these conditions among
patients with sore throat are not available.

Untreated group A β-hemolytic streptococcal pharyngitis
typically lasts 8 to 10 days. Patients are infectious during the
period of acute illness and for approximately 1 week after.
Antibiotic treatment decreases the severity of symptoms,
reduces their duration by approximately 1 day, 5 reduces the
risk of transmission to others after 24 hours of treatment,
and reduces the likelihood of suppurative complications and
rheumatic fever. 6 Suppurative complications include
peritonsillar abscess (occurring in <1% of patients treated with
antibiotics), retropharyngeal abscess, suppurative cervical
lymphadenitis, bacteremia, and, by direct extension, otitis
media, sinusitis, and mastoiditis. 7 Rarely, the infection may
lead to meningitis, pneumonia, or bacteremia. Rheumatic
fever is a serious sequela of strep throat. Between 1 and 5
weeks after an episode of strep throat, a nonsuppurative
inflammatory reaction results in fever, carditis, subcutaneous
nodules, chorea, or migratory polyarthritis. Acute rheumatic
fever now occurs infrequently in the United States, with a
reported annual incidence of approximately 1 case per
1 000 000 population. 8

Always doing a throat culture or rapid antigen test can lead
to overtreatment of low-risk patients because of excessive
false-positive results and undertreatment of high-risk
patients because of excessive false-negative results. This
approach also leads to increased cost. 9,10 By using the
preexamination likelihood of strep throat and the clinical
examination, patients can potentially be divided into 3 groups:
those with a high probability of strep throat, who could
receive empiric antibiotic therapy (case 1, above); those with
an intermediate probability of disease, who may require further
diagnostic testing (case 2); and those with a low probability
of disease, who may require only symptomatic therapy
and appropriate follow-up rather than further diagnostic
testing or treatment (case 3). 11

Table 47-1 Differential Diagnosis of Pharyngitis 2-4

Etiology | Probability, %
--- | ---
Viral | 50-80
Streptococcal | 5-36
Epstein-Barr virus | 1-10
Chlamydia pneumoniae | 2-5
Mycoplasma pneumoniae | 2-5
Neisseria gonorrhoeae | 1-2
Haemophilus influenzae type b | 1-2
Candidiasis | <1
Diphtheria | <1

Pathophysiology 

Group A β-hemolytic streptococci trigger an inflammatory
response in pharyngeal cells that is responsible for
many of the signs and symptoms of pharyngitis. Interleukins
1 and 6, tumor necrosis factor, and prostaglandins
cause the febrile response; prostaglandins and bradykinin
cause pain; and prostaglandins and nitric oxide cause
vasodilation and edema, manifested as erythema and
swelling of the tonsillar pillars, uvula, and soft palate.
Lysosomal enzymes and oxygen free radicals, although
part of the body’s response to infection, also cause tissue
damage. This tissue damage, in addition to the pustular
nature of the group A β-hemolytic streptococcal infection,
results in a creamy exudate from the tonsillar pillars. The
pharynx is drained primarily by the anterior cervical
nodes, which may become tender and enlarged during
infection. 12

Although group A β-hemolytic streptococcus is not part
of the normal flora of the human throat, the asymptomatic
carrier rate is 5.0% to 21% in children between the ages of 3
and 15 years. It is lower in children younger than 3 years
(1.9%-7.1%) and in older adolescents and adults
(2.4%-3.7%). 13

METHODS 

Search Strategy and Quality Review 

For the evaluation of individual signs and symptoms, we
identified studies of the diagnosis of group A β-hemolytic
streptococcal pharyngitis in patients complaining of sore
throat. All studies included at least 300 patients, collected
data prospectively, and used throat culture as the reference
standard. Examiners were unaware of the results of rapid
tests or throat cultures for strep when they performed the
medical history and physical examination. All articles therefore
represent level 1 evidence according to previously published
criteria for the evaluation of study quality. 14 The results
for a variable are reported only if more than 1 study reported
data for that variable.

The MEDLINE search used the following Medical Subject
Headings: (“sensitivity and specificity” or “predictive value of
tests” or “medical history taking” or “physical examination”)
and “pharyngitis.” This search identified 917 articles. In 2
cases, authors were contacted to provide additional information
or to clarify a point in the article. Unpublished data were
not sought. Seventeen studies (15 in English, 1 in German, and
1 in Spanish) met all of the inclusion criteria described above
except study size. Nine studies included at least 300 patients;
they are shown in Table 47-2. 15-23 Each study was reviewed
independently by 2 clinical investigators, and discrepancies
were resolved by discussion. In addition, any articles developing
or validating a clinical prediction rule were identified. The
included studies reported data for 5453 patients, whereas the 8
excluded studies reported data for only 1182 patients.


Table 47-2 Studies Included

Source, y | Setting | Population Presenting With Complaint of Sore Throat | Patients, No. | Prevalence of Strep Throat Pharyngitis, %
--- | --- | --- | --- | ---
McIsaac et al, 22 1998 | Office | Adults and children | 520 | 29
Kljakovich, 20 1993 | Office | Adults and children | 329 | 12
Reed et al, 21 1990 | Urgent care | Adults and children | 806 | 25
Crawford et al, 23 1979 | Outpatient | Adults and children | 472 | 11
Komaroff et al, 18 1986 | Office | Adults only | 693 | 10
Walsh et al, 19 1975 | Office | Adults only | 418 | 15
Steinhoff et al, 15 1997 | Office | Children only | 450 | 24
Kaplan et al, 16 1971 | Emergency department | Children only | 624 | 35
Stillerman and Bernstein, 17 1961 | Office | Children only | 1141 | 36

Statistical Methods

The positive likelihood ratio (LR+) and negative likelihood
ratio (LR-) were calculated for medical history and physical
examination findings. The LR+ (LR+ = sensitivity/[100 -
specificity]) is a measure of how well a positive result rules in
disease, whereas the LR- (LR- = [100 - sensitivity]/specificity)
is a measure of how well a negative result rules out disease. A
random-effects estimate was calculated for the sensitivity,
specificity, LR+, and LR- when the χ² statistic suggested
homogeneity (P > .05) or when random- and fixed-effects
models gave similar values. Otherwise, a range for each variable
is shown.
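
The two formulas above can be written out directly. The sketch
below takes sensitivity and specificity as percentages, as in the
text; the function name and the example values (36% and 90%) are
assumptions chosen for illustration, not results from any one study.

```python
def likelihood_ratios(sensitivity_pct: float, specificity_pct: float):
    """Return (LR+, LR-) from sensitivity and specificity in percent.

    LR+ = sensitivity / (100 - specificity)
    LR- = (100 - sensitivity) / specificity
    """
    if not (0 < specificity_pct < 100 and 0 <= sensitivity_pct <= 100):
        raise ValueError("expected 0 <= sensitivity <= 100, 0 < specificity < 100")
    lr_pos = sensitivity_pct / (100 - specificity_pct)
    lr_neg = (100 - sensitivity_pct) / specificity_pct
    return lr_pos, lr_neg


# Hypothetical finding: 36% sensitive and 90% specific.
lr_pos, lr_neg = likelihood_ratios(36, 90)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

A finding with these characteristics would be moderately useful
when present (LR+ well above 1) but of little help when absent
(LR- close to 1).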

The area under the receiver operating characteristic (ROC)
curve is a measure of the diagnostic accuracy of a test. 24
Specifically, a greater area corresponds to a greater ability to
discriminate between patients with and without strep throat. An
area of 1.0 under the ROC curve means the test is perfect, whereas
an area of 0.5 means it is no better than chance. In this study,
the area was calculated with the method of Moses et al. 25 It was
not possible to generate an ROC curve for some signs and
symptoms if fewer than 3 studies reported their sensitivity and
specificity.
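
For intuition about what the area represents, the sketch below
computes the area under an empirical ROC curve with the trapezoidal
rule. This is a simpler technique than the summary ROC method of
Moses et al used in the study and is shown only as an illustration;
the operating points are hypothetical.

```python
def trapezoid_auc(points):
    """Area under an ROC curve given (false-positive rate, sensitivity) pairs.

    The corner points (0, 0) and (1, 1) are added automatically.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0   # one trapezoid per segment
    return area


# Hypothetical operating points: (1 - specificity, sensitivity).
example_points = [(0.1, 0.4), (0.3, 0.7), (0.6, 0.9)]
auc = trapezoid_auc(example_points)
print(f"area under the curve: {auc:.2f}")
```

With no operating points the function returns 0.5 (the chance
diagonal), and a test that reaches full sensitivity with no false
positives returns 1.0, matching the interpretation given above.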

PRECISION AND ACCURACY 

Symptoms 

Classically, the streptococcal sore throat is of abrupt onset
in older children and adults. Symptoms may be less focal
and more gradual in younger children. 26 Throat pain is
typically described as severe and is associated with difficulty
in swallowing. Fever is moderate (reported temperature
range, 39°C to 40.5°C). Chills may be present but
rigor is not typical. Strep throat is also classically associated
with malaise, headache, mild neck stiffness, and
gastrointestinal symptoms such as anorexia, nausea, vomiting,
and abdominal pain. However, these features may be
present in only 35% to 50% of patients and have not been
verified by objective studies of diagnosis. Abdominal
symptoms may be more common in younger patients,
although more recent studies have not confirmed this as an
independent predictor. 27

Signs 

Examination of the throat may reveal erythema and
edema of the pharynx and uvula and diffuse erythema and
hypertrophy of the lymphoid tissue in the posterior pharynx.
The posterior pharynx and tonsillar pillars may be
covered with a gray-white membrane or exudate. The
pharynx is often described as beefy or bright red, with the
color ending abruptly at the soft palate. Petechiae may be
present on the soft palate. Tonsils are commonly swollen
and erythematous and covered with a punctate or confluent
gray-white exudate. The breath is characteristically
foul.

The anterior cervical lymph nodes are often tender and
enlarged, especially at the angle of the jaw. This sign occurs
early in the course of infection. When present, the characteristic
scarlatiniform (“scarlet fever”) rash is one of fine erythematous
papules beginning on the trunk and spreading to the
extremities but sparing the palms and soles. The rash
blanches to pressure and has a sandpapery feel. It is associated
with enlarged papillae on a coated tongue that may later
become denuded (“strawberry tongue”), circumoral pallor,
and hyperpigmentation or accentuation of the rash in the
skin creases. This accentuation is especially prominent in the
antecubital fossae (Pastia sign). In young children, there may be
excoriations around the nares. The rash typically subsides in 6 to 9
days and may be followed by desquamation of the palms and
soles. Pharyngeal vesicles and ulcers are associated with viral
upper respiratory tract infections; their presence reduces the
likelihood that a sore throat is caused by group A
β-hemolytic streptococci.

Properly viewing the pharynx can be challenging. Adequate
examination of the throat requires elevation of the soft
palate and uvula and depression of the posterior tongue.
Although a tongue blade can help, patients often gag, cough,
or bite. The pharynx can sometimes be viewed without a
tongue blade by having the patient pant. Small children can
be asked to imitate a puppy as a way of encouraging them
to pant.

Precision of Symptoms and Signs 

Although only limited data are available on the precision of
symptoms and signs of streptococcal pharyngitis, these data
suggest that observer reliability is high. Komaroff et al 18 had 2
blinded observers examine the same randomly sampled
patients and found 88% agreement on 187 medical history
and physical examination items, although the κ statistic was not
used to evaluate agreement beyond chance. In another study,
a physician and physician assistant both examined the ears,
nose, throat, cervical nodes, and chest of 63 patients. Only 1
discrepancy, in examination of cervical adenopathy, was
observed. 19 No data were available regarding the ability of
physicians to distinguish tonsillar from pharyngeal exudate
or regarding the precision of the examination in patients
who have undergone tonsillectomy.

Diagnostic Accuracy of Symptoms and Signs 

The sensitivity, specificity, LR+, and LR- for variables that
are reported in at least 2 studies are shown in Table 47-3. The
variables with the greatest area under the ROC curve, and
hence the best ability to discriminate between patients with
and without strep throat, were pharyngeal or tonsillar exudate,
fever by history, tonsillar enlargement, tenderness or
enlargement of the anterior cervical lymph nodes, and
absence of cough.

Findings that were similar across studies, had the greatest
LR+, and were therefore best at ruling in disease were the
presence of tonsillar exudate (LR+, 3.4), pharyngeal exudate
(LR+, 2.1), and strep throat exposure in the previous 2
weeks (LR+, 1.9). The absence of findings was not efficient
at ruling out disease, with the lowest LR- found for the
absence of tender anterior cervical nodes (LR-, 0.60),
tonsillar enlargement (LR-, 0.63), and tonsillar or pharyngeal
exudate (LR-, 0.74). A physician’s overall estimate of
the probability of strep throat was measured in 2 small
studies, which found LR+ values of 3.0 and 1.7 and LR-
values of 0.36 and 0.60. 28,29

Table 47-3 Accuracy for Medical History and Physical Examination Elements in the Diagnosis of Strep Throat ᵃ

Symptoms and Signs | Patients, No. | Accuracy | Sensitivity (95% CI) or Range | Specificity (95% CI) or Range | LR+ (95% CI) or Range | LR- (95% CI) or Range
--- | --- | --- | --- | --- | --- | ---
Any exudates 15,16,18-21 | 3268 | 0.68 | 0.21-0.58 | 0.69-0.92 | 1.5-2.6 | 0.66-0.94
Reported fever 15,17,20,21 | 3232 | 0.68 | 0.3-0.92 | 0.23-0.90 | 0.97-2.6 | 0.32-1.0
Measured temperature >37.8°C 15,17,18,21 | 3091 | 0.68 | 0.11-0.84 | 0.43-0.96 | 1.1-3.0 | 0.27-0.94
Anterior cervical nodes swollen/enlarged 15,16,18,20-23 | 3831 | 0.67 | 0.55-0.82 | 0.34-0.73 | 0.47-2.9 | 0.58-0.92
Pharyngeal exudates 18,22,23 | 1673 | 0.65 | 0.03-0.48 | 0.76-0.99 | 2.1 (1.4-3.1) | 0.90 (0.75-1.1)
Tonsillar swelling/enlargement 18-22 | 2703 | 0.65 | 0.56-0.86 | 0.56-0.86 | 1.4-3.1 | 0.63 (0.56-0.72)
Tonsillar or pharyngeal exudates 15,16,19,21 | 2246 | 0.65 | 0.28-0.61 | 0.62-0.88 | 1.8 (1.5-2.3) | 0.74 (0.66-0.82)
Anterior cervical nodes tender 15,16,18,22 | 2280 | 0.64 | 0.32-0.66 | 0.53-0.84 | 1.2-1.9 | 0.60 (0.49-0.71)
Tonsillar exudates 20,22 | 840 | 0.64 | 0.36 (0.21-0.52) | 0.71-0.98 | 3.4 (1.8-6.0) | 0.72 (0.60-0.88)
No cough 15-19,21,23 | 5122 | 0.63 | 0.51-0.79 | 0.36-0.68 | 1.1-1.7 | 0.53-0.89
No coryza 15-19,22 | 3846 | 0.57 | 0.42-0.84 | 0.20-0.70 | 0.86-1.6 | 0.51-1.4
Myalgias 18,21,22 | 2003 | 0.57 | 0.49 (0.43-0.56) | 0.52-0.69 | 1.4 (1.1-1.7) | 0.93 (0.86-1.0)
History of sore throat 16,17,21,22 | 3090 | 0.57 | 0.18-0.93 | 0.09-0.86 | 1.0-1.1 | 0.55-1.2
Headache 17,18,22 | 2350 | 0.56 | 0.48 (0.42-0.53) | 0.50-0.80 | 0.81-2.6 | 0.55-1.1
Pharynx injected 16,18,19,22 | 2939 | 0.54 | 0.43-0.99 | 0.03-0.62 | 0.66-1.6 | 0.18-6.4
Measured temperature >38.3°C 16,22,23 | 1096 | 0.53 | 0.22-0.58 | 0.53-0.92 | 0.68-3.9 | 0.54-1.3
Nausea 17,21 | 1941 | 0.52 | 0.26 (0.12-0.43) | 0.52-0.98 | 0.76-3.1 | 0.91 (0.86-0.97)
Duration <3 d 20,22 | 824 | 0.43 | 0.26-0.93 | 0.59 (0.54-0.64) | 0.72-3.5 | 0.15-2.2
Male sex 21,22 | 1325 | 0.39 | 0.11-0.56 | 0.39-0.86 | 0.87 (0.72-1.0) | 1.1 (0.93-1.2)
Palatine petechiae 18,22 | 1202 | NC | 0.07 (0.02-0.14) | 0.95 (0.92-0.96) | 1.4 (0.48-3.1) | 0.98 (0.92-1.1)
Strep exposure previous 2 wk 18,19,22,23 | 2091 | NC | 0.19 (0.12-0.27) | 0.87-0.94 | 1.9 (1.3-2.8) | 0.92 (0.86-0.99)
Rash 17,21,22 | 2356 | NC | 0.04 (0.03-0.06) | 0.79-0.99 | 0.06-35 | 0.90-1.1

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; NC, receiver operating characteristic curve not calculable.

ᵃ When one of these operating characteristics was homogeneous (P > .05 for the χ² test), the summary value and a 95% CI are given. Where they are heterogeneous, only the
range is given. Variables are given in the order of the area under the receiver operating characteristic curve, where one could be drawn.

Estimating the Pretest Probability of Strep Throat 

Clinical decision-making requires an estimate of the pretest 
probability, in this case the probability that the patient has 
strep throat before examination. This estimate should be 
based primarily on the patient’s age, the clinical setting, and 
the season. 

In general, strep throat is more common in children than
among infants and adults: group A β-hemolytic streptococcal
bacteria can be isolated by throat culture in 24% to 36%
of children (references 15-17) and in 5% to 24% of adults with
sore throat. 4,18,19,28,30,31 Breese 32 reports that the likelihood
of strep throat peaks between the ages of 5 and 10 years, although
it may occur somewhat earlier now because more children are
in daycare.


A reasonable estimate of the pretest probability of strep 
throat in an unselected office-based adult population is 5% 
to 10% and in an unselected pediatric population, 20% to 
25% (Table 47-2). The prevalence of strep throat is also 
higher among patients treated in emergency departments or 
urgent care centers than in office practice. 33,34 Because strep 
throat is more common in autumn and winter, 19,29,32 it may be 
appropriate to adjust these estimates upward during those 
seasons and downward in spring and summer. 
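
As a sketch of how these pretest estimates interact with individual
findings, the example below applies two likelihood ratios reported
earlier in the chapter (tonsillar exudate present, LR+ 3.4; tender
anterior cervical nodes absent, LR- 0.60) to the office-based
pretest ranges for adults and children. The dictionary keys and
function name are assumptions made for illustration, and no
seasonal adjustment is attempted.

```python
# Pretest probability ranges for unselected office populations (from the text).
PRETEST_RANGES = {
    "adult, office": (0.05, 0.10),
    "child, office": (0.20, 0.25),
}

# Likelihood ratios reported earlier in the chapter.
FINDINGS = {
    "tonsillar exudate present": 3.4,
    "no tender anterior cervical nodes": 0.60,
}


def posttest(pretest: float, lr: float) -> float:
    """Posttest probability via the odds form of Bayes' theorem."""
    odds = pretest / (1.0 - pretest) * lr
    return odds / (1.0 + odds)


results = {}
for group, (low, high) in PRETEST_RANGES.items():
    for finding, lr in FINDINGS.items():
        results[(group, finding)] = (posttest(low, lr), posttest(high, lr))
        lo_p, hi_p = results[(group, finding)]
        print(f"{group}, {finding}: {lo_p:.0%} to {hi_p:.0%}")
```

Even the strongest single finding leaves an adult in office
practice well short of diagnostic certainty, which is part of the
motivation for combining findings.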

Clinical Prediction Rules 

Because individual signs and symptoms are not accurate
enough to make a diagnosis, clinical prediction rules have been
developed that use several key elements of the medical history
and physical examination to predict the probability of strep
throat. Using a clinical prediction rule gives a physician a rational
basis for assigning a patient to a low-risk category (requires
neither testing nor treatment), a high-risk category (empiric
antibiotic therapy may be indicated), or a moderate-risk category
(may require further diagnostic testing). 9,10
Table 47-4 Clinical Prediction Rules for the Prediction of Strep Throat in Patients With Sore Throat 

Centor et al, 28 1981 
  Description: Simple 4-variable additive score 
  Population: 236 US adult patients in the emergency department 
  Accuracy: Area under the ROC curve 0.79 (good accuracy) 
  Comment: Successfully validated in 3 new adult populations 34-36 

Dobbs, 29 1996 
  Description: Bayesian score with 14 variables 
  Population: 206 patients >4 y in a British general practice 
  Accuracy: 71% sensitive, 71% specific 
  Comment: No prospective validation; relies on one physician’s examination skills 

Breese, 32 1977 
  Description: Algorithm based on 4 signs and symptoms and patient age (see Figure 47-2) 
  Population: 670 US children in original study, 892 children in validation study 37 
  Accuracy: LR+ = 2, LR- = 0.75 in pediatric validation study 37 
  Comment: The lowest-risk group still has a 6% 32 to 16% 37 risk of strep throat; similar results in study of adults and children 21 

Clancy et al, 38 1988 
  Description: 3-Item additive score based on medical history alone 
  Population: 1237 US adult patients in 2 emergency departments and 189 patients at a student health service 
  Accuracy: Area under the ROC curve 0.70 to 0.74, depending on setting; 85% sensitive and 42% specific 
  Comment: No prospective validation 

Hoffman, 3 1992 
  Description: 7-Item score, including age 
  Population: 1783 patients in a Danish general practice 
  Accuracy: LR+ = 1.3, LR- = 0.2 
  Comment: Rule has low specificity (26%) 

Komaroff et al, 18 1986 
  Description: 6-Item additive score 
  Population: 693 US adult outpatients 
  Accuracy: Results presented as nomogram only 
  Comment: Not prospectively validated 

Walsh et al, 19 1975 
  Description: Algorithm based on 4 signs and symptoms (see Figure 47-1) 
  Population: 418 US adults with sore throat at an HMO outpatient clinic 
  Accuracy: High risk = 28% strep; moderate risk = 15% strep; low risk = 4% strep 
  Comment: In a prospective study in adults and children by Crawford et al, 23 23% of high-risk patients, 12% of moderate-risk patients, and 3% of low-risk patients had a positive throat culture result for strep 

Meland et al, 35 1993 
  Description: Algorithm based on 4 signs and symptoms 
  Population: 133 Norwegian adults and children with sore throat 
  Accuracy: High risk = 62% strep; moderate risk = 34% strep; low risk = 10% strep 
  Comment: Not prospectively validated 

McIsaac et al, 37 2000 
  Description: Algorithm based on 4 signs and symptoms and patient age 
  Population: 621 Canadian adults and children >3 y old and presenting to family physicians 
  Accuracy: Risk stratified into 5 levels, from 1%-51% 
  Comment: Good prospective validation in primary care practices, including both children and adults 

Abbreviations: HMO, health maintenance organization; LR+, positive likelihood ratio; LR-, negative likelihood ratio; ROC, receiver operating characteristic curve. 


Table 47-4 summarizes previous efforts to develop or validate 
clinical prediction rules for the diagnosis of strep throat. 
One of the best validated is a simple 4-item clinical prediction 
rule developed by Centor et al. 28 The Centor score has 
been validated in 3 distinct adult populations 34-36 and considers 
4 signs and symptoms: tonsillar exudate, swollen tender 
anterior cervical nodes, absence of cough, and a history of 
fever. The rule is accurate, with an area under the ROC curve 
of 0.79. One point is assigned for each of the patient’s signs 
and symptoms, and the sum is used to determine the likelihood 
of strep throat (Figure 47-1). 28 The presence of 3 or 4 
findings increases the probability of strep throat. For example, 
a patient with a pretest probability of 10% and a Centor 
score of 4 would have a 41% probability of strep throat. 
Patients with none or 1 of the cardinal findings have a very 
low risk of strep throat, and it may be appropriate to forgo 
testing or treatment in this group. The Centor clinical prediction 
rule has not been validated in younger patients. 
Recently, McIsaac et al 37 have modified Centor’s score and 
validated it prospectively in a mixed group of adults and children 
(Figure 47-2). Another rule, developed by Walsh et al, 19 
has been validated prospectively in a mixed population of 
adults and children (Figure 47-3). 
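
The probabilities quoted for the Centor score follow from converting the pretest probability to odds, multiplying by the likelihood ratio for the patient's score, and converting back to a probability. The following is a minimal Python sketch of that arithmetic, using the likelihood ratios listed in Figure 47-1. The function and variable names are illustrative assumptions and do not come from the original studies.

```python
# Likelihood ratios for each Centor score, as listed in Figure 47-1.
CENTOR_LR = {0: 0.16, 1: 0.3, 2: 0.75, 3: 2.1, 4: 6.3}


def posttest_probability(pretest, likelihood_ratio):
    """Convert a pretest probability (0 to <1) to a posttest probability via odds."""
    if not 0.0 <= pretest < 1.0:
        raise ValueError("pretest probability must be in [0, 1)")
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


def centor_probability(pretest, score):
    """Posttest probability of strep throat for a Centor score of 0 to 4."""
    return posttest_probability(pretest, CENTOR_LR[score])


if __name__ == "__main__":
    # Example from the text: pretest probability 10%, Centor score 4.
    print(f"{centor_probability(0.10, 4):.0%}")
```

With a pretest probability of 10% and a score of 4, the sketch reproduces the 41% quoted in the text.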


The Breese score has been prospectively validated in a 
large pediatric population. 32,38,39 However, a low Breese score 
does not rule out strep: 14% of children with a score of 20 
or less had a positive throat culture result. In addition, it 
requires the results of a white blood cell count, which 
may not be immediately available in many outpatient 
practices. 

1. Assign 1 point for each of the following clinical 
characteristics: (1) history of fever, (2) anterior cervical 
adenopathy, (3) tonsillar exudate, and (4) absence of 
cough. 

2. Find the column that most closely matches the 
pretest probability of strep in the patient and look 
down the column to the row that matches the 
patient's number of points to determine the probability 
of strep. 

                                 Pretest Probability of Strep Throat, % 
Points, No.   Likelihood Ratio     5    10    15    20    25    40    50 
0             0.16                 1     2     2     3     5    10    14 
1             0.3                  2     3     5     7     9    17    23 
2             0.75                 4     8    12    16    20    33    43 
3             2.1                 10    19    27    34    41    58    68 
4             6.3                 25    41    53    61    68    81    86 

Figure 47-1 Centor Clinical Prediction Rule for the Diagnosis of 
Strep Throat in Adults 


1. Add Up Points for Patient 

Symptom or Sign                                      Points 
History of fever or measured temperature >38°C          1 
Absence of cough                                        1 
Tender anterior cervical adenopathy                     1 
Tonsillar swelling or exudates                          1 
Age <15 y                                               1 
Age ≥45 y                                              -1 

2. Find Risk of Strep 

Points     Likelihood Ratio   % With Strep (Patients With Strep/Total) 
-1 or 0    0.05               1 (2/179) 
1          0.52               10 (13/134) 
2          0.95               17 (18/109) 
3          2.5                35 (28/81) 
4 or 5     4.9                51 (39/77) 

Figure 47-2 McIsaac Modification of the Centor Strep Score 

Data from a group of 167 children aged 3 years or older and 453 adults in 
Ontario, Canada. Baseline risk of strep 17% in this population. Reprinted 
with permission from McIsaac et al. 37 
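
The likelihood ratios in Figure 47-2 can be recovered from the counts printed beside them. For each score stratum, the likelihood ratio is the proportion of all strep-positive patients who fall in that stratum divided by the proportion of all strep-negative patients who fall in it. The sketch below assumes the counts exactly as printed in the figure; the names are illustrative only.

```python
# (patients with strep, total patients) per score stratum, as printed in Figure 47-2.
COUNTS = {
    "-1 or 0": (2, 179),
    "1": (13, 134),
    "2": (18, 109),
    "3": (28, 81),
    "4 or 5": (39, 77),
}


def stratum_likelihood_ratios(counts):
    """Return {stratum: LR}, where LR = P(stratum | strep) / P(stratum | no strep)."""
    total_strep = sum(strep for strep, _ in counts.values())
    total_no_strep = sum(total - strep for strep, total in counts.values())
    ratios = {}
    for stratum, (strep, total) in counts.items():
        p_given_strep = strep / total_strep
        p_given_no_strep = (total - strep) / total_no_strep
        ratios[stratum] = p_given_strep / p_given_no_strep
    return ratios


if __name__ == "__main__":
    for stratum, lr in stratum_likelihood_ratios(COUNTS).items():
        print(f"{stratum:>7}: LR = {lr:.2f}")
```

Rounded, the computed values agree with the likelihood ratios shown in the figure, including 0.05 for the lowest stratum.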


CLINICAL SCENARIOS—RESOLUTIONS 


Case 1 describes a child with a high likelihood (51%) of 
streptococcal pharyngitis according to the McIsaac 
clinical rule (Figure 47-2). In fact, the likelihood of 
strep throat is probably even higher because of his 
recent exposure. The physician might wish to treat 
without further diagnostic confirmation. Children with 
a low or intermediate probability of strep and a negative 
rapid antigen test result should still have a backup 
throat culture. 

In case 2, an adolescent has a pretest probability (estimate, 
15%) falling between that of adults and children. In 
this age group, infectious mononucleosis is also a relatively 
common cause of sore throat. Assuming a pretest 
probability of 15% and 2 points on the Centor score, he 
has a 12% probability of strep throat (Figure 47-1). The 
physician should decide whether to recommend a rapid 
antigen test to clarify the need for treatment. Newer rapid 
tests have a sensitivity (85%) and specificity (93%) close 
to that of throat culture. 40 If a patient with a 12% probability 
of strep throat has a negative rapid test result, the 
likelihood of strep decreases to only 2%, whereas if the 
results are positive, it increases to 62%. 
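
The figures in case 2 can be checked by deriving the rapid test's likelihood ratios from its sensitivity and specificity and applying them to the 12% pretest probability. This is a minimal sketch under the stated sensitivity (85%) and specificity (93%); the helper names are assumptions made for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with a positive/negative result."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def update(pretest, lr):
    """Apply a likelihood ratio to a pretest probability using odds."""
    odds = pretest / (1.0 - pretest) * lr
    return odds / (1.0 + odds)


if __name__ == "__main__":
    lr_pos, lr_neg = likelihood_ratios(sensitivity=0.85, specificity=0.93)
    pretest = 0.12  # probability after applying the Centor score in case 2
    print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
    print(f"positive rapid test: {update(pretest, lr_pos):.0%}")
    print(f"negative rapid test: {update(pretest, lr_neg):.0%}")
```

The positive and negative results correspond to the 62% and 2% cited in the scenario.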



Figure 47-3 Walsh Algorithm for Evaluating 
Cases of Adults With Sore Throats 

Validation Results 

                  % Strep by Culture 
Risk Group    Original 19    Validation 23 
High              28              23 
Moderate          15              12 
Low                4               3 
Finally, the woman in case 3 has none of the cardinal 
characteristics of strep throat in the Centor score and the 
Walsh algorithm and therefore has a low (2%-3%) probability 
of strep throat. It may be appropriate to reassure this 
patient that strep throat is unlikely and to consider nonbacterial 
causes of sore throat. Some would argue that a 
throat culture is always necessary and that treatment 
should be delayed until culture results become available. 41 
However, this approach ignores the accuracy of rapid 
antigen tests, particularly when used in tandem with an 
accurate assessment of the pretest probability with a clinical 
score. A recent study of 30000 episodes of sore throat 
found that changing from a policy encouraging throat 
culture to one encouraging the use of a rapid antigen test 
decreased the percentage of patients with sore throat 
receiving a culture from 65% to 13%, without an 
increase in suppurative complications. 42 


THE BOTTOM LINE 

This study further confirms that no single element of the 
medical history or physical examination is powerful enough 
to confirm or exclude streptococcal pharyngitis. 
Instead, physicians should consider a combination of findings, 
including tonsillar exudate, tender or enlarged anterior 
cervical nodes, the absence of cough, and a history of fever 
(or measured temperature >38°C). A rational approach to 
therapy integrates these findings with the patient’s age and 
the clinical setting, the information from Figures 47-1 and 
47-2, the results of rapid antigen testing or throat culture, 
and the clinician’s own judgment. 
UPDATE: Streptococcal Pharyngitis 



Prepared by David L. Simel, MD, MHS 
Reviewed by Jane Kim, MD, and Kathy Vanenkevort, BS 


CLINICAL SCENARIO 


A 19-year-old college student has a severe sore throat and 
a mild fever (temperature 38.3°C), and he feels bad. The 
symptoms have been present for 4 days and initially 
started with a dry cough. There is a pharyngeal exudate, 
but only on the left side of the posterior pharynx. His neck 
reveals tender adenopathy. Should you assume he has 
streptococcal pharyngitis and start treatment? 

UPDATED SUMMARY ON 
STREPTOCOCCAL PHARYNGITIS 

Original Review 

Ebell MH, Smith MA, Barry HC, Ives KY, Carey M. Does this 
patient have strep throat? JAMA. 2000;284(22):2912-2918. 

UPDATED LITERATURE SEARCH 

Our literature search used the parent search for The Rational 
Clinical Examination series, combined with the search term 
“pharyngitis,” for the years 2000 to August 2005. Because the 
original publication supported the use of the Centor score, 
we reviewed studies that further explored and validated the 
use of this score in patient populations that included adults, 
rather than children alone, with pharyngitis. We identified 27 
potentially relevant articles for further review, of which 6 
warranted closer assessments. Three of those studies 1-3 validated 
the impression that the individual symptoms and signs 
for streptococcal pharyngitis are not diagnostically useful, so 
multivariate models that include combinations of findings 
must be used. However, none of these studies included prospective 
model validation. 

NEW FINDINGS 

• The Centor score alone is inadequate for making a correct 
diagnosis in patients with 2 or more symptoms. 

• The clinical examination improves markedly when the 
Centor score is used to identify patients for rapid streptococcal 
tests. 

• Treatment decisions based on the Centor score, without 
rapid testing, depend more on the prevalence of disease 
and the benefit/risk of treatment than on useful likelihood 
ratios (LRs). 

Details of the Update 

A study of Israeli adults older than 16 years and with pharyngitis 
provided a unique patient sample by including those 
with a Centor score of only 0 to 1. 2 Most other studies 
exclude these patients from their analysis, focusing only on 
those with at least 2 of 4 symptoms. However, 38% of the 
patients had a mild pharyngitis presentation, with a 0 to 1 
score; the LR for such a score is 0.16 and only 5% of patients 
with a score of 0 to 1 will have a positive culture result. 
When the investigators created a multivariable model (7 
symptoms and signs) for this patient group that included a 
broad spectrum of disease, only pharyngeal exudates 
remained significant at predicting culture positivity (positive 
LR, 1.8; 95% confidence interval [CI], 1.5-2.2; negative 
LR, 0.27; 95% CI, 0.13-0.53). 

The Centor score may not work equally well for children. 
To evaluate this, McIsaac et al 4 assembled a population 
of patients aged 3 to 69 years, and they validated their 
modified Centor score. The score was modified by age, 
with points added or subtracted as follows: aged 3 to 14 
years, add 1 point; aged 15 to 44 years, add 0 points; aged 
45 years or older, subtract 1 point. After adjusting for age, 
the investigators compared the modified Centor score to 
culture for those patients with a score greater than or equal 
to 2. The prevalence of disease was 22% for adults but 34% 
for children younger than 18 years. The modified Centor 
score did not appreciably change the LR for adults, but it 
did have an effect on children. At high scores, the LRs differ, 
with an LR of 1.6 (95% CI, 0.5-5.0) for a modified 
Centor score of 4 to 5 in adults that improves to an LR of 
4.0 (95% CI, 2.7-6.0) for children younger than 18 years. 
After evaluating a variety of treatment strategies, the 
authors reported predictive values based on the modified 
Centor scores (Table 47-5). 

A pragmatic study assessed the use of the Centor score in 
adults (>18 years) to identify patients for point-of-care testing 
and throat cultures. 5 The study is retrospective, so we 
cannot determine the number of patients evaluated for 
acute pharyngitis who did not undergo point-of-care testing. 
However, the prevalence of streptococcal pharyngitis is similar 
to that in other published studies. As in other studies, the Centor 
score alone had an LR of 1.5 (95% CI, 1.2-1.9) for patients 
with at least 2 findings, and for those with 0 to 1 finding the 
LR was 0.35 (95% CI, 0.16-0.75). Point-of-care testing proved 
useful in combination with the Centor score for both ruling in 
and ruling out disease. A positive 
rapid streptococcal test result in patients with at least 2 
Centor score findings had an LR of 179; the Centor score 
did not affect the LR when the point-of-care testing result 
was negative, because the LR was 0.09 across the full range of 0 
to 4 findings. 

Table 47-5 Modified Centor Scores 

             Centor Score,          Modified Centor Score, 
             Adults (>18 y) a       Children (3-17 y) 
Score        LR (95% CI)            LR (95% CI) 
4            1.2 (0.62-2.2)         4.0 (2.7-6.0) 
2-3          1.3 (0.85-1.9)         0.69 (0.59-0.83) b 
0-1          0.26 (0.14-0.48)       Uncertain 

Abbreviations: CI, confidence interval; LR, likelihood ratio. 
a Summary LR from McIsaac et al, 1 Atlas et al, 5 and Chazan et al. 2 
b Summary LR from McIsaac et al. 4 

Table 47-6 Positive Predictive Value of Treatment 
According to the Modified Centor Score 

Adults (>18 y) a 
  Modified Centor score 4: Treat with antibiotics 
  Modified Centor score 2-3: Rapid test; treat if positive result 
  Positive predictive value of decision to treat, % (95% CI): 84 (73-90) 
  Negative predictive value of decision to not treat, % (95% CI): 94 (90-96) 

Children (3-17 y) 
  Modified Centor score 2-5 (all children with sore throat): All get rapid test; 
  treat for a positive rapid test result and culture those with a negative 
  rapid test result 
  Positive predictive value of decision to treat, % (95% CI): 98 (94-99) 
  Negative predictive value of decision to not treat, % (95% CI): 100 (98-100) 

Abbreviation: CI, confidence interval. 
a McIsaac et al. 4 


IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

These newer studies confirm the utility of the Centor score for 
certain patients while highlighting the weaknesses. The disease 
prevalence is higher in children (<18 years), and an age-adjusted 
score improves the performance of the Centor score. Point-of-care 
testing, in combination with the Centor score, is valuable. 

CHANGES IN THE REFERENCE STANDARD 

The throat culture continues to be the recognized reference 
standard for the diagnosis of group A β-hemolytic streptococci. 
However, a positive throat culture or rapid antigen test 
result provides adequate confirmation of the presence of 
group A β-hemolytic streptococci in the pharynx and is 
accepted as a pragmatic reference standard. 

RESULTS OF LITERATURE REVIEW 

The modified Centor score can direct the antibiotic treatment 
strategy (Table 47-6). 

EVIDENCE FROM GUIDELINES 

The diagnosis of acute group A streptococcal pharyngitis 
should be suspected on clinical and epidemiologic grounds 
and then supported by performance of a laboratory test. 6 
However, streptococcal pharyngitis is not the etiology of 
most cases of pharyngitis, so antibiotics are usually unnecessary. 
This is especially important, given the increasing concerns 
about antibiotic resistance. For adults, empiric treatment 
is not recommended for patients with a Centor score less 
than or equal to 3. For individuals with 2 or more symptoms, 
rapid antigen testing should guide treatment. 


CLINICAL SCENARIO—RESOLUTION 


This young college student has what initially seemed like a 
viral illness, heralded by a dry cough. However, his symptoms 
seem to have progressed relatively quickly, and he 
now has a Centor score of 3 (fever, exudates, and tender 
cervical adenopathy). Although this seems like streptococcal 
pharyngitis, he could also have mononucleosis or a 
variety of other infectious etiologies. The Centor score 
confers an LR not much different from 1, but the results 
of a rapid streptococcal test would help establish the diagnosis. 
STREPTOCOCCAL PHARYNGITIS—MAKE THE DIAGNOSIS 


No matter what the patient’s age, most cases of pharyngitis 
will not be attributable to streptococcus. During the general 
physical examination, clinicians should consider performing 
a throat culture or rapid antigen test, but only in tandem 
with the Centor score. None of the univariate signs or symptoms 
associated with pharyngitis has high enough sensitivity 
and specificity for diagnosis according to clinical grounds 
alone. The greatest utility for the Centor score is in identifying 
patients for whom a throat culture or rapid streptococcal 
test should be performed because the score itself is not sufficient 
for confirming a diagnosis of streptococcal pharyngitis. 

PRIOR PROBABILITY 

The prevalence of streptococcal pharyngitis is higher in children 
than among infants and adults: group A β-hemolytic 
streptococcal bacteria can be isolated by throat culture in 
24% to 36% of children and in 5% to 24% of adults with 
sore throat. Streptococcal pharyngitis is also more common 
in autumn and winter; thus, it may be appropriate to adjust 
the pretest probability upward during those seasons. 

POPULATION FOR WHOM STREPTOCOCCAL 
PHARYNGITIS SHOULD BE CONSIDERED 

• Children and adults with sore throat. 

DETECTING THE LIKELIHOOD OF 
STREPTOCOCCAL PHARYNGITIS 

The Centor score and modified Centor score perform differently 
for younger vs older patients (Table 47-7). The Centor 
score improves greatly when combined with rapid strep test 
results (Table 47-8). 


Table 47-7 Likelihood Ratios for Centor Scores as a Function of Age 

Score                            LR (95% CI) 
Adults >18 y 
  Centor score 2-4               ≈1 
  Centor score 0-1               0.26 (0.14-0.48) 
Children 3-17 y 
  Modified Centor score, 4-5     4.0 (2.7-6.0) 
  Modified Centor score, 2-3     0.69 (0.59-0.83) 
  Modified Centor score, 0-1     Uncertain 

Abbreviations: CI, confidence interval; LR, likelihood ratio. 


Table 47-8 Centor Score Combined With Rapid Strep Point-of-Care 
Test Results, Adults 

Centor Score    Point-of-Care Test Result    LR (95% CI) 
2-4 Findings    Positive                     179 (110-2861) 
0-1             Positive                     26 (1.4-465) 
0-4             Negative                     0.09 (0.03-0.24) 

Abbreviations: CI, confidence interval; LR, likelihood ratio. 


REFERENCE STANDARD TESTS 

Streptococcal throat culture, rapid streptococcal antigen 
tests. 
EVIDENCE TO SUPPORT THE UPDATE: 

Streptococcal Pharyngitis 



TITLE The Role of Point of Care Testing for Patients 
With Acute Pharyngitis. 

AUTHORS Atlas SJ, McDermott SM, Mannone C, 
Barry MJ. 

CITATION J Gen Intern Med. 2005;20(8):759-761. 

QUESTION Does point-of-care (POC) testing with a 
rapid test for group A β-hemolytic streptococcus improve 
the performance of the Centor score? 

DESIGN Prospective, nonconsecutive study in which 
every patient who had POC testing also had a throat culture. 

SETTING Two primary care practices with data collected 
during a 12-month period. 

PATIENTS Adults (>18 years) with symptoms of acute 
pharyngitis. Patients were excluded if their symptom 
duration was greater than 7 days, they had taken antibiotics 
within the past 24 hours, they were immunocompromised, 
or they had an acute pulmonary disease flare-up. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The providers collected information for the Centor score and 
collected the samples for POC tests and throat cultures. 


Thirty-eight patients (26%) had group A β-hemolytic 
streptococcus by culture. The LRs for the Centor scores 
improved greatly when combined with the rapid strep test 
(Tables 47-9 and 47-10). 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS Pragmatic study that took advantage of clinical 
decisions to perform POC testing, informed by the Centor 
score, vs the culture reference standard. 

LIMITATIONS Nonconsecutive patients, so we do not know 
how many patients were evaluated for acute pharyngitis 
without POC testing. 


Table 47-9 Likelihood Ratios for Centor Scores 

Centor Score    n     LR+ (95% CI) 
4               18    1.4 
3               30    1.4 
2               45    1.6 
0-1 a           55    0.35 

Collapsed data 
2-4 Findings          1.5 (1.2-1.9) 
0-1                   0.35 (0.16-0.75) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio. 
a Only 1 patient had 0 findings. 


MAIN OUTCOME MEASURES 

Sensitivity, specificity, and likelihood ratio (LR) of the Centor 
score and POC tests. 

Centor score = history of fever (temperature >38°C), tonsillar 
exudates, swollen anterior cervical lymph nodes, 
absence of cough (patient report). 

A positive response to each finding is given 1 point, so a 
maximum Centor score is 4. 

MAIN RESULTS 

The authors completed data forms on 179 patients. They 
excluded 29 according to their criteria and had a final sample 
size of 148 after eliminating 2 patients with incomplete data. 


Table 47-10 Likelihood Ratios for Rapid Strep Tests as a Function of 
Centor Scores 

Centor Score    POC         N     LR+ (95% CI) 
2-4             Positive    31    179 
0-1             Positive    4     26 
2-4             Negative    62    0.07 
0-1             Negative    51    0.14 

Collapsed data 
2-4 Findings    Positive          179 (110-2861) 
0-1             Positive          26 (1.4-465) 
0-4             Negative          0.09 (0.03-0.24) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; POC, point-of-care 
rapid strep test. 
Because not all patients with acute pharyngitis were 
enrolled, it is possible that the providers selected patients 
for whom the diagnosis was not clear, making the Centor 
score perform with lower efficiency. However, the prevalence 
of streptococcal pharyngitis in this group is comparable 
to that in other studies. The Centor score did not 
perform well for identifying affected patients, but the presence 
of no more than 1 finding reduces the prior probability 
of 25% to 10%. The Centor score alone was not 
adequate for making a diagnosis. 

The power of POC testing is highlighted by the results. A 
Centor score modulates the LR in that individuals with a 
score of 2 to 4 and a positive POC test result have an even 
higher LR than those with a score of 0 to 1. At the lower end 
of prior probability of group A β-hemolytic streptococcus by 
culture in adults with acute pharyngitis (10%), the probability 
of disease with a Centor score of 0 to 1 and a positive POC 
test result increased to 74%. A negative POC test result, at 
any level of Centor score, effectively ruled out group A 
β-hemolytic streptococcus by culture, with a posttest probability 
of less than 2%. 

Reviewed by David L. Simel, MD, MHS 


TITLE Clinical Predictors of Streptococcal Pharyngitis 
in Adults. 

AUTHORS Chazan B, Shaabi M, Bishara E, Colodner R, 
Raz R. 

CITATION Isr Med Assoc J. 2003;5(6):413-415. 

QUESTION What are the clinical features that predict 
sore throat caused by group A β-hemolytic streptococcus? 

DESIGN Consecutive patients with sore throat. 

SETTING Israeli primary care clinics during 4 consecutive 
winter months. 

PATIENTS Adults (>16 years) with sore throat. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The clinical symptoms were recorded and a throat swab for 
culture was obtained. Rapid testing was not done. 


Table 47-11 Likelihood Ratios for Centor Scores 

Centor Score    Total (%)    LR+ (95% CI) 
4               14 (5)       0.51 (0.12-2.2) 
2-3             112 (55)     2.0 (1.6-2.4) 
0-1             78 (40)      0.16 (0.06-0.43) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio. 


MAIN RESULTS 

A total of 207 patients were enrolled, with only 3 dropped 
because of missing data; 24% of patients had a positive throat 
culture result. 

A multivariate analysis of 7 symptoms and 10 signs showed 
that only pharyngeal exudate was significantly predictive of a 
positive culture result (positive likelihood ratio [LR], 1.8; 
95% confidence interval [CI], 1.5-2.2; negative LR, 0.27; 95% 
CI, 0.13-0.53). A low Centor score made strep throat much 
less likely (Table 47-11). 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 2. 

STRENGTHS Consecutive adults, including those who had 
a Centor score of 0 to 1. These patients are missing in most 
other validation studies. 

LIMITATIONS The study had few patients who had all 4 
findings of the Centor; thus, the results are unstable for this 
group. 

This prospective study allows us to assess the usefulness of 
a Centor score of 0 to 1. Only 5% of patients with 1 or no 
symptoms had a positive culture result. Given that the probability 
of a positive culture result in this group was at the 
upper range of probabilities observed in adults, a Centor 
score of 0 to 1 decreases the probability of strep throat to less 
than 5% for most adults. The data support the recommendation 
for obtaining a rapid test for those with a score of 2 to 3, 
rather than treating empirically, because the LR was only 2. 
There were so few patients with a Centor score of 4 that the 
results are not useful for making conclusions about this 
group of patients. 

The multivariate analysis selected exudates as the only useful 
sign or symptom. This finding suggests that the Centor 
score might be dominated by the 
results of this single finding when a population of all patients 
with sore throats is included, rather than just those with 
more than 1 Centor finding. 

Reviewed by David L. Simel, MD, MHS 


MAIN OUTCOME MEASURES 

A multivariate model was used to analyze the independent 
symptoms and signs. The outcome of Centor score vs the 
culture was reported. 
TITLE Empirical Validation of Guidelines for the Management 
of Pharyngitis in Children and Adults. 

AUTHORS McIsaac WJ, Kellner JD, Aufricht P, Vanjaka 
A, Low DE. 

CITATION JAMA. 2004;291(13):1587-1595. 

QUESTIONS What is the likelihood of a positive group A 
β-hemolytic streptococcus culture according to a modified 
Centor score adjusted for patient age? Among a modified 
Centor score (adjusted for age) (Box 47-1), rapid tests, and 
the throat culture (the reference standard), which single or 
combined approach results in the most correct treatment 
decisions with the fewest rapid tests and cultures? 

DESIGN The data were collected prospectively during a 
3-year study period, and then the strategies were analyzed 
retrospectively. 

SETTING Family practice clinic in Canada. 

PATIENTS Patients with a chief complaint of sore 
throat, who ranged in age from 3 to 69 years. Patients 
were enrolled if they had a modified Centor score of 2 or 
greater and the physician or study nurse believed that a 
throat swab was necessary. 


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Various treatment strategies that included the modified Centor 
score alone, rapid strep tests, or culture were assessed retrospectively 
to determine whether the strategy led to unnecessary tests 
or antibiotics. See Box 47-1. 


MAIN OUTCOME MEASURES 

The frequency of culture positivity as a function of the modified 
Centor score allowed us to calculate likelihood ratios 
(LRs) for the score in predicting culture positivity for group 
A β-hemolytic streptococcus. We transformed the sensitivity 
and specificity of each strategy to LRs and predictive values. 

Box 47-1 Modified Centor Score (Range 0 to 5) 

History of fever (temperature >38°C), tonsillar exudates, 
swollen anterior cervical lymph nodes, absence of cough 
(patient report) 

(A positive response to each finding is given 1 point, and 
then modified by age) 

Age        Modification for Age 
3-14 y     +1 
15-44 y    0 
≥45 y      -1 
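
The scoring in Box 47-1 amounts to counting findings and then adjusting for age. A minimal Python sketch follows; it assumes the age bands described in the update text (3 to 14 years, 15 to 44 years, and 45 years or older), and the function and parameter names are hypothetical.

```python
def modified_centor_score(age_years, fever, tonsillar_exudate,
                          swollen_anterior_nodes, cough_absent):
    """Modified Centor score per Box 47-1: 1 point per finding, then adjusted for age.

    Each finding is passed as True or False. The box title gives a range of 0 to 5;
    a patient aged 45 years or older with no findings computes to -1, which
    Figure 47-2 groups together with 0.
    """
    if age_years < 3:
        raise ValueError("the score was studied only in patients aged 3 years or older")
    score = sum([bool(fever), bool(tonsillar_exudate),
                 bool(swollen_anterior_nodes), bool(cough_absent)])
    if age_years < 15:
        score += 1   # aged 3 to 14 years: add 1 point
    elif age_years >= 45:
        score -= 1   # aged 45 years or older: subtract 1 point
    return score     # aged 15 to 44 years: no adjustment


if __name__ == "__main__":
    # The 19-year-old student from the update scenario: fever, exudate,
    # tender nodes, and a cough at onset (so "absence of cough" is False).
    print(modified_centor_score(19, True, True, True, False))
```

For the student in the update scenario, the sketch returns 3, matching the score given in the scenario resolution.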


Because a culture is the reference standard test, we focused 
only on strategies that used initial combinations of the modified 
Centor Score or rapid tests, rather than strategies that 
went straight to culture. 

MAIN RESULTS 

A total of 918 patients were screened, with complete data 
available for 787 patients. Among the 333 adults, the prevalence 
of disease was 22%. The children had a prevalence 
of 34%. The modified Centor score performed differently, 
depending on the patient’s age (Table 47-12). Treatment 
could be guided by the score (Table 47-13). 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 3. 

STRENGTHS This is a large study, conducted during a 3- 
year study period. A large distribution of patient ages helps 
us evaluate the generalizability of results. 


Table 47-12 Likelihood Ratios of Modified Centor Scores as a 
Function of Age 

Modified Centor Score    LR+ (95% CI) 
Adults (>18 y) 
  4-5                    1.6 (0.5-5.0) 
  3                      1.3 (1.1-1.6) 
  2                      0.53 (0.34-0.82) 
Children (3-17 y) 
  4-5                    4.0 (2.7-6.0) 
  3                      0.73 (0.61-0.88) 
  2                      0.50 (0.31-0.80) 

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio. 


Table 47-13 Management Strategy for Patients With a Modified Centor 
Score Greater Than or Equal to 2 

Adults (>18 y) 
  Modified Centor score 4: Treat with antibiotics 
  Modified Centor score 2-3: Rapid test, treat if positive result 
  Positive predictive value of decision to treat, % (95% CI): 84 (73-90) 
  Negative predictive value of decision to not treat, % (95% CI): 94 (90-96) 

Children (3-17 y) 
  Modified Centor score 2-5 (all children with sore throat): All get rapid test; 
  treat for a positive rapid test result and culture for those with a negative 
  rapid test result 
  Positive predictive value of decision to treat, % (95% CI): 98 (94-99) 
  Negative predictive value of decision to not treat, % (95% CI): 100 (98-100) 

Abbreviation: CI, confidence interval. 
LIMITATIONS The study entrance criteria required that the 
physician or nurse determine that a throat swab was warranted. 
Although data were not given on the number of eligible 
patients who were not enrolled, clinicians should 
understand that these patients had a chief complaint of sore 
throat. The inference is that patients with sore throat who 
were more concerned about other symptoms (eg, fever or 
nasal congestion) were not enrolled. Furthermore, adults 
who had a sore throat but only 1 symptom would not have 
been included, because they had a modified Centor score of 0 
to 1. The treatment strategies were not studied prospectively 
but instead were evaluated after the data were collected. 

The modified Centor score (adjusted for age) did not work 
much better than the original Centor score for adults. For 
children, the LR of a modified Centor score of 4 to 5 
increases the likelihood of group A streptococcus 4-fold, but 
current treatment recommendations require a rapid test for 
all children and cultures for those with negative results. 1,2 
This strategy leads to almost 100% accuracy for treatment 
decisions in children with sore throats. 


For adults, the data apply only to patients with a modified 
Centor score of at least 2. Those patients with all 4 symptoms 
can be treated empirically with antibiotics, and those with a 
score of 2 to 3 can have treatment guided by a rapid test. 
With this strategy, about 16% of treated patients will not 
have group A β-hemolytic streptococcus, whereas about 6% of 
untreated patients will have the infection. The only way to 
eliminate this 6% of infected patients who go untreated would be to 
use a strategy that required culture whenever the rapid strep 
test result is negative. 

Reviewed by David L. Simel, MD, MHS 

REFERENCES FOR THE EVIDENCE 

1. Bisno AL, Gerber MA, Gwaltney JM Jr, Kaplan EL, Schwartz RH. Practice
guidelines for the diagnosis and management of group A streptococcal
pharyngitis. Clin Infect Dis. 2002;35(2):113-125.

2. Cooper RJ, Hoffman JR, Bartlett JG, et al. Principles of appropriate antibiotic
use for acute pharyngitis in adults: background. Ann Intern Med.
2001;134(6):509-517.
CHAPTER


Is This Patient Having a 

Stroke? 


Larry B. Goldstein, MD 
David L. Simel, MD, MHS 


CLINICAL SCENARIO 


The wife of a 58-year-old right-handed man calls emer¬ 
gency medical services because her husband abruptly 
developed difficulty speaking and moving his right arm. 
Figure 48-1 presents the diagnostic flow of a patient who 
experiences neurologic symptoms that suggest a stroke. 


WHY IS THE CLINICAL EXAMINATION 
OF PATIENTS WITH SUSPECTED 
STROKE IMPORTANT? 


Since the original review of stroke published as part of The 
Rational Clinical Examination series more than a decade ago, 
much has changed. 1 What has not changed is the staggering cost 
of the personal, societal, and economic consequences of strokes. 
The estimated direct and indirect cost of stroke in 2005 is $56.8 
billion in the United States alone. 2 More than 700000 people in 
the United States have a stroke each year, of which nearly one- 
third represent recurrent events. 1 About 163000 annual stroke 
deaths make it the third leading cause of death in the United 
States. Between 15% and 30% of stroke survivors become per¬ 
manently disabled, whereas 20% remain in institutional care 3 
months after their stroke. Not too long ago, the clinical exami¬ 
nation functioned primarily to catalog a patient’s neurologic 
impairments that in turn correlated with the stroke’s vascular 
territory and likely cause. The inferences about the anatomy and 
etiology guided secondary preventive strategies and established 
the prognosis, rather than directing immediate treatment. 

Despite the advent of modern noninvasive neuroimaging 
technologies, the clinical examination for stroke is more 
important than ever because therapeutic interventions for 
patients with acute stroke and sophisticated approaches to pre¬ 
vent recurrent strokes now exist. Appropriate treatment and 
prevention depend on accurate interpretation of the patient’s 
symptoms and clinical examination findings. For example, the 
risk/benefit balance for carotid endarterectomy requires an 
accurate assessment of symptoms to identify those with a tran¬ 
sient ischemic attack (TIA) or nondisabling stroke. 4 

The rapid screening of patients with neurologic symptoms
begins with prehospital care personnel 5 because the effectiveness
of reperfusion strategies for acute ischemic stroke is
time dependent. The brain can withstand profound ischemia
for only limited periods, and the benefit of intravenous tissue
plasminogen activator (tPA) lessens as the time from the
onset of the patient’s symptoms increases. 6 Public education
programs have stressed the need to call emergency medical 
responders (eg, 911) for persons experiencing stroke symp¬ 
toms. Patients, family members, and prehospital care person¬ 
nel such as emergency medical technicians must recognize 
the symptoms and signs of strokes to minimize treatment 
delays. Arrival to the hospital by emergency medical trans¬ 
port has been associated with more rapid treatment and 
CHAPTER 48


The Rational Clinical Examination 


Onset of stroke symptoms 


Activate emergency response system 


Perform prehospital assessment 
(Emergency medical technician/paramedic) 


Prior probability of stroke 
CPSSᵃ

Any item present, LR=5.5 
0 items present, LR=0.39 
LAPSSᵇ
Positive LR=31 
Negative LR=0.09 



Perform rapid evaluation on arrival in emergency department 
Assess factors associated with increased likelihood of stroke 

1. Focal neurologic deficit 

2. Persistent neurologic deficit 

3. Acute onset during previous week 

4. No history of head trauma 



Assess stroke severity with NIH Stroke Scale 
Perform neuroimaging 

Perform laboratory tests to exclude stroke mimics 


Begin stroke treatment 

Establish prognosis according to clinical findings 


Figure 48-1 Diagnostic Flow of a Patient Who 
Experiences Neurologic Symptoms That 
Suggest a Stroke 

ᵃCincinnati Prehospital Stroke Scale (CPSS): facial
droop, arm drift, and abnormal speech.
ᵇLos Angeles Prehospital Stroke Scale (LAPSS): medical history (age
>45 y, no history of seizures, symptoms <24 h, not
wheelchair bound), blood glucose 60-400 mg/dL
(3.3-22 mmol/L), and examination showing unilateral
facial weakness, grip weakness, and arm weakness.
Abbreviations: LR, likelihood ratio; NIH, National Institutes
of Health.


thereby presumably improved outcomes. 7-10 Thus, the accuracy
of the clinical examination becomes relevant not just for
stroke specialists and emergency physicians but also for paramedics,
nursing personnel, and emergency medical technicians
who may be the first responders. When patients with
stroke symptoms arrive at the hospital, a standardized neurologic
examination, combined with neuroimaging results,
determines subgroups of patients who might benefit from
intravenous thrombolysis vs those who may be at increased
risk from thrombolytic-related bleeding. 11-13

Experienced examiners tailor the neurologic examination 
to address specific clinical questions because a stroke pro¬ 
duces different symptoms and signs, depending on the area 
of affected brain. A variety of other conditions complicate 
diagnostic efforts by causing symptoms and signs similar to 
stroke (stroke mimics). In the patient example, emergency
medical services were called for a patient with new focal neurologic
symptoms. We will observe the example patient
through the emergency evaluation and highlight the clinical 
questions and features of the examination that increase the 
likelihood of accurately and reliably identifying a stroke, the 
stroke subtype, and the patient’s prognosis. 


METHODS 

This review updates a 1994 report on clinical assessment of 
stroke 1 and is based on relevant studies identified through 
MEDLINE, restricted to the time since the last review. Infor¬ 
mation on the physical examination and neurologic exami¬ 
nation is difficult to identify because the Medical Subject 
Headings for the articles typically do not include obvious
terms. For example, searching the terms “cerebrovascular dis¬ 
orders” limited to human research studies, English-language 
articles (1994-2005) yields 9029 articles. However, when the
results of this global search are crossed with the term “neuro¬ 
logical examination,” there are 176 articles, and when crossed 
with “physical examination,” only 19 articles remain. Elimi¬ 
nating review articles and case reports from this reduced set 
left only 4 potentially relevant articles. Because of the low 
yield, we relied heavily on searches of the bibliographies of 
textbook chapters, review articles, and personal files to iden¬ 
tify additional relevant literature for updating the role of the 
clinical examination since the original Rational Clinical 
Examination article on stroke in 1994. 

To examine the accuracy and reliability of the clinical assessment
of stroke for either diagnosis or prognosis, the following
general inclusion criteria were used in assessing articles: (1) the
article addressed the issue of accuracy or reliability of medical 
history or physical examination for diagnosis or estimation of 
short-term prognosis (mortality or functional disability); 
(2) the study site or participants (clinicians or patients) were 
described; (3) the data were not limited to case reports or 
reviews of other studies; and (4) the primary data or appropri¬ 
ate summary statistics were presented. 

For assessment of the accuracy of diagnosis, references 
included articles that also described a final diagnosis estab¬ 
lished by an expert who reviewed all clinical data, neuroim¬ 
aging, and other relevant laboratory tests. These articles were 
evaluated for quality according to whether the clinical exam¬ 
ination was performed masked to the neuroimaging results 
(see Table 1-7). 14 Articles describing prognosis in terms of 
functional status were included if the outcome was measured 
with a scale that is either comparable to a scale in common 
use or was validated in the context of the study. 

The sensitivity (how often a diagnostic procedure detects a
condition when it is present), specificity (how often a diagnostic
procedure result is negative when the condition is
absent), and likelihood ratios (LRs) (the odds favoring the
diagnosis or outcome vs not having the diagnosis) for each
finding or scale were recorded from each article or were calculated
according to primary data as necessary. 15,16 Table 48-1
summarizes the included studies that gave sensitivity and
specificity data for the diagnosis of stroke or TIA. For studies
of precision, the κ statistic (describes the agreement between
paired observers beyond that predicted by chance) or the
intraclass correlation coefficient (when there are more than 2
examiners) is given. Intraclass coefficients range from 0 to 1,


Table 48-1 Summary of Included Studies With Sensitivity/Specificity Data

Source, y | Level of Evidenceᵃ | Country | Setting | No. of Participants | Inclusion Criteria
Kothari et al, 17 1997 | 2 | United States | ED | 299 | Clinical trial and ED patients
Kothari et al, 18 1999 | 3 | United States | ED and neurology service | 171 | Suspected stroke or stroke mimic
Kidwell et al, 19 2000 | 1 | United States | Field and ED | 441 | Suspected stroke
Karanjia et al, 20 1997 | 2 | United States | Neurology clinics | 381 | Stroke, TIA, or other neurologic condition
von Arbin et al, 21 1980 | 3 | Sweden | Hospital | 2252 | Medical admissions
von Arbin et al, 22 1981 | 3 | Sweden | Stroke unit | 206 | Stroke unit admission
Panzer et al, 23 1985 | 2 | United States | Hospital | 369 | Suspected stroke
Oxbury et al, 24 1975 | 3 | United Kingdom | Hospital | 93 | Stroke
Tuthill et al, 25 1969 | 3 | United States | Stroke unit/community hospital | 202 | Suspected stroke
Frithz and Werner, 26 1976 | 3 | Sweden | Hospital | 344 | Stroke, <70 y
Allen, 27 1984 | 3 | United Kingdom | Hospital | 148 | Stroke, <76 y
Henley et al, 28 1988 | 2 | United Kingdom | Hospital | 172 | Stroke
Fullerton et al, 29 1988 | 3 | Ireland | Hospital | 206 | Acute stroke
Britton et al, 30 1980 | 2 | Sweden | Stroke unit | 200 | Suspected stroke

Abbreviations: ED, emergency department; TIA, transient ischemic attack.
ᵃSee Table 1-7 for a description of Evidence Grades and Levels.
Table 48-2 The National Institutes of Health Stroke Scaleᵃ

Item | Responseᵇ
1a. Level of consciousness | 0 = Alert; 1 = Not alert; 2 = Obtunded; 3 = Unresponsive
1b. Level of consciousness questions | 0 = Answers both correctly; 1 = Answers 1 correctly; 2 = Answers neither correctly
1c. Level of consciousness commands | 0 = Performs both tasks correctly; 1 = Performs 1 task correctly; 2 = Performs neither task
2. Gaze | 0 = Normal; 1 = Partial gaze palsy; 2 = Total gaze palsy
3. Visual fields | 0 = No visual loss; 1 = Partial hemianopsia; 2 = Complete hemianopsia; 3 = Bilateral hemianopsia
4. Facial palsy | 0 = Normal; 1 = Minor paralysis; 2 = Partial paralysis; 3 = Complete paralysis
5. Motor arm (a. Left; b. Right) | 0 = No drift; 1 = Drift before 5 s; 2 = Falls before 10 s; 3 = No effort against gravity; 4 = No movement
6. Motor leg (a. Left; b. Right) | 0 = No drift; 1 = Drift before 5 s; 2 = Falls before 5 s; 3 = No effort against gravity; 4 = No movement
7. Ataxia | 0 = Absent; 1 = One limb; 2 = Two limbs
8. Sensory | 0 = Normal; 1 = Mild loss; 2 = Severe loss
9. Language | 0 = Normal; 1 = Mild aphasia; 2 = Severe aphasia; 3 = Mute or global aphasia
10. Dysarthria | 0 = Normal; 1 = Mild; 2 = Severe
11. Extinction/inattention | 0 = Normal; 1 = Mild; 2 = Severe

ᵃThe actual form for recording the data contains detailed instructions for the use of the
scale. Available at http://www.ninds.nih.gov/doctors/NIH_Stroke_Scale.pdf (accessed
June 13, 2008). An online course for provider education is available at
http://www.ninds.nih.gov/doctors/stroke_scale_training.htm (accessed June 13, 2008).
ᵇScore = sum of scores from each item.


with 0 indicating random agreement and 1 indicating perfect 
agreement. Random-effects estimates were used for the LR 
summary measures. 
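
The accuracy measures defined in the Methods can be written compactly. The notation below is a standard textbook formulation and is not taken from the chapter itself; the symbols a, b, c, and d are introduced here only for illustration and denote, respectively, patients with the disease and a positive finding, patients without the disease and a positive finding, patients with the disease and a negative finding, and patients without the disease and a negative finding.

```latex
\begin{align*}
\text{sensitivity} &= \frac{a}{a + c}, &
\text{specificity} &= \frac{d}{b + d}, \\[4pt]
\mathrm{LR}+ &= \frac{\text{sensitivity}}{1 - \text{specificity}}, &
\mathrm{LR}- &= \frac{1 - \text{sensitivity}}{\text{specificity}}.
\end{align*}
```

Under this formulation, an LR is the factor by which a finding multiplies the odds of disease: values well above 1 favor the diagnosis, and values close to 0 argue against it.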

RESULTS 

Prehospital Assessment 

Accuracy 

In a prospective observational cohort study of examinations
performed by physicians, 3 physical examination findings (facial
paresis, arm drift, and abnormal speech) were selected from the
National Institutes of Health Stroke Scale (NIHSS) as the most
useful. The presence of any of these 3 items, which were selected
by statistical recursive partitioning techniques, identified
patients with stroke with 100% sensitivity (lower 95% confidence
limit, 95%) and 88% specificity (95% confidence interval [CI],
82%-91%) (positive LR [LR+], 7.9; 95% CI, 5.6-11; negative LR
[LR-], 0; 95% CI, 0-0.12), although the sensitivity decreased to
66%, with a similar specificity, when this instrument was
validated in the hospital setting. 17 Several schemes facilitate
the rapid, accurate identification of stroke patients by
emergency medical personnel.

The Cincinnati Prehospital Stroke Scale (CPSS) uses the 3
most important items (facial paresis, arm drift, and abnormal
speech) derived from the NIHSS (Table 48-2). 17 In a prospective
study, one of 2 emergency physicians certified in the use
of the full NIHSS evaluated 171 patients (selected by a neurologist
from either the emergency department or inpatient neurology
service) with chief symptoms that suggested a stroke. 18
The examining physicians were aware of the patient’s chief
report but not the presenting clinical signs or final diagnosis.
Each patient also had separate examinations by 4 of 24 emergency
medical personnel, masked to all the clinical data.
According to data provided in the article, we calculated the
LRs for increasing numbers of findings (0-3) for the physicians
(Table 48-3). The same calculations can be done for the
emergency medicine personnel, although the precision implied by
the CIs is overstated (the intervals are too narrow) because the
findings are presented for the total number of examinations
rather than unique patients. Nonetheless, the diagnostic
accuracy for the emergency department physician compared with
the emergency medical personnel was identical, with an area
under each receiver operating characteristic curve of 0.88. The
presence of any single finding of the 3 created a sharp increase
in the likelihood of stroke. After collapsing the data at a
threshold of at least 1 finding vs 0 findings, the physician had
an LR for at least 1 finding of 5.5 (95% CI, 3.3-9.1) and an LR
for 0 findings of 0.39 (95% CI, 0.25-0.61); the emergency medical
personnel had an LR for at least 1 finding of 5.4 (95% CI,
4.1-7.0) and an LR for 0 findings of 0.46 (95% CI, 0.38-0.56).
Although this study did not evaluate the emergency medical
personnel’s diagnostic accuracy according to examinations
performed in the field, this method of identifying patients with
acute stroke is being widely used throughout the country and can
be performed in less than a minute.
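
The stratum-specific and collapsed LRs described above can be reproduced from the physician counts in Table 48-3. The following is a minimal sketch in Python, not the authors' own calculation; the function and variable names (`stratum_lrs`, `collapsed_lr`, `physician_counts`) are hypothetical and chosen only for illustration.

```python
# Counts transcribed from Table 48-3 (physician assessment, unique patients).
# Keys are the number of CPSS findings present; values are (stroke, nonstroke).
physician_counts = {3: (4, 1), 2: (6, 5), 1: (15, 10), 0: (13, 117)}


def stratum_lrs(counts):
    """Return {stratum: LR}, where LR = P(stratum | stroke) / P(stratum | no stroke)."""
    total_stroke = sum(s for s, _ in counts.values())
    total_nonstroke = sum(n for _, n in counts.values())
    return {
        k: (s / total_stroke) / (n / total_nonstroke)
        for k, (s, n) in counts.items()
    }


def collapsed_lr(counts, threshold=1):
    """LR for 'at least `threshold` findings' after collapsing to a 2 x 2 table."""
    total_stroke = sum(s for s, _ in counts.values())
    total_nonstroke = sum(n for _, n in counts.values())
    stroke_pos = sum(s for k, (s, _) in counts.items() if k >= threshold)
    nonstroke_pos = sum(n for k, (_, n) in counts.items() if k >= threshold)
    return (stroke_pos / total_stroke) / (nonstroke_pos / total_nonstroke)


lrs = stratum_lrs(physician_counts)
for k in sorted(lrs, reverse=True):
    print(f"{k} findings: LR = {lrs[k]:.2f}")
print(f">=1 finding: LR = {collapsed_lr(physician_counts):.2f}")
```

With these counts the point estimates agree, after rounding, with the LRs reported in Table 48-3. The sketch does not compute confidence intervals.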

The Los Angeles Prehospital Stroke Screen (LAPSS) assesses 
for a unilateral arm drift, handgrip strength, and facial paresis. 19 
The screen was evaluated prospectively on all noncomatose,
nontrauma patients with neurologic complaints compatible 
with stroke, who were transported by emergency medical tech¬ 
nicians to a single hospital. The relevant neurologic signs were 
altered consciousness, focal neurologic signs, seizure, syncope, 
head pain, or a cluster category of weakness/dizziness/sick. 
The criteria for an in-the-field stroke diagnosis by the emer¬ 
gency medical technician were met when the patients were 
older than 45 years, had no seizure history, had symptoms for 
fewer than 24 hours, were not wheelchair bound or bedridden, 
had a blood glucose level between 60 and 400 mg/dL (3.3 and 
22 mmol/L), and a unilateral deficit in one of the 3 findings 
previously listed. A reviewer, masked to the emergency medi¬ 
cal personnel’s evaluation, determined the final discharge 
diagnosis according to the emergency department chart. Compared
with the final diagnosis, the LAPSS had a sensitivity of
91% (95% CI, 76%-98%), specificity of 97% (95% CI, 93%-99%),
LR+ of 31 (95% CI, 13-75), and LR- of 0.09 (95% CI,
0.03-0.27) for patients with possible stroke (Table 48-4). An
analysis that included all ambulance runs showed even better 
specificity (and therefore a much higher LR+), with only a 
slight decrement in sensitivity, attributed to 2 stroke patients 
who were not correctly identified in the field as having a possi¬ 
ble stroke, of 1092 total ambulance runs (0.19%). Among all 
patients with neurologically relevant signs, the prevalence of 
stroke was 10%, which represents a useful anchor for prior 
probability estimates (Figure 48-1). 

Reliability 

The data assessing the CPSS compare emergency medical personnel
with physicians for examinations performed in a controlled
hospital setting rather than in the field. 18 The intraclass
correlation coefficient (Pearson r) for the total score was 0.89
(95% CI, 0.87-0.92) among the prehospital care personnel and
0.92 (95% CI, 0.89-0.93) between the physician and the prehospital
personnel. The greatest agreement was for arm drift
(Pearson r = 0.91; 95% CI, 0.89-0.93), followed by abnormal
speech (Pearson r = 0.87; 95% CI, 0.34-0.90) and facial palsy
(Pearson r = 0.78; 95% CI, 0.74-0.83).

Scenario 

With either the CPSS or the LAPSS, the patient would have 
been identified as likely to have had a stroke, triggering rapid 
transport to the nearest appropriate emergency department 
for further evaluation and treatment. Physicians should feel 
confident with the medical history and brief screening exam¬ 
ination for stroke that is obtained by appropriately trained 
emergency first responders. 

In the case of this patient scenario, the patient arrives at 
the emergency department and his wife reports that her hus¬ 
band has hypertension. He has no history of diabetes, sei¬ 
zures, or recent head trauma. He is being treated with aspirin 
and a diuretic. He continues to have difficulty moving his 
right arm, along with trouble speaking. 

Is This Patient Having a Transient 
Ischemic Attack or Stroke? 

In the LAPSS study previously discussed, only 8% of 441 
patients transported to the hospital for nontraumatic, non- 


Table 48-3 Comparison of Physician Assessment With That of
Emergency Medicine Personnelᵃ,ᵇ

No. of Findings Present | Stroke | Nonstroke Diagnosis | LR (95% CI)
Physician Assessmentᶜ
3 | 4 | 1 | 14 (1.6-121)
2 | 6 | 5 | 4.2 (1.4-13)
1 | 15 | 10 | 5.2 (2.6-11)
0 | 13 | 117 | 0.39 (0.25-0.61)
Emergency Medical Personnelᵈ
3 | 20 | 10 | 7.0 (3.3-14)
2 | 22 | 10 | 7.6 (3.7-16)
1 | 49 | 39 | 4.4 (3.0-6.4)
0 | 63 | 476 | 0.46 (0.38-0.56)

Abbreviations: CI, confidence interval; LR, likelihood ratio.
ᵃBased on data from Kothari et al. 18
ᵇCollapsing data into a 2 × 2 table yields an LR of ≥1 finding = 5.5; 95% CI, 3.3-9.1.
ᶜData represent unique patients and stratum-specific LR.
ᵈData represent 4 examinations for each patient.


Table 48-4 Performance of Emergency Medicine Technicians on
Stroke Assessment in the Fieldᵃ

Population | Stroke Frequency/All Patients | LR+ (95% CI) | LR- (95% CI)
Ambulance runs for patients with target symptoms for possible strokeᵇ | 34/206 | 31 (13-75) | 0.09 (0.03-0.27)
All patients transported by ambulance | 36/1298 | 217 (90-526) | 0.14 (0.06-0.31)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.
ᵃBased on data from Kidwell et al. 19
ᵇNeurologic signs were altered consciousness, focal neurologic signs, seizure, syncope,
head pain, and a cluster category of weakness/dizziness/sick.


comatose, neurologically relevant complaints had a final diag¬ 
nosis of acute symptomatic cerebrovascular disease. 19 A variety 
of conditions can mimic TIA or stroke. Seizures, 31,32 neoplasms, 33 
infection, 34 intracranial hemorrhage, 35 and hypoglycemia 36 and 
other metabolic abnormalities are among the conditions that 
can simulate a TIA and stroke. In another series, among 821 
consecutive patients initially diagnosed with stroke, 13% 
were finally determined to have other conditions. 37 The most 
frequent causes of misdiagnosis were unrecognized seizures, 
confusional states, syncope, toxins, neoplasms, and subdural 
hematomas. 

Transient Ischemic Attack 

Transient ischemic attack is traditionally defined as a focal neu¬ 
rologic deficit of ischemic origin of less than 24 hours’ dura¬ 
tion. 38 Because most TIAs last fewer than 4 hours, the diagnosis 
is usually based on medical history rather than findings on 
examination. 39 However, many patients previously diagnosed
with TIA actually had cerebral infarcts demonstrated on mag¬ 
netic resonance imaging (MRI). 40 Clinically silent infarcts (and 
potentially infarcts associated with a classically defined TIA) 


Table 48-5 Estimates of the Accuracy of Classification of Stroke Type
Based Solely on Clinical Dataᵃ,ᵇ

Diagnosis | References | LR+ (95% CI) | LR- (95% CI)
Stroke vs not strokeᶜ | 21 | 40 (29-55) | 0.14 (0.10-0.20)
TIA vs not TIAᵈ | 22 | 21 (10-42) | 0.09 (0.02-0.34)
Hemorrhagic vs nonhemorrhagic strokeᵉ | 22, 23, 43 | 3.1 (2.1-4.6) | 0.61 (0.48-0.76)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio; TIA, transient ischemic attack.
ᵃBased on data from Goldstein and Matchar. 1
ᵇOnly studies for which sensitivity, specificity, and LRs could be calculated are
represented in the table.
ᶜPersistent neurologic deficit of acute onset during the previous week, without a
history of head trauma, according to medical history and examination alone.
ᵈFocal neurologic deficit with a duration of less than 24 hours, according to medical
history and examination alone.
ᵉEstimates for hemorrhagic vs nonhemorrhagic stroke are summary estimates from
random-effects measures.


Table 48-6 Precision of Elements of the Neurologic Examination of
Stroke Patients

Finding | κ Score or Rangeᵃ | Referencesᵇ
Medical History
Seizure at onset | 0.39 | 45
Previous stroke | 0.31 | 45
Transient ischemic attack | 0.11 | 45
Vomiting at onset | 0.35 | 45
Headache | 0.36 | 45
Examination
Level of consciousness | 0.38-1.00 | 45-51
Orientation | 0.19-1.00 | 45-47, 51-53
Gaze preference | 0.33-1.00 | 45, 46, 48-50, 53
Visual field defect | 0.16-0.81 | 45, 46, 48, 50, 53, 54
Facial paresis | 0.13-1.00 | 45, 46, 50, 52, 53, 54
Arm strength | 0.42-1.00 | 45-54
Leg strength | 0.40-0.84 | 45-54
Limb ataxia | -0.16 to 0.69 | 46, 48, 50, 51
Sensation | 0.27-0.89 | 45, 46, 48, 50, 51, 53
Language | 0.54-0.84 | 45-48, 51, 53
Dysarthria | 0.29-1.00 | 45-48, 51
Neglect | 0.58-0.89 | 46, 48, 50, 51
Pupillary response | 0.95 | 48
Plantar response | 0.67 | 48
Gait | 0.91 | 49

ᵃThe values of the κ statistic may be interpreted similar to the interpretation of
correlation coefficients (κ = 0-0.20, slight; 0.21-0.40, fair; 0.41-0.60, moderate;
0.61-0.80, substantial; 0.81-1.00, almost perfect agreement 55 ).
ᵇAmong the cited studies, individual items were measured by different observers
with various experience.


may contribute to vascular dementia. 41 Traditionally defined 
TIA is an important marker of short- and long-term vascular 
risk. Of 1707 patients from a large health care plan in the 
United States, evaluated in the emergency department and 
diagnosed with TIA, 5.3% had a stroke within 2 days, 
whereas 10.5% had a stroke within 90 days. 42 The diagnosis 
of a stroke or TIA indicates the need for urgent management. 

Accuracy of a Transient Ischemic Attack Diagnosis 

Among patients admitted to a stroke unit for evaluation of 
an acute neurologic deficit, a clinical diagnosis of TIA 
increased the odds of a final TIA diagnosis by about 20-fold 
(Table 48-5) (LR+, 21; 95% CI, 10-42), whereas an alternate
diagnosis greatly decreased the odds of a TIA (LR-, 0.09;
95% CI, 0.02-0.34). The excellent performance of the clinical
examination in this filtered population with a high
probability of stroke probably does not extrapolate to the 
emergency setting in which patients with neurologically rel¬ 
evant complaints have a broader differential diagnosis. In 
another study, about one-third of patients initially diag¬ 
nosed with TIA were eventually given a different diagnosis, 
with TIA being definitely not established in an additional 
one-third. 44 

Reliability of a Transient Ischemic Attack Diagnosis 

Despite its clinical importance, the reliability of the diagnosis 
of TIA can be poor. Agreement among experienced physicians
for a patient’s history of TIA is barely greater than
chance (κ = 0.11; see the footnote to Table 48-6 for a guide to
interpreting κ scores). 45 Some of the imprecision is due to
differences in categorizing patients as having minor stroke or 
TIA, a distinction that has little influence on patient manage¬ 
ment. Even with a standardized protocol, disagreements fre¬ 
quently occur with regard to the features of the TIA. In one 
study, medical histories were obtained from 28 patients by 
pairs of neurologists. 56 Agreement in the number of TIAs was 
observed in about half of the cases. In two-thirds of the cases, 
there was agreement in the time of onset for the first TIA and 
the duration of the episode; there was agreement less than 
half the time in the frequency and type of symptoms. A new 
definition of TIA shortens the duration for qualifying epi¬ 
sodes: “a brief episode of neurological dysfunction caused by 
a focal disturbance of brain or retinal ischemia, with clinical 
symptoms typically lasting less than 1 hour, and without 
[radiographic] evidence of infarction.” 57 The accuracy of 
symptoms, signs, and the overall clinical impression using 
this new definition has not been studied. 

There is moderate agreement when minor stroke and TIA are considered
together as a previous ischemic episode (κ = 0.60). 45
Other studies suggest that substantial diagnostic agreement
can be achieved when a standardized protocol is used for
the 2 diagnoses (κ = 0.65 58 and 0.77 59 ). The Asymptomatic
Carotid Atherosclerosis Study compared an algorithm for the
diagnoses to both an on-site neurologist’s diagnosis and that
of an external panel of reviewers with expertise in stroke (ie,
the gold standard). 20 The key symptoms were sudden change
in speech, visual loss, diplopia, numbness or tingling, paralysis
or weakness, and nonorthostatic dizziness. Comparing
stroke or TIA vs no vascular event, there was 80% agreement
between the external panel and the algorithm (κ = 0.60; 95%
CI, 0.52-0.68).
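
The relationship between raw agreement and the κ statistic can be illustrated with a short calculation. The sketch below computes Cohen's κ for 2 raters from a square agreement table. The counts are hypothetical, chosen only so that raw agreement is 80%; they are not the data from any study cited here, and the function name `cohen_kappa` is likewise an illustrative assumption.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table (rows: rater A, columns: rater B)."""
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / n
    # Chance-expected agreement from the marginal totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical 2 x 2 table: 100 patients rated "vascular event" vs "no event".
example = [[40, 10],
           [10, 40]]
kappa = cohen_kappa(example)
print(f"kappa = {kappa:.2f}")
```

In this hypothetical table, half of the 80% raw agreement is expected by chance alone, which is why κ is considerably lower than the raw agreement figure.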

Stroke 

The operational definition of stroke requires relevant, focal 
neurologic symptoms with no other potential etiologies. 
Guideline statements from several professional societies recommend
excluding systemic or other neurologic processes
that might cause the patient’s acute deficit as part of the evaluation
of the appropriateness of administering acute thrombolytic
therapy. 60-62

Accuracy of a Stroke Diagnosis 

The results of studies on the accuracy of stroke diagnosis are
given in Table 48-5. In one study, patients were considered to
have had a stroke if all 4 of the following findings were present:
a neurologic deficit that was focal, persistent, and of acute
onset during the previous week, with no history of
head trauma. 21 This study, done before modern neuroimaging,
relied on autopsy or stroke unit evaluations to establish
the diagnosis in 39% of 2034 patients and consensus agreement
for the remaining patients. Using this rule, emergency
department physicians correctly identified 152 of 176 consecutive
patients with stroke (sensitivity, 86%; 95% CI, 81%-91%)
and 1818 of 1858 patients without stroke (specificity,
98%; 95% CI, 97%-99%). Thus, the odds of a diagnosis of stroke
increase dramatically when patients satisfy this classification
rule: the LR+ for the presence of all 4 findings is 40 (95% CI,
29-55), and the LR- is 0.14 (95% CI, 0.10-0.20). Although this
LR- is low, neuroimaging studies may still be required to help
diagnose conditions that mimic stroke. Differences in the accuracy
of the diagnosis of stroke according to either the interval
between the onset of symptoms and time of presentation or
the likelihood that the patients’ symptoms and signs could be
assigned to a specific vascular territory were not addressed.
Data concerning the accuracy of the diagnosis for patients
evaluated soon after the beginning of symptoms were lacking
in this study, but the accuracy is particularly relevant, given
the advent of reperfusion therapies such as intravenous tPA
that necessitate treatment within 3 hours of symptom
onset. 60-62
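
The sensitivity, specificity, and LRs quoted above follow directly from the reported counts (152 of 176 patients with stroke and 1818 of 1858 patients without stroke). The sketch below recomputes them in Python. The confidence interval uses the common log method for a ratio of 2 proportions; this is an assumption, because the chapter does not state which interval method was used, and the function name `lr_with_ci` is hypothetical.

```python
import math


def lr_with_ci(num_events, num_total, den_events, den_total, z=1.96):
    """Ratio of two proportions with a log-method confidence interval.

    Returns (ratio, lower, upper). The log method is assumed here; the
    chapter does not say how its intervals were derived.
    """
    ratio = (num_events / num_total) / (den_events / den_total)
    se = math.sqrt(1 / num_events - 1 / num_total + 1 / den_events - 1 / den_total)
    return ratio, ratio * math.exp(-z * se), ratio * math.exp(z * se)


# Counts reported in the text for the 4-finding classification rule.
true_pos, strokes = 152, 176
true_neg, nonstrokes = 1818, 1858
false_neg = strokes - true_pos      # 24 strokes missed by the rule
false_pos = nonstrokes - true_neg   # 40 nonstroke patients meeting the rule

sensitivity = true_pos / strokes
specificity = true_neg / nonstrokes
lr_pos = lr_with_ci(true_pos, strokes, false_pos, nonstrokes)
lr_neg = lr_with_ci(false_neg, strokes, true_neg, nonstrokes)

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
print("LR+ = {:.0f} ({:.0f}-{:.0f})".format(*lr_pos))
print("LR- = {:.2f} ({:.2f}-{:.2f})".format(*lr_neg))
```

Under the log-method assumption, the recomputed point estimates and intervals round to the values given in Table 48-5 for stroke vs not stroke.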

Reliability of a Stroke Diagnosis 

High-quality studies of the reliability of the diagnosis of 
stroke are lacking. 

Scenario 

The prior probability of a stroke among patients with neurologically
relevant symptoms is 10%. According to the
patient’s focal neurologic symptoms, the LAPSS study 19 suggests
that the LR for stroke is 31. According to the clinical
information obtained in the field and before the complete
emergency department evaluation, the posterior probability
for stroke is 78% (from posterior odds = [0.1/0.9] × 31 =
3.4). The emergency physician’s confirmation that the
patient had an abrupt onset of focal neurologic symptoms 
and no known conditions that would increase the chances of 
a stroke mimic increases the likelihood of a stroke. 
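
The conversion from prior probability to posterior probability used in this scenario can be sketched in a few lines of Python. The function name `posttest_probability` is a hypothetical label for illustration; the inputs (prior probability of 10% and LR of 31) are those given in the text.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Scenario values from the text: prior probability 10%, LAPSS positive LR of 31.
p = posttest_probability(0.10, 31)
print(f"posterior probability = {p:.3f}")
```

The same function applies to a negative screen: substituting the LAPSS negative LR of 0.09 shows how far a negative result would lower the probability from the same 10% starting point.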

In the case of this patient scenario, the patient’s blood pressure
reading is 150/95 mm Hg. His pulse rate is 84/min and
regular. He is alert, knows his age and the current month, and
is able to follow simple verbal commands (NIHSS items 1a, 1b,
and 1c; Table 48-2). He has dysarthric speech (NIHSS item 10) and
had difficulty naming common objects (ie, dysnomia; NIHSS 
item 9), but his speech is understandable. At rest, the patient 
tends to look only to the left, but on command he is able to 
look to the right (ie, a left gaze preference; NIHSS item 2). On 
asking him to identify your fingers at the periphery of his 
visual fields, you discover that he sees nothing to his right (a 
right homonymous visual field defect; NIHSS item 3). The 
right side of his face droops (a right lower facial paresis; NIHSS 
item 4), and when he holds his arms straight out with the 
palms facing up, his right arm drifts downward (a right-sided 
drift; NIHSS item 5). His right leg is slightly weak to motor 
testing but does drift by a count of 5 (NIHSS item 6). He has 
no limb ataxia (the smoothness of movements is consistent 
with the amount of limb weakness; NIHSS item 7) but has 
diminished pain sensibility in his right arm (pinprick is 
described as feeling dull in his right arm compared with his 
left; NIHSS item 8). There is no evidence of spatial neglect (he 
is able to recognize being touched on his right arm and leg 
when touched on the right and left sides simultaneously; 
NIHSS item 11). A glucose level obtained by fingerstick was 
110 mg/dL (6.1 mmol/L). 

What Is the Vascular Distribution of the Stroke? 

Accuracy of Determining the Stroke Distribution 

Historical and objective data help localize the affected portions of
the nervous system, providing clues about the likely pathophysiology
and etiology (essential for rational secondary prevention). 61
Clinicians must recognize that computed tomographic (CT) scan results
are frequently negative during the first hours after ischemic stroke
and that technical limitations often impair CT imaging of posterior
fossa structures. These limitations in early neuroimaging of the
evolving stroke emphasize the importance of the clinical examination.
MRI scans, with greater sensitivity than CT, are often not available
for immediate, routine patient evaluations. 63

Reliability of Determining the Stroke Distribution 

The clinical examination is most important, despite its less than
perfect accuracy, early in the course of a stroke episode, when the
initial imaging studies may not reveal the abnormality. An
understanding of the reliability of the examination helps identify the
clinical features that have potential utility. Clinical experience
suggests that the reliability of individual elements of the neurologic
history and examination is important for the description of the stroke
patient’s neurologic deficits (Table 48-6). 45-52,54,64 Obtaining
historical data from stroke patients can be hampered by the
communication deficits caused by the stroke. Only 1 of these studies
assessed the reliability of historical data. 45

The reliability of historical items is generally low, ranging from
slight to fair agreement between observers, 45 which is particularly
noteworthy because so much of diagnosis, particularly of transient
events such as TIAs, depends on the patient’s medical history. The
reliability of specific neurologic examination findings improves when
the examination is performed with knowledge of the patient’s medical
history and when a full examination is performed in contrast to an
examination aimed at a particular finding. 65 Several specific findings
are assessed with high degrees of reliability (Table 48-6). However, in
practice, anatomic diagnosis for neurologic conditions requires
recognition of the pattern of abnormal and normal findings, rather than
a single finding.

Experienced physicians consider their own views of the reliability of
given findings (ie, subjective sensory abnormalities tend to be
unreliable) when arriving at a specific anatomic diagnosis. Although
neuroanatomic diagnosis can be complex, schemes have been developed
that can be generally applied. For example, the Oxfordshire
classification (used primarily in research settings) assigns 1 of 4
anatomic distributions (Box 48-1). 66 When caused by ischemia, the
total anterior circulation infarction syndrome (TACS) reflects proximal
occlusion of the internal carotid artery or trunk of the middle
cerebral artery; the partial anterior circulation infarction syndrome
suggests a branch artery occlusion in the middle cerebral artery
distribution; a lacunar infarction syndrome indicates occlusion of a
small penetrating vessel; and posterior circulation infarction syndrome
is consistent with vertebrobasilar distribution stroke. The reliability
of this classification is moderate to good (κ = 0.54; 95% CI,
0.39-0.68). 67


Box 48-1 Oxfordshire Classification of Subtypes
of Cerebral Infarction a

TOTAL ANTERIOR CIRCULATION
INFARCTION SYNDROME (TACS)

A combination of new higher cerebral dysfunction (ie, dysphasia,
dyscalculia, visuospatial disorder); homonymous visual field defect;
and ipsilateral motor or sensory deficit of at least 2 areas of the
face, arm, and leg.

PARTIAL ANTERIOR CIRCULATION
INFARCTION SYNDROME (PACS)

Only 2 of the 3 components of the TACS syndrome are present, with
higher cerebral dysfunction alone or with a motor/sensory deficit more
restricted than those classified as LACS (ie, confined to 1 limb or to
face and hand, but not to the whole arm).

LACUNAR INFARCTION SYNDROME (LACS)

Pure motor stroke, pure sensory stroke, sensorimotor stroke, or ataxic
hemiparesis.

POSTERIOR CIRCULATION
INFARCTION SYNDROME (POCS)

Any of the following: ipsilateral cranial nerve palsy with
contralateral motor or sensory deficit; bilateral motor or sensory
deficit; disorder of conjugate eye movement; cerebellar dysfunction
without ipsilateral long-tract deficit (ie, ataxic hemiparesis); or
isolated homonymous visual field defect.

a Based on data from Bamford et al. 66
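
The anterior circulation criteria in Box 48-1 amount to counting how many of the 3 TACS components are present. The sketch below illustrates only that counting step. The function and parameter names are hypothetical, it deliberately ignores the LACS and POCS criteria and the restricted motor/sensory variant of PACS, and it is not a validated classifier.

```python
def classify_anterior_syndrome(higher_cerebral_dysfunction: bool,
                               homonymous_field_defect: bool,
                               motor_or_sensory_deficit_2_areas: bool) -> str:
    """Apply the TACS/PACS component count from Box 48-1 (simplified sketch)."""
    components = sum([higher_cerebral_dysfunction,
                      homonymous_field_defect,
                      motor_or_sensory_deficit_2_areas])
    if components == 3:
        return "TACS"   # all 3 components present
    if components == 2:
        return "PACS"   # only 2 of the 3 components
    if higher_cerebral_dysfunction:
        return "PACS"   # higher cerebral dysfunction alone
    return "not classified by this sketch"


# A patient with dysnomia, a homonymous field defect, and weakness of
# at least 2 of face, arm, and leg has all 3 components.
print(classify_anterior_syndrome(True, True, True))  # TACS
```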


Scenario 

This right-handed patient with unilateral right facial and limb
weakness might have a lesion affecting contralateral central motor
pathways at any level of the neuraxis above the midpons. However, when
these findings are combined with an aphasia (manifest as a dysnomia),
the patient’s deficit is likely the result of a lesion affecting the
dominant hemisphere. The greater involvement of face and arm as
compared with his leg suggests an abnormality extending from the region
of the Sylvian fissure toward the convexity, consistent with ischemia
in the distribution of the left middle cerebral artery.

In this patient scenario, the examination result is consistent with a
left middle cerebral artery distribution cerebral infarction,
fulfilling criteria for TACS. However, a neuroimaging study is
necessary to help exclude a stroke mimic and to determine whether the
patient may have had a brain hemorrhage. You request a brain CT scan.

Assigning Stroke Severity 

The information provided in Table 48-6 shows that the reliability of
specific items varies widely. During the course of care, and to guide
prognosis, standardized assessments of a stroke patient’s deficits
improve the reliability of the routine neurologic examination. Examples
with supportive reliability and validity data include the Canadian
Neurological Scale, 52 the Copenhagen Stroke Scale, 54 the Scandinavian
Neurological Stroke Scale, 49 the Unified Neurological Stroke Scale, 68
and the NIHSS. 48 Of these, the NIHSS has been widely adopted for
clinical care and research in the United States and other countries
(Table 48-2). The scale and instructions are available as an online
resource (http://www.ninds.nih.gov/doctors/stroke_scale_training.htm;
accessed June 13, 2008). The reliability of the scale’s individual
items has been studied extensively (Table 48-7); data from some of
these studies are included in the ranges given in Table 48-6. With the
highest values within each range, most items can have substantial to
almost perfect levels of agreement. With the lowest values, reliability
can be as low as slight to moderate.
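
The verbal labels used for agreement in this chapter (slight, fair, moderate, substantial, almost perfect) correspond to bands of the κ statistic. The sketch below assumes the commonly used Landis and Koch cut points; the chapter does not state which cut points it applies, so the thresholds and the function name `agreement_label` are assumptions. The example ranges are taken from Table 48-7.

```python
def agreement_label(kappa: float) -> str:
    """Map a kappa value to a verbal label (Landis and Koch bands, assumed)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Lowest and highest kappa values for 3 NIHSS items in Table 48-7.
ranges = {
    "4. Facial palsy": (0.22, 0.74),
    "5. Arm strength": (0.77, 0.97),
    "7. Limb ataxia": (-0.16, 0.69),
}
for item, (low, high) in ranges.items():
    print(f"{item}: {agreement_label(low)} to {agreement_label(high)}")
```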

Recognition of the potential for limited reliability of some items has
led to the development of a free online training and certification
program sponsored by the American Stroke Association, in conjunction
with the American Academy of Neurology and the National Institute of
Neurological Disorders and Stroke
(http://nihss-english.trainingcampus.net/uas/modules/trees/windex.aspx;
accessed June 13, 2008). With training, the NIHSS can be used reliably
by nonneurologist physicians, as well as nurses. 50,69 The NIHSS can
also be scored with high reliability by remote observers via
telemedicine (correlation between bedside and remote scores, r = 0.955;
P < .001). 70

The NIHSS scores correlate well with the size of the stroke as measured
by MRI. 71 Therapeutically, a secondary analysis of the National
Institutes of Health (NIH) tPA trial data found that the risk of
intracerebral hemorrhage was independently associated with baseline
stroke severity as assessed with the NIHSS, divided into 5 categories
(0-5, 6-10, 11-15, 16-20, and >20; odds ratio [OR], 1.8; 95% CI,
1.2-2.9). 12 After tPA treatment, 17% of patients with a baseline NIHSS
score greater than 20 developed an intracerebral hemorrhage vs 3% to 5%
with less severe strokes. Those in the most severe category had the
worst overall prognosis for recovery by 3 months, yet they were also
the most likely to improve with tPA (OR, 4.3; 95% CI, 1.6-12). This
information, derived from clinical observations, is helpful when
discussing the risks and benefits of the treatment with patients and
families.

Scenario 

In this case, the example patient had an NIHSS score of 9
(item 2 = 1, item 3 = 2, item 4 = 2, item 5 = 1, item 8 = 1,
item 9 = 1, item 10 = 1; Table 48-2).
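
The total NIHSS score is the sum of the item scores. A minimal sketch of that addition, using the item scores listed for the example patient, follows; items not listed are assumed to be scored 0, and the dictionary labels are illustrative only.

```python
# Item scores listed for the example patient; all other items are assumed 0.
scenario_items = {
    "2 gaze": 1,
    "3 visual fields": 2,
    "4 facial palsy": 2,
    "5 arm strength": 1,
    "8 sensation": 1,
    "9 language": 1,
    "10 dysarthria": 1,
}

total_nihss = sum(scenario_items.values())
print(total_nihss)  # 9
```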

He has a 22% risk of death or a poor outcome without 
reperfusion therapy. You need to determine whether he had a 
hemorrhagic or ischemic stroke to assess the appropriateness 
of thrombolysis. 

Classifying the Stroke 

Accuracy of Stroke Classification 

It is not enough to determine whether the patient with an acute focal
neurologic deficit has had a stroke. Treatment with a thrombolytic or
an antithrombotic drug is contraindicated in patients with hemorrhage.
Three studies that provide information about the accuracy of medical
history and physical examination in distinguishing hemorrhagic from
ischemic strokes indicate that clinical judgment can be used to
increase or decrease the likelihood of hemorrhage, but diagnostic
errors occur (Table 48-5). In one study, a multivariate model showed
that initial depressed level of consciousness, vomiting, severe
headache, warfarin therapy, systolic blood pressure above 220 mm Hg,
and glucose level above 170 mg/dL (9.4 mmol/L) in a patient without
diabetes increased the likelihood of hemorrhagic stroke. 23 The
presence of any of these features more than doubles the odds of
hemorrhage (LR+, 2.4; 95% CI, 1.8-3.2), and the absence of all of these
features reduces the odds to approximately one-third (LR-, 0.35; 95%
CI, 0.18-0.68). The other 2 studies described the accuracy of the
physician’s overall assessment without the use of a predictive model
and produced results that were similar to those of the multivariate
model (the results were statistically homogeneous for the diagnostic
OR; P = .99). Thus, the clinical judgment that a stroke is hemorrhagic
has an LR of 3.1 (95% CI, 2.1-4.6), whereas the assessment that the
stroke is not hemorrhagic decreases the likelihood (LR, 0.61; 95% CI,
0.48-0.76). The use of a complex discriminant score (based on specific
historical and objective physical factors) modestly improves accuracy
relative to clinician judgment but is cumbersome and not clinically
useful. 72 A neuroimaging study is mandatory before the patient is
given a thrombolytic agent or anticoagulant. 60-62

Reliability of Stroke Classification 

Examining neurologists show only slight agreement on classifying a
stroke as due to an infarct vs a hemorrhagic stroke (κ = 0.38). 73
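
For readers unfamiliar with the κ statistic, the sketch below shows how it is computed for 2 examiners and 2 categories: observed agreement is corrected for the agreement expected by chance. The counts are invented purely for illustration and are not taken from the cited study; the function name `cohen_kappa` is a hypothetical helper.

```python
def cohen_kappa(both_infarct: int, first_only_infarct: int,
                second_only_infarct: int, both_hemorrhage: int) -> float:
    """Cohen's kappa for 2 raters and 2 categories (infarct vs hemorrhage)."""
    n = both_infarct + first_only_infarct + second_only_infarct + both_hemorrhage
    observed = (both_infarct + both_hemorrhage) / n
    first_infarct = (both_infarct + first_only_infarct) / n
    second_infarct = (both_infarct + second_only_infarct) / n
    # Agreement expected by chance if the 2 raters were independent.
    expected = (first_infarct * second_infarct
                + (1 - first_infarct) * (1 - second_infarct))
    return (observed - expected) / (1 - expected)


# Hypothetical counts (NOT study data): 100 patients, 80% raw agreement.
kappa = cohen_kappa(70, 10, 10, 10)
print(round(kappa, 3))  # 0.375
```

In this invented example, raw agreement of 80% corresponds to a κ of only 0.375 because most of the agreement would be expected by chance when one category dominates.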


Table 48-7 Reliability of National Institutes of Health
Stroke Scale Items a

Item                          κ Range

1a. Level of consciousness    0.46 to 0.68
1b. LOC questions             0.44 to 0.94
1c. LOC commands              0.41 to 0.94
2. Gaze                       0.33 to 0.82
3. Visual fields              0.57 to 0.90
4. Facial palsy               0.22 to 0.74
5. Arm strength               0.77 to 0.97
6. Leg strength               0.39 to 0.98
7. Limb ataxia                -0.16 to 0.69
8. Sensation                  0.39 to 0.89
9. Language                   0.60 to 0.84
10. Dysarthria                0.29 to 0.72
11. Extinction/neglect        0.53 to 0.89

Abbreviation: LOC, level of consciousness.
a Based on published data. 35,48,50,51


Scenario 

The patient was alert, not nauseated, did not have a headache, and was
not receiving warfarin. His blood pressure was not severely increased,
and his blood glucose level was normal. The chance of an intracerebral
hemorrhage is low but cannot be excluded without a neuroimaging study.

In this case, recognizing that the neuroimaging results may be
inconclusive, you must consider whether there might be some other cause
for his stroke and his likely stroke subtype diagnosis.

Ischemic Stroke Subtype Diagnosis 

Accuracy of Ischemic Stroke Subtype 

Ischemic stroke may be caused by a variety of pathophysiologic
conditions and mechanisms. The distinction between ischemic stroke
subtypes is important to guide specific secondary prevention measures
such as treatment with anticoagulants that are useful in patients with
cardiogenic embolism. In contrast, anticoagulants are not useful for
patients with atherothrombotic stroke. 74,75 Patients with carotid
artery distribution symptoms who have an ipsilateral high-grade
extracranial carotid artery stenosis benefit from carotid
endarterectomy. 76 Simple clinical features useful at the bedside can
help. For example, the acute onset of a focal neurologic deficit in a
patient with a cardiac or arterial embolic source increases the odds of
embolic stroke approximately 11-fold (LR+, 11; 95% CI, 5.7-21), whereas
the absence of these features reduces the odds of embolic stroke to
approximately one-quarter to one-half (LR-, 0.36; 95% CI,
0.24-0.56). 77

Reliability 

Only a few studies have considered the reliability of classification of
stroke type based solely on clinical findings. The available data
indicate that a physician’s assessment of ischemic stroke subtype
according to medical history and physical examinations alone is not
reliable. For example, the Stroke Data Bank Investigators found that
agreement on classification of stroke subtypes (cardiogenic embolism,
large artery atherosclerosis, tandem arterial pathology, lacunar
stroke, infarct of unknown cause, parenchymatous hemorrhage, and
subarachnoid hemorrhage) was poor (κ = 0.15). 73 The combined poor
accuracy and reliability means that radiographic and other tests are
required to help identify the ischemic stroke subtype. The combination
of the clinical findings and the neuroimaging results serves as the
reference standard for determining the presence of an ischemic
stroke. 77,78

Scenario 

After the clinical examination, accurate ischemic stroke subtype
diagnosis typically requires neuroimaging and other studies (ie,
echocardiography to identify a source for possible emboli).

In this case, the brain CT scan result was interpreted as being normal.
He was observed to have paroxysmal atrial fibrillation on a heart
monitor during his CT examination. After careful review of the
inclusion/exclusion criteria for intravenous tPA, he was treated
beginning 2 hours after the onset of his symptoms for a presumed
ischemic stroke. Noninvasive studies later showed no evidence of
extracranial carotid artery stenosis.

Prognosis 

Patients with any combination of impaired consciousness, hemiplegia,
and conjugate gaze palsy have a relatively higher mortality rate during
the first 3 weeks after their stroke. Data from the prethrombolytic era
showed that the presence of any of these findings had an LR of 1.8 for
death (95% CI, 1.2-2.8), whereas the absence of all 3 had an LR of 0.36
(95% CI, 0.13-1.0). 24 Thirty-seven percent of those whose
consciousness was initially impaired died, compared with no deaths
among patients initially alert. 24 Several multivariable scoring
systems have been developed to aggregate those findings believed by
clinicians to reflect stroke severity and predict mortality (Table
48-8). These scores are calculated by adding points for abnormal
clinical findings.
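
The likelihood ratios quoted for these findings follow directly from sensitivity and specificity. The sketch below shows the relationship, using the sensitivity (80%) and specificity (56%) reported in Table 48-8 for reference 24; the function name `likelihood_ratios` is a hypothetical helper.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    lr_positive = sensitivity / (1.0 - specificity)   # true-positive rate / false-positive rate
    lr_negative = (1.0 - sensitivity) / specificity   # false-negative rate / true-negative rate
    return lr_positive, lr_negative


# Reference 24 in Table 48-8: sensitivity 80%, specificity 56% for mortality.
lr_pos, lr_neg = likelihood_ratios(0.80, 0.56)
print(round(lr_pos, 1), round(lr_neg, 2))  # 1.8 0.36
```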


Table 48-8 Prognosis After Stroke According to Clinical Data a

Reference                24                25                26                27                28                29                79

Score Components
Orientation                                +                 +                 +                 +                 +                 +
Level of consciousness   +                 +                 +                 +                 +                 +                 +
Neglect                                                      +                 +                 +
Language                                   +                                   +                                   +                 +
Gaze preference          +                 +                                                                       +                 +
Visual field defect                                          +                 +                                                     +
Facial paresis                             +                 +                                                     +                 +
Dysarthria                                                   +
Arm strength             +                 +                 +                                   +                 +                 +
Leg strength             +                 +                 +                                   +                 +                 +
Ambulation                                                                     +
Plantar response                                                                                                                     +
Sensation                                  +                 +                 +                                   +                 +
General function                           +                                   +                                   +

Accuracy

Mortality
Sensitivity (95% CI), %  80 (55-93)        72 (63-79)        ...b              60 (48-72)        85 (76-91)        42 (26-61)        86 (77-92)
Specificity (95% CI), %  56 (41-70)        99 (97-100)       ...               94 (87-97)        80 (72-86)        95 (90-98)        60 (51-68)
LR+ (95% CI)             1.8 (1.2-2.8)     77 (19-305)       ...               9.4 (4.5-20)      4.3 (2.9-6.2)     9.0 (3.7-22)      2.1 (1.7-2.7)
LR- (95% CI)             0.36 (0.13-1.0)   0.29 (0.22-0.38)  ...               0.42 (0.31-0.58)  0.19 (0.12-0.31)  0.61 (0.44-0.84)  0.23 (0.13-0.42)

Disability c
Sensitivity (95% CI), %  ...               ...               91 (83-95)        78 (64-88)        73 (60-83)        14 (7-26)         ...
Specificity (95% CI), %  ...               ...               86 (73-93)        86 (74-93)        77 (60-88)        97 (91-99)        ...
LR+ (95% CI)             ...               ...               6.4 (3.2-13)      5.5 (2.8-11)      3.1 (1.7-5.8)     4.5 (1.2-16)      ...
LR- (95% CI)             ...               ...               0.11 (0.05-0.21)  0.25 (0.15-0.44)  0.36 (0.22-0.56)  0.89 (0.79-0.99)  ...

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

a Based on data from Goldstein and Matchar. 1
b Ellipses indicate study did not provide relevant data.
c Predictions concerning disability refer to the chance of returning to independence in activities of daily living according to dichotomization into less vs more severe from
impairment level scores, including the indicated items as reflected in each total score’s definitions provided in the source articles.

Predicting functional outcome among stroke survivors is more
complicated than predicting survival. 80 The results of functional
outcome assessments vary, depending on when the assessments are
performed and how outcome is measured. As with mortality, multivariate
discriminant scores have also been used to predict dependency in
activities of daily living (Table 48-8). 24-29,79 The NIHSS score not
only provides a numeric summary of a patient’s neurologic impairments
that allows monitoring for changes in the extent of deficits but also
helps determine prognosis and the use of specific therapies. One study
found that each additional point on the NIHSS, within 24 hours of
stroke onset, was associated with a decrease in the odds of an
excellent outcome at 7 days by 24% (OR, 0.76; 95% CI, 0.72-0.80) and at
3 months by 17% (OR, 0.83; 95% CI, 0.81-0.86). 79 As described above,
the NIHSS predicts a patient’s prognosis. Fewer than 20% of untreated
patients with an NIHSS score of more than 15 at baseline recover to the
point of having little or no disability. 79 Approximate point estimates
predicting outcome at 3 months are based on NIHSS scores obtained
within the first 24 hours of ischemic stroke (Table 48-9). 81,82
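
A per-point odds ratio compounds across several points. The sketch below illustrates this under the assumption that the reported OR applies multiplicatively and equally to every additional point (as in a logistic model with a linear NIHSS term); the source does not state the model form, and the function name is hypothetical.

```python
def compounded_odds_ratio(per_point_or: float, extra_points: int) -> float:
    """Odds ratio for a difference of several NIHSS points.

    Assumes the per-point OR is constant and multiplicative.
    """
    return per_point_or ** extra_points


# 3-month OR of 0.83 per point (from the text), applied to a 5-point difference.
or_5_points = compounded_odds_ratio(0.83, 5)
print(round(or_5_points, 2))  # 0.39
```

Under this assumption, a patient scoring 5 points higher would have roughly 0.39 times the odds of an excellent outcome at 3 months.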

Scenario 

The example patient was alert, was not hemiplegic, and did not have a
conjugate gaze palsy. He has a low likelihood of in-hospital mortality
related to the stroke. According to his NIHSS score of 9, he has an
approximately 78% chance of having a good or excellent recovery by 3
months without treatment (Table 48-9). Twenty-four hours after he
received tPA, a brain CT scan showed no evidence of hemorrhage, and he
was administered warfarin for secondary stroke prophylaxis for an
atrial fibrillation-related cardioembolic stroke. He was able to
ambulate independently by the time of hospital discharge, and his
speech disturbance improved (NIHSS score of 4). He received outpatient
physical, occupational, and speech therapy and had an NIHSS score of 2
after 3 months.
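
The 78% figure comes from adding the "good" and "excellent" percentages for the patient's NIHSS band in Table 48-9. A minimal lookup sketch follows; the percentages are copied from Table 48-9, the top band is assumed to start at 23, and the names `TABLE_48_9` and `outcome_percentages` are hypothetical.

```python
# (lowest score, highest score or None, dead, poor, good, excellent), in percent.
TABLE_48_9 = [
    (0, 3, 1, 3, 15, 80),
    (4, 6, 2, 10, 25, 63),
    (7, 10, 4, 18, 32, 46),
    (11, 15, 9, 35, 34, 22),
    (16, 22, 18, 40, 25, 17),
    (23, None, 34, 48, 12, 6),  # open-ended top band (assumed to start at 23)
]


def outcome_percentages(nihss: int) -> dict[str, int]:
    """Look up 3-month outcome percentages for a baseline NIHSS score."""
    for low, high, dead, poor, good, excellent in TABLE_48_9:
        if nihss >= low and (high is None or nihss <= high):
            return {"dead": dead, "poor": poor, "good": good, "excellent": excellent}
    raise ValueError("NIHSS score must be 0 or greater")


row = outcome_percentages(9)
print(row["good"] + row["excellent"])  # 78
print(row["dead"] + row["poor"])       # 22
```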


THE BOTTOM LINE 

The medical history and neurologic examination are critical tools for
the identification and treatment of patients with suspected
cerebrovascular disease. This is especially true in patients being
evaluated soon after the onset of symptoms, before neuroimaging results
are available, and in patients with transient symptoms in whom no
parenchymal abnormality on brain neuroimaging may develop.

Among noncomatose patients without head trauma who 
have neurologically relevant symptoms for which stroke is a 
consideration, the prior probability of a TIA or stroke is 
approximately 10%. 

The likelihood of stroke increases with the following acute neurologic
deficits: facial droop, arm drift, or a speech disturbance. Despite the
increased odds of stroke in patients who satisfy this simple clinical
rule (using the CPSS, LR+, 5.5; 95% CI, 3.3-9.1), appropriate
neuroimaging and other tests are still required to exclude other
potentially treatable etiologies and to better define the stroke
subtype.


Table 48-9 Prognosis at 3 Months According to the Baseline NIHSS for
Patients With Ischemic Stroke a,b

                 NIHSS Score
                 0-3     4-6     7-10    11-15   16-22   ≥23

Dead             1       2       4       9       18      34
Poor b           3       10      18      35      40      48
Good b           15      25      32      34      25      12
Excellent b      80      63      46      22      17      6

Abbreviation: NIHSS, National Institutes of Health Stroke Scale.

a Based on data from Adams et al. 79 Data are presented as percentages.

b Outcome was determined according to the Glasgow Outcome Scale 81 and Barthel
Index 82 : poor, Glasgow Outcome Scale score <2 and Barthel Index score <60; good,
Glasgow Outcome Scale score <2 or Barthel Index score <60; excellent, Glasgow
Outcome Scale score >2 and Barthel Index score >60.

Reliability is lowest for historical items and subjective findings (ie,
the sensory examination). Reliability is higher for objective findings
such as motor impairment. The astute clinician is aware of these
differences when weighing the relative diagnostic implications.

The NIHSS is widely used for recording the clinical findings because it
improves reliability and provides information helpful for determining a
patient’s prognosis and management. Reliability improves with
experience, and Web-based resources are available for training and
certification
(http://nihss-english.trainingcampus.net/uas/modules/trees/windex.aspx;
accessed March 12, 2008).

Clinical findings may be suggestive of stroke type, but reliability is
poor when the diagnosis is based solely on medical history and physical
examination. Neuroimaging is required to exclude hemorrhage, and other
tests are necessary to help identify the ischemic stroke subtype.
Ischemic stroke subtype is often never established with certainty
during the process of care, so acute therapeutic decisions must
sometimes be made with the knowledge that the ischemic stroke subtype
diagnosis may be unreliable.

The severity of a patient’s initial neurologic impairments provides a
useful guide for prognosis.

the Diagnosis” section summarizes the findings published in
the original review. The “Clinical Scenario” and “Resolution”
are published in the article shown in this section.


STROKE— MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

Among emergency patients with nontraumatic, noncomatose,
neurologically relevant complaints, the prevalence of stroke or
transient ischemic attack is roughly 10%.

POPULATION FOR WHOM STROKE 
SHOULD BE CONSIDERED 

Stroke can be considered in patients with a variety of symptoms and
signs. Patients with acute neurologic findings, especially those
associated with acute focal sensory deficits, focal weakness, change in
mentation or level of consciousness, or sudden loss of ability to
communicate effectively, should be evaluated for a stroke. Headache,
seizure, and syncope are also important symptoms that can identify a
patient with a stroke.

DETECTING THE LIKELIHOOD OF STROKE 

Typically, the physician can rely on just a few findings for 
identifying the patient with a stroke (Table 48-10). 


Table 48-10 Likelihood Ratios for Stroke From Summing Up
Combinations of Findings

Combination of Findings a                  Findings Present   LR+ (95% CI)

Cincinnati Prehospital Stroke Scale 1:     3 Present          14 (1.6-121)
facial paresis, arm drift,                 2 Present          4.2 (1.4-13)
abnormal speech                            1 Present          5.2 (2.6-11)
                                           0 Present          0.39 (0.25-0.61)

Hospital Evaluation 2:                     4 Present          40 (29-55)
persistent neurologic deficit,             1-3 Present        Uncertain LR, but probability of stroke >10%
focal neurologic deficit, acute onset      0 Present          0.14 (0.10-0.20)
of symptoms during the previous week

Abbreviations: CI, confidence interval; LR, likelihood ratio; LR+, positive likelihood
ratio.

a The findings should be applied to patients who have no head trauma.
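
As an illustration of how the Cincinnati Prehospital Stroke Scale rows of Table 48-10 might be combined with the approximate 10% prior probability, the sketch below converts each LR to a posterior probability. The LRs are copied from the table; the names `CPSS_LR` and `stroke_probability` are hypothetical, and the wide confidence intervals in the table mean the point estimates are imprecise.

```python
# LR for stroke by number of CPSS findings present (point estimates, Table 48-10).
CPSS_LR = {3: 14.0, 2: 4.2, 1: 5.2, 0: 0.39}


def stroke_probability(findings_present: int, prior: float = 0.10) -> float:
    """Posterior probability of stroke given the CPSS count and a prior."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * CPSS_LR[findings_present]
    return posterior_odds / (1.0 + posterior_odds)


for count in (3, 2, 1, 0):
    print(count, round(stroke_probability(count), 2))
# 3 findings give about 0.61; 0 findings give about 0.04
```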


REFERENCE STANDARD TESTS 

Combination of clinical findings with neuroimaging results.

CLINICAL SCENARIOS


CHAPTER 


Does This Patient Have Temporal Arteritis?

Gerald W. Smetana, MD 
Robert H. Shmerling, MD 


CASE 1 A 74-year-old woman has recent onset of daily bitemporal
headache but is otherwise well. Her general physical examination
results are normal and the erythrocyte sedimentation rate (ESR) is
moderately elevated, at 64 mm/h. You wonder whether additional medical
history or physical examination findings will modify your suspicion of
possible temporal arteritis (TA) or whether the historical features
alone warrant proceeding to temporal artery biopsy.

CASE 2 A 53-year-old man has a 1-month history of fever and fatigue and
reports a single episode of transient partial loss of vision in one
eye. You believe that TA is among the diagnostic considerations but
suspect that he is too young for this diagnosis. You wonder if
additional medical history, physical examination, or laboratory testing
will change the probability of TA sufficiently to alter your decision
about the role of temporal artery biopsy, rather than pursuing
diagnostic evaluation for carotid artery stenosis or other
considerations first.


WHY IS THIS AN IMPORTANT QUESTION TO 
ANSWER WITH A CLINICAL EXAMINATION? 


When faced with a patient with headache, fatigue, or other possible
presenting symptom of TA, clinicians must be able to correctly and
confidently establish the diagnosis to prevent irreversible vision loss
and to minimize the inappropriate evaluation and treatment of
alternative diagnoses. Although headache is the most common reason for
clinical suspicion of TA, no single type of headache or other clinical
presentation is specific for TA, and the disorder is among the
diagnostic considerations for many symptom complexes in older
individuals. Our review will analyze the diagnostic value of these
varied symptoms and signs in predicting the likelihood of TA among
patients for whom there is a clinical suspicion of disease.

The first known report of a patient with TA was by Hutchinson in
1890. 1 His case was that of a man who was referred because of “red
streaks on his head” that were painful and prevented him from wearing
his hat; these proved to be swollen temporal arteries, which over time
became firm and pulseless. It was not until 1932 that Horton et al 2
described the first 2 cases of pathologically confirmed TA; both
patients had fever, weakness, anorexia, weight loss, anemia,
leukocytosis, and painful tender temporal arteries. Thus, many of the
characteristic features of this newly described disease were present in
these first few patients. Headache was absent. In 1937, headache was
recognized as a common feature, 3 and in 1938 vision loss was first
reported. 4 In the modern era, however, clinicians are unlikely to
treat patients with such advanced disease and the full array of
untreated symptoms.

The mortality of patients with treated TA, during follow-up periods as
long as 12 years, is the same as for age-matched individuals without
TA. For example, Matteson et al 5 studied 205 of the patients with TA
who formed the initial cohort for the development of the American
College of Rheumatology classification criteria. During a mean of 7
years of follow-up, the survival for patients with giant-cell arteritis
was nearly identical to that of age-matched controls; the standardized
mortality ratio was 1.03. Other authors have also observed that no
excess mortality exists among patients with TA during periods ranging
from 4.5 to 12 years. 6-8 Unrecognized (and therefore untreated)
patients may have a higher mortality, but no such natural history
studies of untreated patients exist in the modern era.

Although preventing death may not be among the benefits 
of early diagnosis of TA, timely diagnosis and treatment will 
prevent vision loss. A prompt decision regarding further 
evaluation (including referral for temporal artery biopsy) 
and early initiation of treatment are the primary rationales 
for improving the clinical prediction of the diagnosis. In 
addition, clinicians may avoid an extensive evaluation for 
other causes of symptoms by establishing a proper diagnosis. 
Because systemic corticosteroids have been the standard 
therapy for TA for decades, few studies have determined the 
long-term incidence of vision loss among untreated patients. 
Several studies, however, have demonstrated a substantial 
reduction in the incidence of vision loss after institution of 
corticosteroid therapy. Even among patients with complete
unilateral vision loss, prompt recognition and corticosteroid
therapy will decrease the risk of vision loss in the
contralateral eye.

Aiello et al 9 reviewed the Mayo Clinic experience of 245 patients
diagnosed with TA who had a complete ophthalmologic examination at
diagnosis or early in the course of treatment. The estimated 5-year
probability of developing vision loss after initiation of
corticosteroid therapy was 1%; that of additional vision loss in
patients who already had vision loss was 13%. These observations and
others emphasize the importance of the early diagnosis and treatment of
TA and of the clinical examination in identifying patients at risk for
catastrophic vision outcomes. 10-12

Estimates of the prevalence of TA have been fairly constant. Using
population data from Olmsted County, Minnesota, Salvarani et al 13
estimated the age-adjusted incidence for individuals aged 50 years or
older to be 24.2 per 100000 women and 8.2 per 100000 men. In another
report, prevalence estimates increased by age and were 200 per 100000
individuals aged 50 years and older, and 1100 per 100000 individuals
aged 85 years and older. 14 These findings are similar to those
observed in a Swedish population study, in which the average annual
incidence of TA among individuals older than 50 years was 22.2 per
100000 and the incidence increased with age. 15 In this study of 665
patients with TA proven by biopsy, only 1 patient was younger than 50
years. Other investigators have reported similar incidences. 16,17 That
TA is predominantly a disease of older individuals has importance
because of the aging of our society. In the US 2000 census
(http://factfinder.census.gov/servlet/ACSSAFFFacts?_submenuId=factsheet_18c_sse=on;
accessed June 15, 2008), 35 million individuals (12.4% of the
population) were aged 65 years or older and 9 million (3.3% of the
population) were aged 80 years or older; these proportions are expected
to increase.

The relatively low prevalence of TA does not diminish its importance to
clinicians because of the morbidity resulting from overlooking this
disorder. In fact, the higher prevalence of TA (1.5%) in one large
autopsy series suggests that the disorder may be either unrecognized or
clinically occult in many cases. 18 The vision prognosis of occult TA
is, of course, unknown, and series that describe the frequency of signs
and symptoms include only patients with clinically evident TA.

Pathophysiology 

The clinical manifestations of TA are a direct consequence of
local (or “arteritic”) and systemic inflammatory disease.
Localized arterial inflammation, particularly in the smaller
branches of the external carotid artery, causes endovascular
damage, vessel stenosis, and occlusion, ultimately leading to
tissue ischemia or necrosis. Examples of localized arteritic
symptoms include jaw claudication, caused by involvement
of the masticatory muscles, and vision loss caused by
involvement of the ophthalmic or posterior ciliary arteries. The
particular cytokine profile may contribute to the ischemic and
prominent constitutional features, such as malaise, fever, or
weight loss. 19,20

How to Elicit the Signs and Symptoms 

The myriad signs and symptoms in patients with TA require
familiarity with the most common ones, recognizing that
many patients will demonstrate few symptoms and have a
normal physical examination result. Headache, jaw claudication,
vision complaints, polymyalgia rheumatica (PMR), and
constitutional features in a patient older than 55 years are
among the most common symptoms. A high index of suspicion
will lead the clinician to pursue these features because
they may not be part of routine history-taking. Headache
quality (typically severe and throbbing; less often sharp, dull,
or burning), location (may be diffuse or localized but is
bitemporal in half of cases), and onset (typically acute) are
key features to assess; however, the headache of TA is often
nonspecific in character. 21 Headache may actually be due to
scalp tenderness, reported by the patient as pain when combing
the hair or putting on a hat. The headache is a new headache
that is either recent in onset or different from previous
headaches among patients with a history of chronic headaches.
The duration of the headache before seeking medical
attention is commonly 2 to 3 months. Jaw claudication refers
to pain in the proximal jaw near the temporomandibular
joint that develops only after a brief period of chewing,
especially food requiring vigorous mastication, such as steak or a
bagel.

Clinicians must distinguish jaw claudication from other
causes of jaw pain in elderly persons, such as disorders of the
temporomandibular joint (in which pain begins right away
with chewing) or ill-fitting dentures. Vision complaints
commonly include sudden monocular blindness, but clinicians
should ask patients about a stuttering onset of vision
loss, amaurosis fugax, a field cut, or diplopia. As an
inflammatory polyarthritis with tendon or bursal involvement,
PMR typically causes abrupt onset of morning stiffness
involving the neck, shoulders, and hips, with referred pain to
the proximal arms and thighs; this explains the prominent
myalgias. 22 Although neoplasm and infection may be highly
suspected in the older patient with fever, anorexia, weight
loss, and malaise, systemic inflammatory disease such as TA
may also cause these symptoms.

The physical examination result is frequently unremarkable
in patients with TA, but the detection of certain abnormalities
may increase the suspicion of disease. The patient’s
temperature and general appearance are important first
steps. Abnormalities of the temporal arteries, including
tenderness, reduced or absent pulsation, erythema, nodularity,
or swelling, may be detected by light palpation just anterior
and slightly superior to the tragus of the ear; following the
pulse anteriorly along the temples and comparison with the
contralateral side helps detect findings that may be remarkably
focal. Scalp tenderness, usually near the temporal arteries,
may also be evident by light palpation. The scalp and
tongue should be inspected for ischemic or necrotic skin
changes. The funduscopic examination, ideally with pupillary
dilation, may reveal a pale or swollen disc (evidence of
ischemic optic neuropathy) 23 or retinal artery occlusion,
whereas vision field testing may demonstrate a field cut. Joint
examination may reveal reduced range of motion in the
shoulder or hip because of pain or more distal synovitis,
particularly of the wrist.

METHODS 

Search Strategy and Quality Review 

We performed a MEDLINE search of English-language articles
published between January 1966 and July 2000. Search
terms included “temporal arteritis,” “giant cell arteritis,”
“clinical features,” “diagnosis,” “diagnostic tests,” “sensitivity
and specificity,” “medical history taking,” “physical examination,”
“signs and symptoms,” and “erythrocyte sedimentation
rate.” We identified additional references by the use of a
previously published search strategy in The Rational Clinical
Examination series. 24 This strategy combined 10 exploded
Medical Subject Headings (“physical examination,” “medical
history taking,” “professional competence,” “sensitivity and
specificity,” “reproducibility of results,” “observer variation,”
“diagnostic tests, routine,” “decision support techniques,”
“Bayes theorem,” “mass screening”) and 2 text-word categories
(“sensitivity and specificity” and “physical examination”),
and intersected with “temporal arteritis.” We identified
additional articles, including those predating MEDLINE,
through a hand search of the bibliographies of retrieved
articles, previous reviews, monographs, and textbooks.
Both authors independently reviewed all retrieved articles
to determine their eligibility for our review and included
only those articles in which agreement existed that the
study had met our inclusion criteria. We sought no
unpublished studies.

The purpose of our review is to determine the value of
individual clinical features in predicting the likelihood of
positive results from temporal artery biopsy. Eligible studies
were, therefore, those in which the authors provided a
detailed list of clinical features for patients suspected of
having or confirmed to have TA. We excluded articles with
limited data on clinical features and those with fewer than 7
patients with positive temporal artery biopsy results. Many
early studies classified patients as having TA according to
either the authors’ own clinical criteria alone or the presence
of positive biopsy results. When a study considered both
groups of patients as having TA, we required that at least 90%
of included patients had undergone temporal artery biopsy
and had had a positive result.

We classified each article by the pathologic criteria used to
determine the presence of positive biopsy results and by the
referral source for recruitment of patients. In some cases,
authors published clinical data on the same or overlapping
series of patients in more than 1 article. In these cases, if we
could not determine with certainty that no overlap existed
between the patients in these studies, we excluded all studies
except for the report with the largest number of patients. Of
114 studies retrieved using our search strategy, 41 were
eligible for our review. Twenty-one studies included patients with
both positive and negative temporal artery biopsy results;
these form the core of our review.

We determined whether the authors required any predetermined
published clinical criteria for patient inclusion,
such as the American College of Rheumatology criteria 25
for the diagnosis of TA, or other criteria. When studies
used such criteria to classify patients as having TA with
positive biopsy results or TA with negative biopsy results,
we considered a positive biopsy result to be the true reference
standard and considered only those patients with
such results to have the disease. In our analysis, we
included only those clinical features that were cited by at
least 2 studies.

We classified the quality of evidence in each study by 2
methods. First, we developed our own criteria that focused
on the diagnostic criteria (Table 49-1). This step was necessary
to distinguish studies that used biopsy result as a reference
standard from those that used established clinical
criteria. In addition, we graded the quality of each study with
a classification scheme for levels of evidence adapted from
that previously developed for The Rational Clinical Examination
series (see Table 1-7). 26 In this scheme of levels 1
through 5, the highest levels of evidence we found were in
level 3 studies.

Statistical Methods 

Sensitivity was defined as the proportion of patients with
TA who had the particular sign or symptom; specificity was
the proportion of patients without TA who did not have
the particular sign or symptom. We calculated likelihood
ratios (LRs) when authors reported clinical findings of
patients suspected of having TA both with positive and
negative temporal artery biopsy results. Summary measures
for these dichotomous data and for the data reported
on a continuous scale (eg, hemoglobin) were obtained with
a random effects measure. 27,28 Uncertainty in these measures
is reflected in the broad 95% confidence intervals (CIs)
around the estimates.

Table 49-1 Temporal Arteritis Diagnostic Criteria Quality Score

Score | Diagnostic Quality
1 | Patients require biopsy confirmation to be classified as having temporal arteritis.
2 | Patients are classified according to the presence of predefined established clinical criteria for temporal arteritis and on biopsy results.
3 | All patients meet predefined established clinical criteria for temporal arteritis. The authors consider patients with negative biopsy results to have temporal arteritis if they meet established clinical criteria for temporal arteritis.
4 | A series of consecutive patients with temporal arteritis proven by biopsy. No controls or patients with negative biopsy results.
5 | No use of established clinical criteria. Patients do not require biopsy confirmation to be classified as having temporal arteritis.
6 | The investigators require the presence of a particular symptom (eg, visual problems) in all patients with temporal arteritis.
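The definitions of sensitivity, specificity, and LRs given under Statistical Methods can be illustrated with a short sketch. It assumes a single 2 x 2 table for one clinical finding against the biopsy result; the class name, function names, and counts are hypothetical and are not taken from any study in this review, and the sketch does not reproduce the random effects pooling used for the summary measures.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for one clinical finding against the biopsy reference standard."""
    finding_biopsy_pos: int      # finding present, biopsy positive
    no_finding_biopsy_pos: int   # finding absent, biopsy positive
    finding_biopsy_neg: int      # finding present, biopsy negative
    no_finding_biopsy_neg: int   # finding absent, biopsy negative


def sensitivity(t: TwoByTwo) -> float:
    """Proportion of biopsy-positive patients who have the finding."""
    return t.finding_biopsy_pos / (t.finding_biopsy_pos + t.no_finding_biopsy_pos)


def specificity(t: TwoByTwo) -> float:
    """Proportion of biopsy-negative patients who do not have the finding."""
    return t.no_finding_biopsy_neg / (t.finding_biopsy_neg + t.no_finding_biopsy_neg)


def lr_positive(t: TwoByTwo) -> float:
    """LR+ = sensitivity / (1 - specificity); undefined if specificity is 1."""
    return sensitivity(t) / (1.0 - specificity(t))


def lr_negative(t: TwoByTwo) -> float:
    """LR- = (1 - sensitivity) / specificity; undefined if specificity is 0."""
    return (1.0 - sensitivity(t)) / specificity(t)


# Hypothetical counts chosen only for illustration.
example = TwoByTwo(
    finding_biopsy_pos=34,
    no_finding_biopsy_pos=66,
    finding_biopsy_neg=8,
    no_finding_biopsy_neg=92,
)

print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
print(f"LR+ = {lr_positive(example):.2f}")
print(f"LR- = {lr_negative(example):.2f}")
```

A finding that is insensitive but fairly specific, as in these illustrative counts, yields an LR+ well above 1 and an LR- only modestly below 1.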

RESULTS 

Precision and Accuracy 

Twenty-one studies that met our inclusion criteria included
patients with both positive and negative temporal artery
biopsy results and form the basis of our review (Table 49-2).
These studies reported clinical findings on a total of 2680
patients, 1050 of whom had positive temporal artery
biopsy results. The overall prevalence (prior probability) of
positive biopsy results among patients with a clinical suspicion
of TA in these studies was 39%. All but 4 of the studies
were retrospective chart reviews. Eleven of the studies were
of the highest quality (study quality 1) according to our
predetermined criteria, and 19 of the studies included all
patients who had a temporal artery biopsy during the study
period.

Precision of the Medical History and 
Physical Examination for Temporal Arteritis 

No study that met our inclusion criteria evaluated the precision
(ie, interobserver or intraobserver variation) of the
medical history and physical examination for the diagnosis
of TA. Most of the studies cited in this review are retrospective
chart reviews and did not use standardized instruments
for eliciting signs and symptoms across different observers.
We therefore restrict our discussion to the accuracy of clinical
findings.


Accuracy of Symptoms for the 
Diagnosis of Temporal Arteritis 

Among the studies that included data on patients both with
positive and negative temporal artery biopsy results, 14
symptoms were cited by at least 2 studies (Table 49-3). A
limitation of our approach is that authors reported some findings
much more frequently than others. However, our review
incorporates the full extent of the published experience and
presumably these reports include all of the major clinical
features. Only 2 symptoms had LRs of sufficient power to be
useful to clinicians. Jaw claudication had the highest LR+
(4.2), which is consistent with the traditional clinical teaching
that jaw claudication, although somewhat insensitive, is a
relatively specific feature for TA. When we pooled the sensitivity
data from all eligible studies, including those studies
that reported only patients with positive temporal artery
biopsy results, 20 - 25,53 ' 71 jaw claudication was present in only 
34% of patients with disease (Table 49-4). 

More surprising was the finding that diplopia was the next
most predictive symptom, with an LR+ of 3.4. Although the
presence of diplopia substantially increases the likelihood of
disease, the absence of diplopia does not significantly modify
the probability of disease (LR-, 0.95) because of its low
sensitivity (9% among all studies). We derived this value
from 5 studies that evaluated this feature; previous reviews
and textbooks have not emphasized the importance of
diplopia. No other symptom had an LR+ exceeding 2. This
includes features often thought to be useful to clinicians,
such as fever, PMR, vision loss, and temporal headache.
The LR- of all 14 symptoms was near 1. In other words, the
absence of any particular symptom did not rule out TA or
make the disorder substantially less likely. Patients with
positive temporal artery biopsy results had a mean duration of
symptoms of 3.5 months before diagnosis; this was 1.5 months
(95% CI, 0.4-2.5 months) shorter than that of patients with
negative biopsy results, emphasizing the relatively acute onset
of symptoms of biopsy-proven TA and that a longer duration
of symptoms makes a positive temporal artery biopsy result
less likely.
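To make the effect of these LRs concrete, the following minimal sketch converts a prior probability to a posttest probability through odds. It assumes the 39% prevalence of positive biopsy results reported in this review as the prior and uses the summary LRs quoted above; the function name is hypothetical, and treating a pooled LR as applying unchanged to an individual patient is a simplifying assumption.

```python
def post_test_probability(prior: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a prior probability using odds.

    prior is a probability strictly between 0 and 1.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Prior: prevalence of positive biopsy results among referred patients (39%).
PRIOR = 0.39

# Jaw claudication present (LR+ 4.2), diplopia absent (LR- 0.95).
with_jaw_claudication = post_test_probability(PRIOR, 4.2)
without_diplopia = post_test_probability(PRIOR, 0.95)

print(f"jaw claudication present: {with_jaw_claudication:.2f}")
print(f"diplopia absent:          {without_diplopia:.2f}")
```

Under these assumptions, jaw claudication raises the probability of a positive biopsy result from 39% to roughly 73%, whereas the absence of diplopia leaves it nearly unchanged at about 38%.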

Accuracy of the Physical Examination 
for the Diagnosis of Temporal Arteritis 

Findings on physical examination were more likely to influence
the probability of positive temporal artery biopsy results
than were historical features (Table 49-5). The presence of
synovitis made positive temporal artery biopsy results
significantly less likely (LR+, 0.41). The absence of any temporal
artery abnormality also made disease substantially less likely
(LR-, 0.53). Scalp tenderness, a finding often thought to be
specific for TA, did not perform well as a predictor of positive
biopsy results. Among patients in whom TA was suspected,
the frequency of scalp tenderness was similar in patients with
and without the disease (LR+, 1.6).

Abnormal findings on examination of the temporal artery
increased the probability of positive biopsy results and
predicted disease to a greater extent than any other variable.
Beading, prominence, or enlargement of the temporal artery
all conferred LR+s of greater than 4. A tender temporal
artery also suggested an increased probability of positive
biopsy results (LR+, 2.6). An absent temporal artery pulse
showed a trend toward a useful LR+; the value of 2.7 was,
however, not statistically different from 1. The LRs for “any
temporal artery abnormality” may underestimate their
power. If eligible studies did not list clinical features
separately for each patient, it was not possible to determine
whether specific temporal artery abnormalities overlapped;
in such cases, we made the most conservative calculation
about the actual number of patients with any temporal artery
abnormality.

Table 49-2 Characteristics of Studies That Include Patients With Both Positive and Negative Temporal Artery Biopsy Results

Study, y | Diagnostic Quality/Level of Evidence (a) | Study Type | No. of Patients | Positive Biopsy Results, No. (%) | Referral Source (b) | Pathologic Criteria Used to Establish Positive Biopsy Results (c) | Comments
Gabriel et al, 29 1995 | 1/3 | Retrospective | 525 | 172 (33) | All | Achkar et al 30 |
Hayreh et al, 31 1997 | 1/3 | Prospective | 363 | 106 (29) | All | Author |
McDonnell et al, 32 1986 | 1/3 | Retrospective | 250 | 42 (17) | Specialty | Author |
Hall et al, 33 1983 | 1/3 | Retrospective | 134 | 46 (34) | All | Not stated |
Fernandez-Herlihy, 34 1988 | 1/3 | Retrospective | 107 | 29 (27) | All | Author | Omitted group C patients with equivocal biopsies
Chmelewski et al, 35 1992 | 1/3 | Retrospective | 98 | 30 (31) | All | Author |
Fauchald et al, 36 1972 | 1/3 | Retrospective | 94 | 61 (65) | All | Not stated | Comparison group patients all had PMR
Stuart, 37 1989 | 1/3 | Retrospective | 75 | 14 (19) | All | Allsop and Gallagher 38 |
Kent and Thomas, 39 1990 | 1/3 | Retrospective | 70 | 8 (11) | All | Not stated |
Roth et al, 40 1984 | 1/3 | Retrospective | 51 | 7 (14) | All | Not stated |
Bevan et al, 41 1968 | 1/4 | Retrospective | 37 | 28 (76) | All | Author | Arteritis and giant cells pooled as biopsy-result positive
Duhaut et al, 42 1999 | 2/3 | Prospective | 292 | 207 (71) | All | McDonnell et al 32 | All patients >50 y old, ESR >40 mm/h, response to 72 h of corticosteroids
Baldursson et al, 43 1994 | 2/3 | Retrospective | 133 | 127 (96) | All | ACR |
Gonzalez et al, 44 1989 | 2/4 | Retrospective | 21 | 10 (48) | All | Not stated | All patients met clinical criteria for GCA
Genereau et al, 45 1999 | 3/3 | Retrospective | 37 | 19 (51) | All | ACR |
Vilaseca et al, 46 1987 | 3/4 | Retrospective | 103 | 45 (44) | All | Allsop and Gallagher 38 |
Gur et al, 47 1996 | 3/4 | Retrospective | 39 | 30 (77) | Specialty and PCP | Banks et al 48 | All patients met ACR criteria for GCA
Brittain et al, 49 1991 | 5/4 | Prospective | 31 | 15 (48) | Not stated | Not stated |
Hedges et al, 50 1983 | 5/5 | Retrospective | 91 | 28 (31) | All | Author | Patients excluded if adequate chart documentation of history-taking was absent
Skaug et al, 51 1995 | 6/3 | Retrospective | 98 | 13 (13) | Specialty | Not stated | All patients had eye complaints
Dixon et al, 52 1966 | 6/4 | Prospective | 31 | 13 (42) | Specialty | Author | All patients had PMR

Abbreviations: ACR, 1990 American College of Rheumatology criteria for the diagnosis of giant-cell arteritis 25 ; ESR, erythrocyte sedimentation rate; GCA, giant-cell arteritis; PCP,
primary care practices; PMR, polymyalgia rheumatica.

(a) Diagnostic quality is described in Table 49-1. Levels of Evidence are those used for the Rational Clinical Examination series (Table 1-7).

(b) All, indicates all patients referred for biopsy; specialty, rheumatology or ophthalmology or other specialty practice; not stated, referral source not stated by authors.

(c) Author, indicates author’s own explicitly stated criteria; not stated, no pathologic criteria stated for a positive temporal artery biopsy.
Table 49-3 Summary Likelihood Ratios for Symptoms Among Patients With Suspected Temporal Arteritis

Symptom/References | No. of Patients With Data on Variable (a) | LR+ (95% CI) | LR- (95% CI)
Jaw claudication 29 ' 31 ' 35 ' 37 ' 39 ' 40 ' 42 ' 4446 ' 50 52 | 2314 | 4.2 (2.8-6.2) | 0.72 (0.65-0.81)
Diplopia 33 - 34 ' 42 ' 50 ' 51 | 703 | 3.4 (1.3-8.6) | 0.95 (0.91-0.99)
Temporal headache 36 - 42 | 386 | 1.5 (0.78-3.0) | 0.82 (0.64-1.0)
Weight loss 31 ' 34 ' 3637 ' 3641 ' 42 ' 46 ' 47 | 1417 | 1.3 (1.1-1.5) | 0.89 (0.79-1.0)
Anorexia 34 ' 37 - 39 - 41 ' 42 ' 46 | 674 | 1.2 (0.96-1.4) | 0.87 (0.75-1.0)
Fatigue 31 - 33 ' 37 ' 39 ' 41 . 42 , 44,46 | 1095 | 1.2 (0.98-1.4) | 0.94 (0.86-1.0)
Fever 29,31,34-37,40-42,46,47 | 1708 | 1.2 (0.98-1.4) | 0.92 (0.85-0.99)
Any headache 29 ' 31 ' 3637 ' 39 ' 47 ' 5651 | 2475 | 1.2 (1.1-1.4) | 0.7 (0.57-0.85)
Arthralgia 33 ' 34 - 37 ' 39 ' 4644 ' 46 ' 52 | 582 | 1.1 (0.86-1.4) | 1.0 (0.92-1.1)
Any vision symptom 29 ' 32 ' 37 ' 39 ' 42 ' 44 ' 47 ' 51 - 52 | 2083 | 1.1 (0.93-1.3) | 0.97 (0.9-1.0)
Polymyalgia rheumatica 29 ' 34 ' 3637 ' 39 ' 40 ' 42 ' 44 ' 45 ' 47 ' 50 | 1383 | 0.97 (0.76-1.2) | 0.99 (0.83-1.2)
Myalgia 31 - 36 ' 39 ' 40 ' 46 | 681 | 0.93 (0.81-1.1) | 1.1 (0.87-1.3)
Unilateral vision loss 32 - 50 | 341 | 0.85 (0.58-1.2) | 1.2 (1.0-1.3)
Vertigo 34 ' 36 ' 44 | 212 | 0.71 (0.38-1.3) | 1.1 (0.93-1.2)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.
(a) Includes only studies that report results for patients with both positive and negative biopsy results.


LRs approaching 1 suggest that, among patients with a
clinical suspicion for TA, the feature was as common among
those with positive biopsy results as it was among those with
negative results. We separately determined the sensitivity of
physical examination features among all studies, including
those restricted to patients with positive biopsy results
(sensitivity only studies, Table 49-6). In each study, physicians
would have referred patients for a temporal artery
biopsy when they believed the diagnosis to be sufficiently
likely to justify a biopsy. These patients represent a selected
sample who often manifested several clinical features of
interest, including those analyzed in this review. Patients who
lacked features commonly considered suggestive of TA were
presumably less likely to have a temporal artery biopsy. This
verification bias makes the value of those few findings with
the highest positive LRs even greater because they help
predict biopsy results among patients with a significant clinical
suspicion of disease.

Table 49-4 Summary Sensitivity of Symptoms Among All Patients With
Positive Temporal Artery Biopsy Results (a)

Variable | No. of Studies | Sensitivity (95% CI)
Any headache | 32 | 0.76 (0.72-0.79)
Temporal headache | 8 | 0.52 (0.36-0.67)
Weight loss | 19 | 0.43 (0.35-0.53)
Fever | 26 | 0.42 (0.33-0.52)
Fatigue | 19 | 0.39 (0.28-0.52)
Myalgia | 8 | 0.39 (0.23-0.56)
Any vision symptom | 35 | 0.37 (0.30-0.44)
Anorexia | 12 | 0.35 (0.23-0.48)
Polymyalgia rheumatica | 30 | 0.34 (0.28-0.41)
Jaw claudication | 35 | 0.34 (0.29-0.41)
Arthralgia | 13 | 0.30 (0.21-0.40)
Unilateral vision loss | 11 | 0.24 (0.14-0.36)
Facial pain | 4 | 0.17 (0.12-0.23)
Bilateral vision loss | 7 | 0.15 (0.07-0.27)
Vertigo | 4 | 0.11 (0.05-0.19)
Diplopia | 14 | 0.09 (0.07-0.13)

Abbreviation: CI, confidence interval.

(a) Includes results of all eligible studies, including those that reported clinical features
for patients with positive biopsy results only.

TA is more common among women than men and among
whites than blacks. The LRs do not reflect this observation,
perhaps because referring physicians incorporated this
knowledge into their decisions about which patients to refer
for biopsy. However, if one pools the data from all eligible
studies, including those that reported only patients with
positive temporal artery biopsy results, TA was 2.1 times more
common in women than men (Table 49-6). TA among black
patients in published reports is restricted largely to small case
series, 66 and white patients constituted 86% of all patients
with positive biopsy results.
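The female to male ratio quoted above appears to follow from the pooled proportion of biopsy-positive patients who were men (0.32 in Table 49-6). The arithmetic can be sketched as follows; the function name is hypothetical, and the sketch assumes that every patient in the pooled data was recorded as either male or female.

```python
def female_to_male_ratio(male_fraction: float) -> float:
    """Ratio of women to men given the fraction of patients who are men."""
    if not 0.0 < male_fraction < 1.0:
        raise ValueError("male_fraction must be strictly between 0 and 1")
    return (1.0 - male_fraction) / male_fraction


# Pooled proportion of biopsy-positive patients who were men (Table 49-6).
ratio = female_to_male_ratio(0.32)
print(f"women per man among biopsy-positive patients: {ratio:.1f}")
```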

Among patients referred for biopsy, the average age of
those with positive results was 73 years; this was only 3.8
years (95% CI, 2.1-5.4 years) older than the average age of
patients with negative results. Age was, however, a valuable
criterion for predicting the likelihood of TA. When data for
all eligible studies were reviewed, including those that
reported only patients with positive biopsy results, 26 studies
provided sufficient data to determine the age range of
patients with biopsy-proven TA. Only 2 patients among a
total of 1435 patients were younger than 50 years; this
resulted in a sensitivity of 99% for the criterion of age older
than 50 years. This outcome suggests that clinicians should
consider TA a diagnostic possibility in a person
younger than 50 years only if multiple characteristic or
high-probability features are present.

Table 49-5 Summary Likelihood Ratios for Signs and Demographics and Laboratory Data Among Patients With Suspected Temporal Arteritis

Variable/References | No. of Patients With Data on Variable (a) | LR+ (95% CI) | LR- (95% CI)
Signs and Demographics
Beaded temporal artery 42 ' 52 | 323 | 4.6 (1.1-18.4) | 0.93 (0.88-0.99)
Prominent or enlarged temporal artery 36 ’ 39 ' 42 ' 44 ' 52 | 508 | 4.3 (2.1-8.9) | 0.67 (0.5-0.89)
Absent temporal artery pulse 41 - 52 | 68 | 2.7 (0.55-13.4) | 0.71 (0.38-1.3)
Tender temporal artery 36 ’ 3942 ' 50 ' 52 | 755 | 2.6 (1.9-3.7) | 0.82 (0.74-0.92)
Any temporal artery abnormality 293133 ' 37 ' 43 ' 46 (b) | 1559 | 2.0 (1.4-3.0) | 0.53 (0.38-0.75)
Scalp tenderness 31 ' 33 35 ' 52 | 923 | 1.6 (1.2-2.1) | 0.93 (0.86-1.0)
Optic atrophy or ischemic optic neuropathy 40 50 | 142 | 1.6 (1.0-2.5) | 0.8 (0.58-1.1)
Any funduscopic abnormality 29 ’ 35 ' 50 ' 52 | 745 | 1.1 (0.8-1.4) | 1.0 (0.92-1.1)
White race 32 ’ 35 ' 37 ’ 40 ’ 50 | 565 | 1.1 (0.99-1.2) |
Male sex 29 ' 31 - 37 ' 40 - 43 ’ 45 ' 47 ' 49 ' 52 | 2565 | 0.83 (0.72-0.96) |
Synovitis 29 ' 37 ’ 46 ' 52 | 734 | 0.41 (0.23-0.72) | 1.1 (1.0-1.2)
Laboratory Data
ESR >100 mm/h 35 - 49 ' 50 | 220 | 1.9 (1.1-3.3) | 0.8 (0.68-0.95)
ESR >50 mm/h 3547 ' 49 ’ 50 | 259 | 1.2 (1.0-1.4) | 0.35 (0.18-0.67)
ESR abnormal 32 ' 37 ' 42 ’ 4649 ' 51 | 941 | 1.1 (1.0-1.2) | 0.2 (0.08-0.51)
Anemia 31 . 32 . 34 . 35 . 37 . 46 . 47 . 49 | 1057 | 1.5 (0.82-2.9) | 0.79 (0.6-1.0)

Abbreviations: CI, confidence interval; ESR, erythrocyte sedimentation rate; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

(a) Includes only studies that report results for patients with both positive and negative biopsy results.

(b) Includes only abnormalities that are not classified more specifically by the cited studies. The true incidence of any abnormality is presumably higher but cannot be calculated
from the primary data.

Accuracy of the Laboratory Evaluation 
for the Diagnosis of Temporal Arteritis 

Although the primary purpose of this analysis was to determine
the operating characteristics of the medical history
and physical examination in diagnosis, clinicians usually
obtain an ESR before determining which patients have
sufficient likelihood of TA to justify a referral for biopsy. We
therefore chose to evaluate the test characteristics of the
ESR. The mean value for patients with disease was 88 mm/h;
that for patients without disease was a mean of 10 mm/h
lower (95% CI, 4-25 mm/h). This difference was not
statistically significant.

Results of the ESR measurement were a valuable guide to
clinicians; a low or normal level was more likely to rule out
disease than a high value was likely to rule in disease.
Previously, Miller et al 72 had determined normal ESR values
among 27912 adults without apparent disease and suggested
defining the upper limit of normal ESR as either age/2 (for
men) or as (age + 10)/2 (for women). In our source studies,
authors most commonly did not define “normal” ESR; it was
not possible to determine whether these normal values were
adjusted for age. With this caveat, a normal ESR made TA
unlikely; the LR for a normal ESR was 0.2 (Table 49-5).
When we separately analyzed the pooled data from all studies,
only 4% of patients with positive temporal artery biopsy
results and data on ESR had a normal value. If one uses a less
strict cutoff point, even an ESR of less than 50 mm/h
substantially reduces the probability of disease (LR, 0.35). This
value is lower than the LR- of any symptom or sign.

Table 49-6 Summary Sensitivity of Signs and Demographics and
Laboratory Data Among All Patients With Positive Temporal Artery
Biopsy Results (a)

Variable | No. of Studies With Data on Variable | Sensitivity (95% CI)
Signs and Demographics
White race | 11 | 0.86 (0.62-0.97)
Any temporal artery abnormality | 16 | 0.65 (0.54-0.74)
Prominent or enlarged temporal artery | 6 | 0.47 (0.40-0.54)
Absent temporal artery pulse | 6 | 0.45 (0.26-0.66)
Tender temporal artery | 13 | 0.41 (0.30-0.52)
Male sex | 40 | 0.32 (0.29-0.35)
Any funduscopic abnormality | 6 | 0.31 (0.14-0.54)
Scalp tenderness | 13 | 0.31 (0.20-0.44)
Optic atrophy or ischemic optic neuropathy | 4 | 0.29 (0.10-0.57)
Beaded temporal artery | 3 | 0.16 (0.07-0.28)
Laboratory Data
ESR abnormal | 24 | 0.96 (0.93-0.97)
ESR >50 mm/h | 14 | 0.83 (0.75-0.90)
ESR >100 mm/h | 10 | 0.39 (0.29-0.50)
Anemia | 22 | 0.44 (0.34-0.54)

Abbreviations: CI, confidence interval; ESR, erythrocyte sedimentation rate.

(a) Includes results of all eligible studies, including those that reported clinical features
for patients with positive biopsy results only.
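The age-based upper limit of normal ESR attributed to Miller et al in this section can be written as a small helper. This is only a sketch of the rule as quoted (age/2 for men, (age + 10)/2 for women); the function name and the string values used for sex are hypothetical, and, as noted in the text, the source studies generally did not state whether their “normal” values were adjusted for age.

```python
def esr_upper_limit_mm_per_h(age_years: float, sex: str) -> float:
    """Upper limit of normal ESR (mm/h) by the age-based rule quoted in the text.

    sex must be "male" or "female" (labels chosen for this sketch).
    """
    if age_years < 0:
        raise ValueError("age_years must be non-negative")
    if sex == "male":
        return age_years / 2.0
    if sex == "female":
        return (age_years + 10.0) / 2.0
    raise ValueError('sex must be "male" or "female"')


# Illustrative ages only.
print(esr_upper_limit_mm_per_h(74, "female"))  # (74 + 10) / 2
print(esr_upper_limit_mm_per_h(74, "male"))    # 74 / 2
```

By this rule, the threshold for an “abnormal” ESR rises with age, which may matter when applying the LR of 0.2 for a normal ESR to an older patient.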

In contrast to clinical lore, a high ESR was less useful in
identifying those with TA among all patients referred for
biopsy, which likely relates to the verification bias inherent in
patient selection for the eligible studies because referring
physicians would have had knowledge of the ESR before
recommending a biopsy. Although an ESR of greater than 100
mm/h conferred an LR+ of 1.9, this value is less than the LR+
of the most useful symptoms and signs. In contrast, mean ESR
values were similar for patients with and without positive
temporal artery biopsy results.

Anemia was present in 44% of patients with biopsy-proven 
TA. This finding was present in a similar number of patients 
who had negative biopsy results. Mean hemoglobin levels 
were similar between patients with positive and negative 
biopsy results (11.6 g/dL vs 12.4 g/dL, respectively); the lack 
of anemia was not helpful in ruling out disease. 

ARE THESE CLINICAL FEATURES EVER NORMAL? 

The presence of particular symptoms or signs in patients
with negative temporal artery biopsy results does not imply
that these findings are “normal” or common in patients
without disease. Rather, it suggests that other conditions that
clinicians may initially confuse for TA have overlapping clinical
features. The frequency of such findings in randomly
selected individuals of the same age would likely be lower
than the frequency among patients in this review with negative
biopsy results.

Several studies have followed patients with negative biopsy 
results to determine their ultimate or correct diagnoses. 
Chmelewski et al 35 reported the outcomes of 98 patients 
undergoing temporal artery biopsies during a 5-year period at 
their institution. Among the 68 patients with negative biopsy 
results, 15 proved to have neurologic disorders (including 
migraine, stroke, and optic neuropathy), 14 had PMR, 10 had 
other rheumatologic disorders (including vasculitis other than 
TA, rheumatoid arthritis, and CREST [calcinosis, Raynaud 
disease, esophageal dysmotility, sclerodactyly, telangiectasia] 
syndrome), and 4 had fever of unknown origin. Miscellaneous 
diagnoses included sinusitis, endocarditis, amyloidosis, and 
malignancy. In another biopsy series, Roth et al 40 studied 33 
patients with a clinical suspicion of TA but negative biopsy 
results. The most common diagnoses, in descending order, 
were joint disease (degenerative or rheumatoid), malignant 
lymphoma, arteriosclerotic carotid artery disease, diabetes 
mellitus, and ischemic optic neuropathy. 

In our first clinical scenario, the history of bitemporal 
headache and a modestly increased ESR would be among 
those factors that may lead a clinician to suspect TA. In this 
setting, one would seek the potential additional history of 
jaw claudication or diplopia and determine the presence of a
prominent, tender, or beaded temporal artery. If present,
these factors would substantially increase the likelihood of 
positive temporal artery biopsy results. 

In the second scenario, TA is among the diagnostic considerations
for transient partial monocular vision loss in the
setting of a constitutional illness. The history in this case is 
sufficiently compelling to justify a temporal artery biopsy. 
Given the high prior probability and the poor performance 
of historical and examination features in excluding disease, 
an otherwise normal medical history and physical examination
result would not sufficiently reduce the likelihood of TA
to avoid the need for a temporal artery biopsy. A normal 
ESR would, however, reduce the likelihood of disease by a 
factor of 0.2 and should prompt consideration of alternative 
diagnoses. 

THE BOTTOM LINE 

Available data suggest that many of the clinical features commonly
found in patients with the disease are unhelpful in
predicting the likelihood of positive temporal artery biopsy 
results. Our study evaluates the predictive value of clinical 
features among patients who are already clinically suspected 
of having the disease, as determined by the clinicians who 
referred them for biopsy. Although we could not determine, 
from the primary studies, the factors that went into the decision
to refer for biopsy, certain clinical features modified the
likelihood of disease among these patients. It is likely that
these same clinical factors would be useful to consider at initial
evaluation, even before the decision to proceed to biopsy.
In addition, the verification bias inherent in this analysis 
makes the significance of our results greater because they 
help to predict biopsy results even among patients who have 
a higher prior probability of disease than do unselected 
patients with any particular clinical feature. 

When a medical history is taken in a patient with possible 
TA, jaw claudication and diplopia substantially increase the 
probability of positive biopsy results (LR+s, 4.2 and 3.4, 
respectively). No symptoms help rule out the diagnosis by 
their absence. Among physical examination findings, synovitis
makes the diagnosis of TA less likely, whereas beaded,
prominent, enlarged, and tender temporal arteries increase 
the likelihood of positive biopsy results. Beaded, prominent, 
or enlarged arteries confer the highest positive LRs of any 
clinical or laboratory feature and substantially increase the 
probability that a patient with suspected TA will have positive
biopsy results. Although these findings increase the
chance of having TA, they are variably sensitive, from 16% 
(beaded temporal artery) to 65% (any temporal artery 
abnormality). 

The results of tests of ESR alter the likelihood of positive 
biopsy results. A normal ESR (LR, 0.2) or ESR less than 50 
mm/h (LR, 0.35) makes positive biopsy results less likely, but 
setting the ESR threshold at 100 mm/h is less efficient 
because patients with an ESR less than 100 mm/h have an LR 
(0.8) that only slightly decreases the likelihood of disease. 
Among patients clinically suspected of having disease, those 
with an ESR greater than 100 mm/h have a modestly
increased likelihood of biopsy-proven TA (LR, 1.9). 
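
The way an LR shifts the probability of disease can be made concrete with a short calculation. The following is a minimal sketch, not part of the original review: it converts a pretest probability to odds, multiplies by the LR, and converts back to a probability. The function name `post_test_probability` is hypothetical, and the 39% pretest probability is an assumption taken from the prevalence among patients referred for biopsy in this review.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability via the odds form of Bayes' rule."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Assumed pretest probability: prevalence among patients referred for biopsy (39%).
PRETEST = 0.39

# ESR likelihood ratios quoted in the text above.
ESR_LIKELIHOOD_RATIOS = {
    "normal ESR": 0.2,
    "ESR < 50 mm/h": 0.35,
    "ESR < 100 mm/h": 0.8,
    "ESR > 100 mm/h": 1.9,
}

if __name__ == "__main__":
    for finding, lr in ESR_LIKELIHOOD_RATIOS.items():
        probability = post_test_probability(PRETEST, lr)
        print(f"{finding}: LR {lr} -> post-test probability {probability:.0%}")
```

Under these assumptions, a normal ESR lowers a 39% pretest probability to roughly 11%, whereas an ESR greater than 100 mm/h raises it to roughly 55%. A different pretest probability would give different results.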

The clinician faced with a patient who may have TA has a 
difficult diagnostic challenge. The goal is to rule out other 
morbid conditions that may mimic TA, to avoid unnecessary 
evaluation, and to quickly and correctly identify and treat 
patients who do in fact have the disorder. Given the extreme 
difference in prevalence of TA between the general population
(<1%) vs those referred for temporal artery biopsy
(39%), we infer that clinicians are adept at identifying
patients at high risk for disease. Many clinicians choose to
treat patients they have referred for biopsy with corticosteroids,
in the absence of contraindication, pending biopsy
results. Although this strategy would appear particularly wise
in the presence of a factor that we have shown predicts likelihood
of disease, this approach deserves further study.

Our review of clinical series of patients with suspected TA 
does not allow a determination of the predictive value of 
selected combinations of clinical and laboratory features. In 
addition, it is not possible to determine from our data 
whether certain combinations of features would sufficiently 
increase the likelihood of disease that a clinician should treat 
presumptively for TA and not perform a biopsy at all. The 
morbidity of a prolonged course of corticosteroids, however, 
is such that most clinicians would favor confirmation of disease
by biopsy even if the clinical probability is high.

Our analysis demonstrates that a limited number of clinical
features substantially modify the probability of the diagnosis
of TA among patients suspected of having the disease.
Ultimately, the clinician must integrate multiple clinical factors
to optimize diagnostic and therapeutic strategies for
patients with suspected TA. 
UPDATE:


CLINICAL SCENARIO 


A 72-year-old woman whom you have treated for the past 
decade comes to see you out of concern for her new-onset 
headaches. During the past month, her headaches have 
largely been stable, without progression. She has been 
tired but volunteers no other associated symptoms. On 
further questioning, she says that the pain occurs bilaterally
in the temporal and occipital areas. She denies jaw
claudication but does report scalp tenderness. She has had 
no vision loss. On physical examination, she has pulseless, 
nontender temporal arteries. 

How do her symptoms influence your decision to pursue 
a diagnosis of temporal arteritis? Do the physical findings 
help with your assessment? What is her probability of having
temporal arteritis and a positive temporal artery biopsy
result? How much will an erythrocyte sedimentation rate 
(ESR) change the likelihood of temporal arteritis? 

UPDATED SUMMARY ON TEMPORAL ARTERITIS 

Original Review 

Smetana GW, Shmerling RH. Does this patient have temporal
arteritis? JAMA. 2002;287(1):92-101.

UPDATED LITERATURE SEARCH 

We performed an updated MEDLINE literature search 
from January 2000 to August 2004, using the same search 
strategy as in our original publication. After reviewing the 
titles and abstracts, we identified 48 potential new relevant 
articles. We included studies with an emphasis on commonly
available laboratory tests because the interpretation
of the results is always tightly coupled to the clinical evaluation
for temporal arteritis. We then applied the inclusion
and exclusion criteria of our original review. After a 
detailed review of each retrieved article, 5 articles met the 
inclusion criteria. Most excluded studies contained no 
detailed clinical information about historical or physical 
examination features or failed to require that at least 90% 
of patients in the temporal arteritis group have a positive 
temporal artery biopsy result. An additional 10 articles did 
not meet our criteria but contained useful background
material for our discussion. 


NEW FINDINGS 

• Combinations of clinical findings are much more powerful 
at assessing the likelihood of temporal arteritis than individual
findings, especially jaw claudication with vision
change. 

• The diagnostic value of an increased ESR increases with 
increasing patient age. 

Details of the Update 

Since publication of our original review, 5 additional studies 
have provided evidence on the value of the clinical examination
in predicting temporal artery biopsy results among
patients suspected of having the disease. 

Mirroring decision-making in clinical practice, combinations
of clinical findings create more clinically important
changes in the likelihood of temporal arteritis than individual
findings. For example, in a study by Younge et al, 1 the
combination of jaw claudication and decreased vision was
associated with a positive likelihood ratio (LR+) of 44, whereas
either finding alone had much lower LR+s. A multivariate
model allows better assessment of the ESR in relation to the
patient’s age and symptoms. For example, a new headache
but normal ESR in a 72-year-old patient is associated with a
risk of disease of 12%, but the probability of temporal arteritis
increases to 78% when jaw claudication and scalp tenderness
occur together. Thus, despite a low likelihood ratio (LR)
for a normal ESR (LR, 0.20), a normal ESR can be outweighed
by the presence of other factors.

A platelet count greater than 400 × 10³/µL increases the
probability of a positive temporal artery biopsy result among
patients suspected of having the disease. In a study in which
two-thirds of 91 patients reported vision symptoms, the LR+
was 6.3 (confidence interval [CI], 2.4-17) for platelet count
greater than 400 × 10³/µL. 2 However, a second study, using a
lower threshold for the platelet count, found an LR+ of 1.6 (CI,
1.3-1.9). We did not report results of platelet counts in our data
abstraction in our original systematic review, because it was
not possible to construct LRs from the primary data. However,
a multivariate model 3 revealed that the platelet count did not
add information to the other variables. Future studies
should reassess the role of the platelet count as a screening
test for temporal arteritis among patients with compatible
symptoms, especially those with vision complaints.

Table 49-7 Summary Likelihood Ratios of Symptoms, Signs, and the
Erythrocyte Sedimentation Rate for Temporal Arteritis

Finding (No. of Studies)     LR+ (95% CI)        LR- (95% CI)
Jaw claudication (17)        4.3 (3.0-6.1)       0.72 (0.66-0.79)
Diplopia (5)                 3.5 (1.8-6.8)       0.96 (0.93-0.99)
Scalp tenderness (8)         1.7 (14-2.4)        0.73 (0.66-0.82)
Any headache (19)            1.7 (1.54.9)        0.67 (0.56-0.80)
Any vision symptoms (19)     1.1 (0.94-1.3)      0.97 (0.92-1.0)
“Abnormal” ESRᵃ (7)          1.1 (1.0-1.2)       0.2 (0.08-0.51)
ESR >100 (4)                 1.9 (14-3.3)
ESR 50-100 (5)               1.1 (0.87-1.5)
ESR <50 (5)                  0.55 (0.38-0.80)

Abbreviations: CI, confidence interval; ESR, erythrocyte sedimentation rate; LR+,
positive likelihood ratio; LR-, negative likelihood ratio.

ᵃAn abnormal ESR was defined by the laboratory analyses of the individual studies.
From these data, a normal ESR has a likelihood ratio of 0.2 for temporal arteritis.

One retrospective review 4 assessed the ethnic background
among patients with biopsy-proven temporal arteritis.
None of the 40 Hispanic patients in the United States referred
for temporal artery biopsy had positive results. A study
from a tertiary hospital in Spain 5 showed very few differences
between its patients and the summary data from the
original Rational Clinical Examination article. A smaller study of
patients in the United Kingdom 6 also showed values similar to
those in the original Rational Clinical Examination article. The
low prevalence among Hispanic patients in the United States
should be studied in future case series.

The strict inclusion criteria of our original review required
primary data from clinical series, excluding decision analyses
from consideration because decision analyses require assumptions
about the prevalence of disease and differing clinical features.
The published decision analyses preceding our review
did not have access to a systematic estimation of these values.
However, they provide an alternative strategy for the management
of patients suspected of having temporal arteritis. We
identified 3 such studies 7-9 through our literature search. Not
surprisingly, using different assumptions, these authors developed
differing predictive models. Each study modeled empiric
treatment strategies, treatment guided by biopsy results, and
treatment of all patients irrespective of biopsy results. The
model results changed with differing estimated prior probability
of disease. None of these studies, however, estimated the
influence of particular clinical or laboratory features on the
likelihood of positive biopsy results. Therefore, these provide a
complementary analysis but do not add to the information in
our review or update on The Rational Clinical Examination.

Box 49-1 Temporal Arteritis Score (for Patients > 50 y)

Score = -240 + 48 × (headache) + 108 × (jaw claudication)
        + 56 × (scalp tenderness) + 1.0 × (ESR)
        + 70 × (ischemic optic neuropathy) + 1.0 × (age)

(If symptom present, substitute 1.0; if negative, substitute 0)

Estimated probability = [exp(score/50)]/[1 + exp(score/50)]

If score less than -110, low risk (<10% chance of positive
biopsy result)

If score = -110 to 70, intermediate risk (10%-80% chance
of positive biopsy result)

If score > 70, high risk (>80% chance of positive biopsy
result)
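
The arithmetic of the temporal arteritis score in Box 49-1 can be sketched in a few lines. This is an illustrative sketch only, not a validated clinical tool: the coefficients, the divisor of 50, and the risk bands are taken from Box 49-1, whereas the names `Patient`, `temporal_arteritis_score`, `estimated_probability`, and `risk_category` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    """Inputs to the score in Box 49-1 (field names are illustrative)."""
    age_years: float
    esr_mm_per_h: float
    headache: bool
    jaw_claudication: bool
    scalp_tenderness: bool
    ischemic_optic_neuropathy: bool


def temporal_arteritis_score(p: Patient) -> float:
    """Linear score from Box 49-1; each symptom counts 1 if present, 0 if absent."""
    if p.age_years <= 50:
        raise ValueError("the score was derived for patients older than 50 years")
    return (
        -240
        + 48 * p.headache
        + 108 * p.jaw_claudication
        + 56 * p.scalp_tenderness
        + 1.0 * p.esr_mm_per_h
        + 70 * p.ischemic_optic_neuropathy
        + 1.0 * p.age_years
    )


def estimated_probability(score: float) -> float:
    """Logistic transform; algebraically equal to exp(score/50) / (1 + exp(score/50))."""
    return 1.0 / (1.0 + math.exp(-score / 50.0))


def risk_category(score: float) -> str:
    """Risk bands from Box 49-1: below -110 low, -110 to 70 intermediate, above 70 high."""
    if score < -110:
        return "low"
    if score <= 70:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    # The 72-year-old woman from the clinical scenario, with a hypothetical ESR of 50 mm/h.
    patient = Patient(
        age_years=72,
        esr_mm_per_h=50,
        headache=True,
        jaw_claudication=False,
        scalp_tenderness=True,
        ischemic_optic_neuropathy=False,
    )
    score = temporal_arteritis_score(patient)
    print(score, risk_category(score), f"{estimated_probability(score):.0%}")
```

For the scenario patient with a hypothetical ESR of 50 mm/h, the sketch gives a score of -14 and an estimated probability of about 43%, which places her in the intermediate-risk band.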

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

New data allowed us to refine our summary estimates for the 
LRs of clinical features for temporal arteritis. None of the 
estimates changed appreciably, although the new data generally
led to narrower CIs and, therefore, more confidence in
the role of each finding. 

CHANGES IN THE REFERENCE STANDARD 

The reference standard for the diagnosis of temporal arteritis 
remains a temporal artery biopsy. 

RESULTS OF LITERATURE REVIEW 

Table 49-7 details univariate analyses of clinical variables
associated with temporal arteritis. As in the original meta-analysis,
the presence of jaw claudication or diplopia was
associated with the highest LRs. For decreasing the likelihood 
of temporal arteritis, a normal ESR has the lowest LR. 

Multivariate Findings for Temporal Arteritis 

Younge et al 1 developed a temporal arteritis score, shown in
Box 49-1, that estimates the probability of temporal arteritis
according to the presence of 6 factors.

The authors derived this score from a large sample of 1113 
patients undergoing temporal artery biopsy, all of whom 
were older than 50 years. This is the largest series in the literature
that includes patients undergoing temporal artery
biopsy with both positive and negative biopsy results (the 
entire literature from 1966 to 2000 includes only 2680 
patients). We were unable to determine the value of combi¬ 
nations of clinical features in our original review because of 
the limitations of the meta-analytic design and the lack of 
individual patient-specific data. The temporal arteritis score
of Younge et al 1 is an important contribution that assists clinicians
in estimating the likelihood of temporal arteritis
among patients suspected of having the disease. However, it 
was derived from a group of patients who were older than 50 
years, and its use should be limited to people of similar age. 
Prospective validation studies are necessary, but the large
patient sample provides some reassurance to clinicians who
choose to apply the score to their patients.

Box 49-2 American College of Rheumatology Criteria 10 for
Temporal Arteritis

1. Age at disease onset at least 50 y

2. New headache

3. Temporal artery abnormality (tenderness, diminished
pulsation unrelated to atherosclerosis of cervical
arteries)

4. Increased erythrocyte sedimentation rate (at least
50 mm/h by Westergren method)

5. Abnormal artery biopsy result (vasculitis with mononuclear
cell predominance or granulomatous inflammation,
usually with multinucleated giant cells)

EVIDENCE FROM GUIDELINES 

There are no well-established consensus guidelines for the 
evaluation, diagnosis, or treatment of patients with suspected
or proven temporal arteritis. Clinicians and researchers
generally agree on the American College of Rheumatology
(ACR) criteria for the classification of giant-cell (temporal)
arteritis. 10 These criteria were described as “classification”
criteria (rather than “diagnostic”) to make their purpose clear:
they are best used among patients with vasculitis to improve
standardization and comparability of studies, not necessarily
as diagnostic criteria for clinical practice. They are reproduced
in Box 49-2.

The ACR criteria for all vasculitis syndromes, including 
temporal arteritis, have been criticized for poor predictive 
value when applied to individual patients in clinical practice. 11
However, other guidelines have not been widely
accepted. 12 All of these guidelines use clinical factors presented
in the original and updated literature reviews. Because
there is no clear consensus about the definition or gold standard
for the diagnosis of temporal arteritis beyond a positive
temporal artery biopsy result, in our meta-analysis we 
required at least 90% of individuals considered to have the 
disease to have histologic “proof.” 


CLINICAL SCENARIO—RESOLUTION 


Our 72-year-old woman has a new onset of temporal and 
occipital headache that raises the possibility of temporal 
arteritis. One should seek the presence of those features 
that confer a high LR+, including diplopia and jaw claudication.
In her case, scalp tenderness is present (LR+, 1.7),
but she does not have other historical features that confer 
a high LR+. On examination, one looks for the presence 
of beaded, tender, or pulseless temporal arteries. Her 
pulseless temporal arteries confer an LR+ of 2.7, but the
CI around this result is broad (95% CI, 0.55-13).

An ESR measurement would be helpful: a normal ESR
confers an LR of 0.2, whereas an elevated ESR greater than
100 mm/h increases the likelihood of disease (LR, 1.9).
Intermediate ESR values, that is, values that are elevated but less
than 100 mm/h, occur commonly in patients with temporal
arteritis and would increase the likelihood to a lesser degree.

The temporal arteritis score of Younge et al 1 provides an
alternate strategy for estimating disease risk by combining
the most important clinical features. If we enter the data
for our patient into this prediction rule, using hypothetical
ESR values of 50 and 100, we obtain the following
results.

For ESR = 50: Score = -240 + 48 × (headache = 1)
+ 108 × (jaw claudication = 0) + 56 × (scalp tenderness = 1)
+ 1.0 × (ESR = 50) + 70 × (ischemic optic neuropathy = 0)
+ 1.0 × (age = 72)

Score = -14. Intermediate risk (probability, 43%)

For ESR = 100: Score = -240 + 48 × (headache = 1)
+ 108 × (jaw claudication = 0) + 56 × (scalp tenderness = 1)
+ 1.0 × (ESR = 100) + 70 × (ischemic optic neuropathy = 0)
+ 1.0 × (age = 72)

Score = 36. Intermediate risk (probability, 67%)

In this case, using the prediction rule of Younge et al, 1 the
risk is intermediate according to clinical evaluation. At either
hypothetical value, the ESR result does not move the patient out
of the intermediate-risk category determined by clinical
evaluation alone. After this evaluation,
temporal arteritis is still a consideration. Previous studies
and clinical experience suggest that biopsy should be performed
within 7 to 10 days, although the yield of biopsy decreases
over time after the initiation of corticosteroid treatment.
TEMPORAL ARTERITIS—MAKE THE DIAGNOSIS

Temporal arteritis is relatively rare, though the disease may
be underdiagnosed. 13 The prevalence increases with age, and
it occurs more commonly among women and whites. One
study found that, among white persons 50 years and older,
the prevalence of temporal arteritis was 200 cases per
100000; among persons older than 85 years, the prevalence
was 1100 per 100000. 14 Most published series have been from
northern Europe and the northern United States, but the
disease [...]

POPULATION FOR WHOM TEMPORAL ARTERITIS
DISEASE SHOULD BE CONSIDERED

Temporal arteritis should be considered in all adults aged 50
years and older with appropriate symptoms. Although prevalence
varies by sex, race, and geographic locale, no single
demographic factor among persons older than 50 years
decreases the likelihood enough to exclude the diagnosis.

DETECTING THE LIKELIHOOD OF
TEMPORAL ARTERITIS

[...] using either single features (and applying the summary LRs
from our meta-analysis) or by using combinations of features,
as established by the prediction rule of Younge et al 1
(see Table 49-8).

Table 49-8 The Single Best Findings or Combinations of Findings
Can Be Used to Estimate the Probability of Temporal Arteritis

                                                      LR (95% CI)
Single Best Findings Suggesting the Presence of Temporal Arteritis
  Jaw claudication                                    4.3 (3.0-6.1)
  Diplopia                                            3.5 (1.8-6.8)
Single Best Finding Suggesting the Absence of Temporal Arteritis
  ESR < 50 mm/h (n = 5)                               0.55 (0.38-0.80)

Combinations of Findings 1,ᵃ                          Posterior Probability, %
Headache + jaw claudication + scalp tenderness
  at age 60 y                                         65
Headache + jaw claudication + scalp tenderness
  at age 80 y                                         74
Headache + jaw claudication + scalp tenderness
  at age 60 y, ESR = 50 mm/h                          84
Headache + jaw claudication + scalp tenderness
  at age 80 y, ESR = 50 mm/h                          88
No headache + no jaw claudication + no scalp
  tenderness at age 60 y, ESR = 50 mm/h               7
No headache + no jaw claudication + no scalp
  tenderness at age 80 y, ESR = 50 mm/h               10

Abbreviations: CI, confidence interval; ESR, erythrocyte sedimentation rate; LR+,
positive likelihood ratio; LR-, negative likelihood ratio; OR, odds ratio.

ᵃThese are examples of various combinations of findings for patients with 3 of 3
symptoms vs 0 of 3 symptoms present at various ages. The addition of age and ESR
provides important information when combined with the symptoms.

REFERENCE STANDARD TESTS

Temporal artery biopsy and histologic evaluation is the reference
standard for the diagnosis of temporal arteritis.
Other means of diagnosis have been suggested, including
positron emission tomography scanning 15,16 and ultrasonography 17-22
for imaging of the temporal artery. Although
results of small studies have been promising, studies of these
tests have been flawed (primarily by incomplete evaluation
against the gold standard, temporal artery biopsy) and are
not widely accepted. Although they could at some point prove
diagnostically useful in the diagnosis of temporal arteritis,
studies to date have not provided sufficient, conclusive evidence
confirming the diagnostic value of these tests beyond
standard clinical information (including medical history, physical
examination, and routine measures of inflammation) and
biopsy as alternative reference standards. Magnetic resonance
angiography, computed tomography, or standard angiography
can be helpful for extracranial disease, including inflammatory
involvement of the aorta or its proximal branches. 22
TITLE Thrombocytosis in Patients With Biopsy-Proven
Giant Cell Arteritis. 

AUTHORS Foroozan R, Danesh-Meyer H, Savino PJ, 
Gamble G, Mekari-Sabbagh ON, Sergott RC. 

CITATION Ophthalmology. 2002;109(7):1267-1271. 

QUESTION Are the complete blood cell (CBC) count 
and erythrocyte sedimentation rate (ESR) useful in predicting
positive temporal artery biopsy results among
patients suspected of having giant-cell arteritis (GCA)? 

DESIGN Retrospective, case-control series. 

SETTING Specialty eye hospital in Philadelphia, Pennsylvania.

PATIENTS Ninety-one consecutive patients undergoing
temporal artery biopsy for suspicion of GCA; biopsy
performed within 1 week of presentation. Corticosteroid 
therapy before biopsy was not allowed; blood tests were 
conducted within 24 hours of biopsy. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Diagnostic (gold) standard was temporal artery biopsy; tests
included CBC count and Westergren ESR. Definition of elevated
platelet levels (>400 × 10³/µL) was based on reference
range greater than 2 SD above the mean; elevated ESR was
above age/2 for men and (age + 10)/2 for women. No patients
had a clinical course to suggest biopsy-negative GCA.
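
The age- and sex-adjusted ESR definition above amounts to a simple rule. The sketch below expresses it in code, assuming ESR in mm/h and age in years; the function names `esr_upper_limit` and `esr_is_elevated` and the `"male"`/`"female"` labels are hypothetical and do not come from the study.

```python
def esr_upper_limit(age_years: float, sex: str) -> float:
    """Upper limit of normal ESR in mm/h: age/2 for men, (age + 10)/2 for women."""
    if sex == "male":
        return age_years / 2.0
    if sex == "female":
        return (age_years + 10.0) / 2.0
    raise ValueError("sex must be 'male' or 'female'")


def esr_is_elevated(esr_mm_per_h: float, age_years: float, sex: str) -> bool:
    """True when the measured ESR exceeds the age- and sex-adjusted limit."""
    return esr_mm_per_h > esr_upper_limit(age_years, sex)


if __name__ == "__main__":
    # A 72-year-old woman: the adjusted limit is (72 + 10) / 2 = 41 mm/h.
    print(esr_upper_limit(72, "female"), esr_is_elevated(50, 72, "female"))
```

Because the limit rises with age, the same ESR value can count as elevated in a younger patient and as normal in an older one.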

MAIN OUTCOME MEASURES 

Means, sensitivity, specificity, and likelihood ratios (LRs). 

MAIN RESULTS 

Forty-seven patients had a positive biopsy result; 44 had neg¬ 
ative biopsy result. 

White blood cell counts were no different between patients
with positive and negative biopsy results, although patients
with positive biopsy results were significantly more anemic.
Among patients suspected of having temporal arteritis,
thrombocytosis significantly predicts the likelihood of a positive
temporal artery biopsy result (see Tables 49-9 and 49-10).

CONCLUSION 

LEVEL OF EVIDENCE Level 3 (using criteria from original 
review). 

STRENGTHS The investigators asked a unique question 
regarding the value of laboratory testing to stratify probability
of disease.

LIMITATIONS All patients were evaluated at a subspecialty 
ophthalmology clinic. The sample size was small. 


Table 49-9 Comparison of Laboratory Values Between Those With
Positive vs Negative Biopsy Results for Giant-Cell Arteritis

                                 Positive        Negative
Test                             Biopsy Result   Biopsy Result   P Value
Mean ESR level, mm/h             82              70              .12
Mean hematocrit level, %         34.8            37              .03
Mean hemoglobin level, g/dL      11.7            12.5            .01
Mean platelet count, ×10³/µL     433             277             <.001

Abbreviation: ESR, erythrocyte sedimentation rate.


Table 49-10 Likelihood Ratios of Laboratory Values for Giant-Cell
Arteritis

Test                            Sensitivity  Specificity  LR+ (95% CI)    LR- (95% CI)
ESR                             0.79         0.27         1.1 (0.86-1.4)  0.78 (0.37-1.6)
Platelet count > 400 × 10³/µL   0.57         0.91         6.3 (2.4-17)    0.47 (0.33-0.66)
Combination of ESR and
  platelet count > 400 × 10³/µL 0.51         0.91         5.6 (2.1-15)    0.54 (0.40-0.73)

Abbreviations: CI, confidence interval; ESR, erythrocyte sedimentation rate; LR+,
positive likelihood ratio; LR-, negative likelihood ratio.
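
The LRs in Table 49-10 follow from the sensitivity and specificity columns through the standard definitions LR+ = sensitivity/(1 - specificity) and LR- = (1 - sensitivity)/specificity. The sketch below recomputes them as a check. The function name `likelihood_ratios` is hypothetical, and because the inputs are the rounded values printed in the table, the recomputed figures can differ slightly from the published ones.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity expressed as fractions."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded sensitivity and specificity values as printed in Table 49-10.
TABLE_49_10 = {
    "ESR": (0.79, 0.27),
    "Platelet count > 400 x 10^3/uL": (0.57, 0.91),
    "ESR and platelet count combined": (0.51, 0.91),
}

if __name__ == "__main__":
    for test_name, (sens, spec) in TABLE_49_10.items():
        lr_pos, lr_neg = likelihood_ratios(sens, spec)
        print(f"{test_name}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")
```

The high specificity of thrombocytosis (0.91) is what drives its LR+ above 6, whereas the low specificity of an elevated ESR (0.27) leaves its LR+ close to 1.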
Commentary

This study was performed with high quality, although it was
retrospective and selected patients were treated at a specialty eye
hospital. Two-thirds of the patients had primarily visual complaints.
The results suggest that elevated platelet count may be
useful in suggesting the diagnosis of GCA, but LRs may not be
helpful enough to preclude biopsy or rule out the need for one.
Also, the marginal value of elevated platelet count beyond elements
of the medical history, physical examination, and other
routine laboratory tests (especially lack of normal ESR) may be
small. The authors suggest that platelet count may be better than
ESR in predicting results of biopsy, in part because an elevation
in ESR is part of what goes into the decision to get a biopsy.
However, the definition of elevated ESR (age and sex adjusted)
was more restrictive in this study than in many others and may
have lessened its predictive power. This study does not examine
the value of history-taking or physical examination findings.

Reviewed by Robert H. Shmerling, MD 


TITLE Influence of Age, Sex, and Place of Residence on 
Clinical Expression of Giant-Cell Arteritis in Northwest 
Spain. 

AUTHORS Gonzalez-Gay MA, Garcia-Porrua C, Amor- 
Dorado JC, Llorca J. 

CITATION J Rheumatol. 2003;30(7):1548-1551.

QUESTION Do age, sex, and urban residence influence 
the clinical expression of giant-cell arteritis? 

DESIGN Retrospective chart review. 

SETTING Tertiary referral hospital in northwestern 
Spain that is the only referral center for a mixed urban and 
rural area encompassing approximately 250000 people. 

PATIENTS All patients with biopsy-proven giant-cell 
arteritis between 1981 and 2001. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Clinical and laboratory features of patients with biopsy-proven 
giant-cell arteritis represented the diagnostic tests. The diagnostic
standard was a positive temporal artery biopsy.

MAIN OUTCOME MEASURES 

The main outcome measure was sensitivity. 

MAIN RESULTS 

Few differences exist in the clinical presentation of
biopsy-proven giant-cell arteritis according to age, sex, and place of
residence. The only clinical sex-based difference is a higher
prevalence of polymyalgia rheumatica in women (see
Table 49-11). Women had a statistically significantly lower hemoglobin
level than men. No clinical features differed for urban- or
rural-dwelling patients. Age of onset at presentation did not
significantly influence the clinical presentation. A trend existed
toward more polymyalgia rheumatica in younger patients, but
this difference was not significant. Hemoglobin levels were
minimally lower in younger patients, and more older patients
had an increased alkaline phosphatase level.

CONCLUSIONS 

LEVEL OF EVIDENCE Level 4 (using criteria from the original
review).

STRENGTHS Consistent data set across all patients. 

LIMITATIONS The study population consisted only of
patients with giant-cell arteritis. The relatively small
number of study subjects limited the power to detect significant
differences.

Table 49-11 Most Presenting Features of Giant-Cell Arteritis Are
Similar Between Men vs Women and Patients Younger Than 70 Years
vs Older Than 70 Years

                                                     Onset <70 y  Onset >70 y
                                Men       Women      of Age       of Age
Variable                        (n = 97)  (n = 113)  (n = 42)     (n = 168)
Men, %                                               48           46
Age at diagnosis, y             75        75
Living in urban area, %         27ᵃ       46ᵃ        31           39
Delay to diagnosis, wk          9.7       11         12           9.9
Headache, %                     90        85         88           87
Scalp tenderness, %             34        34         26           36
Constitutional syndrome, %      67        62         76           61
Abnormal temporal artery
  examination, %                73        78         67           77
Jaw claudication, %             36        45         29           44
Dysphagia, %                    3         7          0            7
Polymyalgia rheumatica, %       33ᵃ       49ᵃ        52           39
Fever, %                        8         11         12           9
Visual manifestations, %        26        21         21           24
Permanent visual loss, %        13        12         12           13
Cerebrovascular accident, %     3         1          5            1
Limb claudication of recent
  onset, %                      4         2          7            2
ESR, mean, mm/h                 91        95         100          92
Hemoglobin, mean, g/dL          12.2ᵃ     11.4ᵃ      11.3ᵃ        11.9ᵃ
Platelet count, mean, ×10³/µL   407       412        437          402
Increased alkaline
  phosphatase, %                26        28         48ᵃ          22ᵃ

Abbreviation: ESR, erythrocyte sedimentation rate.

ᵃP < .05 for comparison between men and women or between younger and older patients.

Commentary 

This case series provides a detailed summary of clinical and laboratory
features among a cohort of patients with biopsy-proven
giant-cell arteritis in Spain. The overall prevalence of specific
features is similar to that reported in our original review and
meta-analysis. Differences include higher incidences of headache
and polymyalgia rheumatica and lower incidences of fever
and visual manifestations than in our original review. In this
study, the authors aimed to identify differences in clinical presentations
according to age, sex, and urban location. Remarkably,
nearly all features were similar across these patient subsets.
The only clinical feature that was statistically significantly different
across nearly 60 comparisons was the greater incidence of
polymyalgia rheumatica among women compared with men.
However, this series may have lacked sufficient statistical power
to detect significant differences.

Small differences in hemoglobin and the incidence of elevated
alkaline phosphatase level existed in these comparisons,
but these are not clinically significant. We have previously
shown that anemia does not predict positive biopsy results
among patients suspected of having the disease (positive likelihood
ratio, 1.5 [95% confidence interval, 0.82-2.9]; negative
likelihood ratio, 0.79 [95% confidence interval, 0.6-1.0]). This
study suggests that clinical suspicion and the value of particular
clinical features of giant-cell arteritis do not differ among
these selected patient subsets.

Reviewed by Gerald W. Smetana, MD 


TITLE The Epidemiology of Giant Cell Arteritis: A 12-Year
Retrospective Review.

AUTHORS Liu NH, LaBree LD, Feldon SE, Rao NA. 

CITATION Ophthalmology. 2001;108(6):1145-1149. 

QUESTION What is the incidence of biopsy-proven 
giant-cell arteritis among individuals of Hispanic descent? 

DESIGN Retrospective chart review. 

SETTING Subspecialty academic ophthalmology institute 
in the United States. 

PATIENTS Sequential patients (n = 121) undergoing 
temporal artery biopsy. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The diagnostic tests were demographic factors including age, 
sex, and ethnicity. The diagnostic standard was a temporal 
artery biopsy. The authors explicitly stated the pathologic 
criteria used to classify a temporal artery biopsy result as positive.


Table 49-12 The Incidence of Giant-Cell Arteritis Differs by Race

Race (No.) | Positive Biopsy Result, % | OR (95% CI)
White (66) | 40 | 22 (3.6-133)
Black (6) | 0 | 0 (0-4.2)
Hispanic (40) | 0 | 0 (0-0.38)
Asian (9) | 12 | 0.61 (0.09-4.0)

Abbreviations: CI, confidence interval; OR, odds ratio.


MAIN OUTCOME MEASURES 

Incidence of temporal arteritis among white, Asian, black, 
and Hispanic patients undergoing temporal artery biopsy. 
Hispanic patients self-reported whether they considered 
themselves to be of white or Latino descent. 


MAIN RESULTS 

Twenty patients (16.5%) had positive temporal artery biopsy 
results. The mean age of the study population was 70 ± 8.8 
years. White patients were older than Asian, black, and Hispanic
patients. The mean age for patients with a positive
biopsy result was 75 years, whereas that for patients with a 
negative biopsy result was 69 years. Giant-cell arteritis is rare 
among a population of Americans of Hispanic ethnicity 
(Table 49-12). 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1 (using criteria from the original
review).

STRENGTHS Asked a unique question not previously 
addressed in the literature. 

LIMITATIONS No clinical information was recorded and 
only demographic and laboratory variables were studied. 

Commentary 

The original review reconfirmed the observation that temporal
arteritis is predominantly a disease of whites. Among
all eligible studies in that review, 86% of all patients with 
positive biopsy results were white. Descriptions of blacks 
with temporal arteritis have been largely restricted to case 
reports and small series. The incidence among US Hispanics
has not been well studied. In this report, the authors
determined the race of all patients undergoing temporal 
artery biopsy at a referral ophthalmology center in Los 
Angeles, California. Although Hispanics constituted 33% of 
all patients referred for biopsy, not a single biopsy result 
was positive in this group of patients (95% confidence 
interval, 0%-7.2%). 
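
The upper limit quoted above for zero positive results among 40
biopsies can be approximated with an exact binomial bound. The sketch
below assumes a one-sided 95% bound; the method the reviewer actually
used is not stated here, and the function name is illustrative only.

```python
def upper_bound_zero_events(n: int, one_sided_alpha: float = 0.05) -> float:
    """Exact upper bound on a proportion when 0 of n events are observed.

    With zero events the bound p solves (1 - p) ** n = alpha.
    Assumption: a one-sided bound; the source does not state its method.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - one_sided_alpha ** (1.0 / n)


n_hispanic = 40  # Hispanic patients undergoing biopsy (Table 49-12)
bound = upper_bound_zero_events(n_hispanic)
print(f"0/{n_hispanic} positive: upper bound = {bound:.1%}")
```

Under this assumption the bound is close to the reported 7.2%; a
two-sided exact interval would give a somewhat higher upper limit.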

Reviewed by Gerald W. Smetana, MD

TITLE Predictive Clinical and Laboratory Factors in the
Diagnosis of Temporal Arteritis.

AUTHORS Mohamed MS, Bates T. 

CITATION Ann R Coll Surg. 2002;84(1):7-9.

QUESTION Among patients undergoing temporal 
artery biopsy, which clinical and laboratory factors predict
positive biopsy results?

DESIGN Retrospective chart review. 

SETTING Single hospital in the United Kingdom. 

PATIENTS All patients (n = 50) who underwent temporal
artery biopsy between January 1988 and December 1997.

DESCRIPTION OF THE TEST AND 
DIAGNOSTIC STANDARD 

The diagnostic tests were demographic features, presenting 
clinical features, laboratory investigation, and the duration of 
corticosteroid therapy before biopsy. The diagnostic standard 
was a temporal artery biopsy. The authors did not state the 
criteria used to determine whether a temporal artery biopsy 
result was positive. 

MAIN OUTCOME MEASURES 

The main outcome measures were sensitivity and specificity. 

MAIN RESULTS 

Seventeen patients had temporal arteritis and 33 patients had 
a normal biopsy result. The mean age was 73 years (range, 
60-82 years) for patients with a positive biopsy result and 67 
years (range, 49-85 years) for those with a negative biopsy 
result. The mean durations of steroid therapy for patients 
with positive and negative biopsy results were 7 and 10 days, 
respectively. The mean erythrocyte sedimentation rate (ESR) 
was 56 mm/h for patients with a positive biopsy result and 38 
mm/h for those with a negative biopsy result. Seventeen 
patients (34%) had a positive temporal artery biopsy result 
(Table 49-13). 

Among clinical and laboratory features in a population of 
50 patients suspected of having temporal arteritis, an ESR 
less than 50 mm/h decreased the likelihood of temporal 
arteritis, whereas an ESR of 50 to 100 mm/h increased the 
likelihood of temporal arteritis. All other results had a 95% 
confidence interval that included 1. 


Table 49-13 Likelihood Ratios of Demographic Variables, Symptoms,
Signs, and Laboratory Values for Temporal Arteritis (Disease Frequency
17/50)

Feature (No. With Feature) | Sensitivity | Specificity | LR+ (95% CI) | LR- (95% CI)
Jaw pain (6) | 0.24 | 0.94 | 3.9 (0.79-19) | 0.81 (0.61-1.1)
History of fever (4) | 0.12 | 0.94 | 1.9 (0.29-12) | 0.94 (0.77-1.1)
Polymyalgia rheumatica (4) | 0.12 | 0.94 | 1.9 (0.29-12) | 0.94 (0.77-1.1)
Male sex (15) | 0.35 | 0.73 | 1.5 (0.65-3.4) | 0.83 (0.52-1.3)
Neurologic symptoms (21) | 0.71 | 0.42 | 1.2 (0.79-1.8) | 0.69 (0.30-1.6)
Steroid use before biopsy (31) | 0.71 | 0.42 | 1.2 (0.79-1.8) | 0.69 (0.30-1.6)
Headache (44) | 0.88 | 0.12 | 1.0 (0.81-1.2) | 0.97 (0.2-4.8)
Temporal tenderness (36) | 0.65 | 0.24 | 0.9 (0.60-1.3) | 1.5 (0.6-3.5)
Visual symptoms (21) | 0.24 | 0.48 | 0.5 (0.2-1.2) | 1.6 (1.0-2.4)
Ocular signs (8) | 0.06 | 0.79 | 0.3 (0.04-2.2) | 1.2 (0.96-1.5)
ESR | | | |
>100 mm/h (2) | | | 1.9 (0.21-18) |
50-100 mm/h (21) | | | 2.1 (1.1-4.0) |
<50 mm/h (27) | | | 0.44 (0.19-0.86) |

Abbreviations: CI, confidence interval; ESR, erythrocyte sedimentation rate; LR+,
positive likelihood ratio; LR-, negative likelihood ratio.


STRENGTHS Standardized data set for all patients. 

LIMITATIONS The small sample size resulted in broad confidence
intervals for the likelihood ratios.
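
To illustrate why a sample of 50 yields such wide intervals, the
following sketch computes likelihood ratios and approximate 95%
confidence intervals from a 2 x 2 table using the common log-scale
method. The jaw pain cell counts (4 of 17 biopsy-positive and 2 of 33
biopsy-negative patients) are back-calculated from the reported
sensitivity, specificity, and totals, so they are an assumption rather
than data quoted from the article, and the function name is
illustrative.

```python
import math


def likelihood_ratios(tp, fn, fp, tn, z=1.96):
    """Return LR+ and LR- with log-method confidence intervals.

    tp/fn: finding present/absent among biopsy-positive patients.
    fp/tn: finding present/absent among biopsy-negative patients.
    Cells of zero are not handled in this sketch.
    """
    diseased, healthy = tp + fn, fp + tn
    sens, spec = tp / diseased, tn / healthy
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    # Standard errors of ln(LR)
    se_pos = math.sqrt(1 / tp - 1 / diseased + 1 / fp - 1 / healthy)
    se_neg = math.sqrt(1 / fn - 1 / diseased + 1 / tn - 1 / healthy)

    def interval(lr, se):
        return (lr * math.exp(-z * se), lr * math.exp(z * se))

    return {
        "lr_pos": lr_pos,
        "lr_pos_ci": interval(lr_pos, se_pos),
        "lr_neg": lr_neg,
        "lr_neg_ci": interval(lr_neg, se_neg),
    }


# Assumed counts for jaw pain, reconstructed from Table 49-13
jaw_pain = likelihood_ratios(tp=4, fn=13, fp=2, tn=31)
print(jaw_pain)
```

With only 6 patients reporting jaw pain, the interval around an LR+
near 4 spans values below 1 and above 10, which is why the finding
cannot be called a significant predictor in this series.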

Commentary 

Only the ESR was a significant predictor of disease, but low 
statistical power limits the conclusions for other findings. 
The authors studied several factors that proved significant in 
our original review and meta-analysis but which failed to 
predict biopsy results. This study illustrates the value of 
meta-analytic techniques that allow estimates of the operating
characteristics of diagnostic tests based on larger samples
than available in any individual study. 

Reviewed by Gerald W. Smetana, MD 


CONCLUSIONS 

LEVEL OF EVIDENCE Level 1 (using criteria from the original
review).

TITLE Initiation of Glucocorticoid Therapy: Before or
After Temporal Artery Biopsy.

AUTHORS Younge BR, Cook BE, Bartley GB, Hodge 
DO, Hunder GG. 

CITATION Mayo Clin Proc. 2004;79(4):483-491. 

QUESTIONS Do clinical features exist among patients 
with suspected giant-cell arteritis (GCA) that may help 
clinicians decide when to initiate glucocorticoid therapy? 
When is the likelihood of a positive biopsy result so high
that therapy should begin right away? When is the likelihood
low enough to defer treatment until after biopsy
results are available?

DESIGN Retrospective, case-control series (cases had 
positive biopsy results; controls had negative biopsy 
results). 

SETTING Mayo Clinic, Rochester, Minnesota. 

PATIENTS One thousand one hundred thirteen 
sequential patients, identified through the Mayo Surgical 
Index, undergoing temporal artery biopsy between January
1988 and December 1997. Twenty percent of the
patients were receiving oral corticosteroids at the time of biopsy.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Diagnostic (gold) standard was temporal artery biopsy; the 
authors collected multiple clinical features (by medical history,
physical examination, and laboratory studies). Standard
Mayo Clinic reference ranges for laboratory values were used, 
including erythrocyte sedimentation rate (ESR) of 0 to 22 
mm/h for men and 0 to 29 mm/h for women. 


MAIN OUTCOME MEASURES 

Sensitivity, specificity, and predictive values of various clinical
and laboratory findings with respect to biopsy results
were calculated. 


MAIN RESULTS 

• Three hundred seventy-three patients had positive biopsy 
results (33.5%); 740 (66.5%) had negative biopsy results. 

• The commonly taught combination of headache with an elevated
ESR had a likelihood ratio (LR) of 2.4 (95% confidence interval
[CI], 2.1-2.7). When neither a headache nor an ESR abnormality
was present, the LR for temporal arteritis was 0.42 (95% CI,
0.36-0.49).

Clinical findings (LRs and CIs are calculated from data
provided in the article) are shown in Table 49-14.

Laboratory findings in patients not receiving oral corticosteroid
treatment (LRs and CIs are calculated from data provided
in the article) are shown in Table 49-15.

A decision rule was developed from a multivariate model: 

Temporal arteritis score = -240 + 48 × (headache) + 108 ×
(jaw claudication) + 56 × (scalp tenderness) + 1.0 × (ESR) +
70 × (ischemic optic neuropathy) + 1.0 × (age)

(If symptom present, substitute 1; if absent, 0)

Estimated probability = exp(score/50)/[1 + exp(score/50)]

If score < -110, low risk (<10% chance of positive biopsy result).

If score = -110 to 70, intermediate risk (10%-80% chance of
positive biopsy result).

If score > 70, high risk (>80% chance of positive biopsy result).

The model was validated with prospective data on 289 
patients; 86% of the high-risk patients had a positive biopsy 
result, whereas 9% of the low-risk patients had a positive 
biopsy result. 
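
The decision rule above can be sketched in a few lines of code. The
coefficients and cut points are those quoted in the text; the function
and parameter names are illustrative assumptions, ESR is taken in mm/h
and age in years, and the sketch is meant to show the arithmetic rather
than to serve as a clinical tool.

```python
import math


def temporal_arteritis_score(age, esr, headache=False, jaw_claudication=False,
                             scalp_tenderness=False,
                             ischemic_optic_neuropathy=False):
    """Score from the multivariate model; findings count 1 if present, else 0."""
    return (-240
            + 48 * headache
            + 108 * jaw_claudication
            + 56 * scalp_tenderness
            + 1.0 * esr
            + 70 * ischemic_optic_neuropathy
            + 1.0 * age)


def estimated_probability(score):
    """Logistic transform of the score, as given in the text."""
    return math.exp(score / 50) / (1 + math.exp(score / 50))


def risk_category(score):
    """Low (< -110), intermediate (-110 to 70), or high (> 70)."""
    if score < -110:
        return "low"
    if score > 70:
        return "high"
    return "intermediate"


# A 72-year-old with new headache only and an ESR of 20 mm/h
score = temporal_arteritis_score(age=72, esr=20, headache=True)
print(score, round(estimated_probability(score), 2), risk_category(score))
```

For the inputs shown, the score is -100 and the estimated probability
is about 12%; adding jaw claudication and scalp tenderness raises the
score to 64 and the probability to about 78%.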


Table 49-14 Likelihood Ratios for Single Symptoms and in
Combination for Temporal Arteritis

Test/Feature | Sensitivity | Specificity | LR+ (95% CI) | LR- (95% CI)
Single Features
Jaw claudication | 0.40 | 0.94 | 6.9 (5.0-9.5) | 0.64 (0.59-0.7)
Diplopia | 0.04 | 0.99 | 3.7 (1.5-9.2) | 0.97 (0.95-0.99)
Scalp tenderness | 0.33 | 0.89 | 3.1 (2.4-4.0) | 0.75 (0.70-0.81)
Myalgia/arthralgia | 0.46 | 0.50 | 2.2 (1.6-3.1) | 0.90 (0.86-0.95)
New headache | 0.67 | 0.60 | 1.7 (1.5-1.9) | 0.54 (0.46-0.63)
Decreased vision | 0.13 | 0.92 | 1.5 (1.0-2.1) | 0.95 (0.91-0.99)
Weight loss | 0.24 | 0.81 | 1.3 (1.0-1.6) | 0.93 (0.87-0.99)
Combination of Findings
Jaw claudication and decreased vision | 0.06 | 1.0 | 44 (5.9-322) | 0.98 (0.97-0.99)
Jaw claudication and diplopia | 0.02 | 1.0 | 30 (1.7-519) | 0.98 (0.97-0.99)
New headache, jaw claudication, and scalp tenderness | 0.15 | 0.99 | 19 (8.1-42) | 0.86 (0.82-0.90)
Jaw claudication and scalp tenderness | 0.17 | 0.99 | 18 (8.3-39) | 0.84 (0.80-0.88)
New headache and jaw claudication | 0.32 | 0.96 | 8.7 (5.8-13) | 0.71 (0.66-0.76)
New headache and decreased vision | 0.06 | 0.99 | 6.2 (2.7-14) | 0.95 (0.93-0.98)
New headache and scalp tenderness | 0.29 | 0.93 | 3.9 (2.9-5.3) | 0.77 (0.72-0.82)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

Table 49-15 Likelihood Ratios of Laboratory Findings for Temporal
Arteritis

Finding | Sensitivity | Specificity | LR+ (95% CI) | LR- (95% CI)
Abnormal platelet count | 0.37 | 0.77 | 1.6 (1.3-1.9) | 0.82 (0.75-0.89)
Abnormal ESR | 1.0 | 0.16 | 1.2 (1.1-1.2) | 0.02 (0-0.14)
Abnormal hemoglobin level | 0.80 | 0.32 | 1.2 (1.1-1.3) | 0.63 (0.50-0.79)

Abbreviations: CI, confidence interval; ESR, erythrocyte sedimentation rate; LR+,
positive likelihood ratio; LR-, negative likelihood ratio.


A score derived from clinical features and laboratory testing
among patients suspected of having GCA can stratify
patients into low, intermediate, and high likelihood of a
positive temporal artery biopsy result.

CONCLUSIONS 

LEVEL OF EVIDENCE Level 1 (using criteria from the original
review).

STRENGTHS The study had a large sample size, standardized
data abstraction for all patients, and a temporal artery biopsy
in all patients.

LIMITATIONS Retrospective review. 

Commentary 

This was a high-quality study, although it was retrospective.
The results suggest that several readily available clinical features
can be combined to establish low, intermediate, and
high levels of risk for positive biopsy. Strengths of this study
were that the authors separately reported data for patients
receiving corticosteroids before biopsy, combined clinical
features (as a clinician does in actual practice), and prospectively
tested the model derived from the retrospective analysis.
An important limitation was the retrospective design.

For identifying patients with temporal arteritis, the data 
suggest that the findings of headache, jaw claudication, and 
scalp tenderness have some degree of independence. The 
independence can be inferred by noticing that multiplying 
the LR for the presence of each of the findings approximates 
the LRs when they are assessed in combination. The authors 
have performed a service for clinical readers by evaluating 
these variables in a clinical model, confirming that they have 
independent significance (though jaw claudication is the 
most important when present), and validating their results 
by assessing the model prospectively. 
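
The independence argument above can be checked numerically with the
values in Table 49-14. The sketch below multiplies the single-finding
positive LRs and prints them beside the reported combination LRs; the
variable and function names are illustrative, and the comparison is a
rough check rather than a formal test of independence.

```python
from math import prod

# Positive LRs for single findings (Table 49-14)
single_lr_pos = {
    "new headache": 1.7,
    "jaw claudication": 6.9,
    "scalp tenderness": 3.1,
}

# Positive LRs reported for the same findings in combination (Table 49-14)
reported_combination_lr_pos = {
    ("new headache", "jaw claudication"): 8.7,
    ("new headache", "scalp tenderness"): 3.9,
    ("jaw claudication", "scalp tenderness"): 18,
    ("new headache", "jaw claudication", "scalp tenderness"): 19,
}


def product_of_lrs(findings):
    """LR expected if the findings were fully independent."""
    return prod(single_lr_pos[name] for name in findings)


for findings, reported in reported_combination_lr_pos.items():
    expected = product_of_lrs(findings)
    print(f"{' + '.join(findings)}: product {expected:.1f}, reported {reported}")
```

The products run somewhat higher than the reported combination values
(by less than a factor of 2 in each case), which fits the text's
wording that the findings have some, but not complete, independence.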

Although a normal ESR appeared to rule out disease with a
univariate LR of 0.02, the model should be examined for how
that finding would work when there is a strong clinical suspicion.
For example, a 72-year-old man who has a new headache,
but no other signs or symptoms, and an ESR of 20 mm/h
would have a score of -100 and should be at low to intermediate
risk (probability, 12%). As jaw claudication and scalp tenderness
symptoms are added, his risk increases to 78%, even
with an ESR of only 20 mm/h. If other investigators validate
these data in future research, then age plus clinical findings
(headache, scalp tenderness, and jaw claudication in combination)
would exceed the importance of the ESR.

Reviewed by Robert H. Shmerling, MD

CLINICAL SCENARIOS


CHAPTER 


Does This Patient Have an Acute Thoracic Aortic Dissection?

Michael Klompas, MD 


CASE 1 A 64-year-old man with a history of hypertension
presents to the emergency department after sudden
onset of severe, anterior chest pain. On examination, he is 
alert but uncomfortable. His blood pressure is normal 
and identical in both arms. His chest is clear, and careful 
cardiac auscultation fails to reveal a diastolic murmur. A 
chest radiograph reveals a small pleural effusion but is 
otherwise unremarkable. 

CASE 2 A 59-year-old woman is brought to the emergency
department after the sudden onset of tearing chest
pain. On examination, she is alert and oriented. Her 
blood pressure is identical in both arms. Results of her 
cardiac and pulmonary examinations are normal but she 
has a dense left-sided motor deficit. A portable chest 
radiograph raises the question of a widened mediastinum. 


WHY IS CLINICAL EXAMINATION IMPORTANT? 


A man ... was seized with a pain of the right arm and soon 
after of the left, ... after these there appeared a tumor on 
the upper part of the sternum. ...He was ordered to think 
seriously and piously of his departure from this mortal life, 
which was very near at hand and inevitable. 

—J. B. Morgagni, 1761 1 

There is no disease more conducive to clinical humility 
than aneurysm of the aorta. 

—Sir William Osler, c 1900 2

Acute thoracic aortic dissection, one of the most common
and serious diseases of the aorta, carries a high morbidity
and mortality rate when it is not recognized and treated
promptly. Autopsy series conducted before the era of modern
treatment estimated that 40% to 50% of patients with dissection
of the proximal aorta died within 48 hours. 3 For those
fortunate enough to survive the initial 48 hours, the disease
was thought to carry a 90% 1-year mortality rate. 3,4 Since the
introduction of modern treatment regimens, the fatality rate
has declined dramatically. Patients with proximal ascending
dissections who rapidly undergo surgery in experienced tertiary
centers have a 30-day survival rate of 80% to 85% and a
10-year survival rate of 55%. 4,5 Likewise, patients with dissection
of the descending aorta treated with aggressive antihypertensive
therapy have a 30-day survival rate greater than 90% and
a 10-year survival rate of 56%. 4,6 Realization of the dramatic
benefits of medical intervention depends on rapid establishment
of the diagnosis of dissection.

Approximately 4.6 million patients per year present with
chest pain to emergency departments in the United States
(8.2% of all emergency department visits). 7 Although
advanced imaging techniques can reliably establish the diagnosis
of thoracic aortic dissection in high-risk populations, it
is obviously inefficient, uneconomic, and unrealistic to
image every patient complaining of chest pain. Indiscriminate
use of diagnostic imaging in poorly chosen patients with
low pretest probability of having dissection has been predicted
to yield up to an 85% rate of false-positive results,
depending on the imaging modality chosen. 8 On the other
hand, misdiagnosis of acute thoracic aortic dissection as
unstable angina or myocardial infarction can have disastrous
iatrogenic consequences should the patient receive anticoagulants
or thrombolytic therapy. 9 Physicians are therefore
acutely dependent on the clinical history, examination, and
chest radiograph to determine which patients require further
study.
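
The dependence of false-positive results on pretest probability follows
from Bayes' theorem and can be sketched as below. The sensitivity,
specificity, and pretest probabilities used here are hypothetical
values chosen only for illustration; they are not taken from the cited
prediction or from any imaging study, and the function name is an
assumption.

```python
def false_positive_share(pretest, sensitivity, specificity):
    """Fraction of all positive test results that are false positives."""
    true_pos = pretest * sensitivity
    false_pos = (1 - pretest) * (1 - specificity)
    return false_pos / (true_pos + false_pos)


# Hypothetical test characteristics (illustration only)
SENSITIVITY = 0.98
SPECIFICITY = 0.95

for pretest in (0.01, 0.10, 0.50):
    share = false_positive_share(pretest, SENSITIVITY, SPECIFICITY)
    print(f"pretest {pretest:.0%}: {share:.0%} of positive results are false")
```

Even with an accurate test, most positive results are false when only
about 1 in 100 imaged patients has the disease, whereas few are false
when the pretest probability is high.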

Traditionally, clinical diagnosis of thoracic aortic dissection
has been inaccurate. Physicians correctly suspect the
diagnosis in as few as 15% to 43% of presentations when initially
evaluating patients with dissection. 3,10,11 Diagnostic
delay of more than 24 hours after hospitalization occurs in
up to 39% of cases. 12 When the diagnosis is made, not infrequently
it is an incidental discovery made during an
advanced imaging procedure intended to assess for other
diagnoses. 13,14 Autopsies reveal the correct diagnosis is still
missed in more than 10% of patients. 13

The purpose of this review is to offer physicians an evidence-based
foundation for using the clinical history, physical examination,
and chest radiograph to assess the likelihood of thoracic
aortic dissection.

Pathophysiology of Thoracic Aortic Dissection 

The aortic wall is composed of 3 contiguous tissue layers in 
sequence from the vessel lumen proceeding outwards: the 
intima, media, and adventitia. Weakening of these tissue 
layers can lead to a tear in the intima, permitting the entry 
of blood between the intima and adventitia. 15 Passage of 
blood into this space can extend the tear and create a 
so-called false lumen. The majority of these tears take place 
in the ascending aorta, usually in the right lateral wall 
where the greatest shear force on the artery wall is produced 
by blood expulsed from the heart under high pressure. 3 The 
tear then extends along the greater curve of the aortic arch 
and down the descending aorta, though retrograde extension
of the tear toward the aortic valve is also possible. 15
Most aortic tears occurring beyond the ascending aorta 
originate immediately distal to the left subclavian artery. 15 
Predisposing factors for the initiation of a thoracic aortic 
dissection include hypertension, 15 bicuspid aortic valve, 15 
coarctation of the aorta, 15 the Marfan syndrome, 16 Ehlers-Danlos
syndrome, 17 Turner syndrome, 18 giant cell arteritis, 19
third-trimester pregnancy, 20 cocaine abuse, 21 trauma, 22 intra-aortic
catheterization, 23 and history of cardiac surgery, particularly
aortic valve replacement. 24

The clinical features of thoracic aortic dissection are a consequence
of the underlying pathophysiologic changes in the
aorta. Patients perceive the initial aortic tear as sudden onset
of severe ripping or tearing chest pain. The pain is sometimes
described as having a migrating quality, likely corresponding
to extension of the tear along the aorta. Depending on the
location of the tear and its direction of extension, patients
variously describe the pain as radiating to the neck, back, or
abdomen. Occasional presentations of painless dissection
have been reported, though these are usually accompanied by
other findings. 25,26

Retrograde extension of the tear to the aortic valve can 
result in aortic regurgitation, with its characteristic diastolic 
murmur. Likewise, if the tear communicates with the pericardial
space, patients can present with symptoms of acute
pericardial tamponade (hypotension, pulsus paradoxus, jugular
venous distention, and muffled heart sounds). Syncope
or prolonged unconsciousness can be the initial presentation 
of patients with pericardial tamponade. 

The initial aortic tear and subsequent extension of a false 
lumen along the aorta can occlude blood flow from the true 
lumen of the aorta into any of the arteries that originate from 
the aorta. Depending on which arteries become occluded, 
patients can present with a variety of corresponding syndromes.
These include acute myocardial infarction from
occlusion or extension of tear into the coronary arteries (typically
the right coronary artery); death, syncope, or hemiplegia
after occlusion of one or both carotid arteries; absent
peripheral pulses in the major limb vessels secondary to 
occlusion of the brachiocephalic trunk, left subclavian artery, 
or distal aorta; anuria from disruption of renal blood flow; 
and paraplegia or quadriplegia from occlusion of vessels 
feeding the anterior spinal artery. 

Examination for the Signs and Symptoms 
of Thoracic Aortic Dissection 

The classic clinical history for thoracic aortic dissection consists
of the sudden onset of severe tearing or ripping chest
pain radiating to the interscapular region or low back, occurring
in late-middle-aged men with a history of hypertension.
Physicians therefore need to ask patients about the
onset, quality, radiation, and intensity of their pain.
They should also ask about history or symptoms suggestive
of factors that increase the risk of aortic dissection,
including hypertension, Marfan syndrome, bicuspid aortic
valve, previous aortic valve replacement, and the other syndromes
previously listed.

History-taking from patients with thoracic aortic dissection
has tended to be poor; however, there is evidence that a
more thorough medical history may increase diagnostic
yield. A retrospective chart review of 83 patients with subsequently
confirmed thoracic aortic dissection revealed that
only 42% of conscious patients were asked all of 3 basic questions
about their pain (quality, radiation, intensity at
onset). 14 One-quarter of patients were asked 1 or none of
these key questions. If all 3 questions were asked, physicians
correctly diagnosed thoracic aortic dissection in 30 of 33
patients (91%); if 1 or more of these questions was omitted,
then the correct diagnosis was suspected during the initial
evaluation in only 22 of 45 (49%) patients (P < .001). In
patients whose dissection was not suspected initially, the
diagnosis was made later, usually as an
incidental finding during imaging procedures intended to
diagnose alternative conditions. Unfortunately, the retrospective
design of this study cannot preclude the possibility
that physicians were simply more likely to ask about additional
classic findings when they already had a strong clinical
suspicion of thoracic aortic dissection derived from other
data, including physical examination and chest radiograph.
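
The comparison of 30 of 33 vs 22 of 45 correct initial diagnoses can be
rechecked from the counts alone. The sketch below uses a two-sided
Fisher exact test; the cited study's actual statistical method is not
stated here, so the choice of test is an assumption, and the function
name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]].

    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(table_prob(x) for x in range(low, high + 1)
               if table_prob(x) <= observed * (1 + 1e-9))


# All 3 questions asked: 30 diagnosed, 3 missed.
# 1 or more questions omitted: 22 diagnosed, 23 missed.
p_value = fisher_exact_two_sided(30, 3, 22, 23)
print(f"P = {p_value:.5f}")
```

The result falls below .001, in line with the P value quoted in the
text, although it does nothing to address the confounding described
above.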

The physical examination should begin with elicitation of 
vital signs, particularly the blood pressure and pulses on both 
sides of the body. While checking the blood pressure, the 
examiner should evaluate for acute pericardial tamponade by 
assessing for pulsus paradoxus, particularly in a patient with 
hypotension or jugular venous distention. Frequent allusion 
is made to the importance of comparing the blood pressure 
in both arms. Although it is essential to seek evidence of vascular
occlusion in the arms, the complete examination
should include comparison of all major arteries, including 
the carotid and femoral pulses, in addition to the radial 
pulses. 

Most of the published series of patients with thoracic aortic
dissection comment only on the loss or obvious diminishment
of pulses rather than particular blood pressure
differentials. Older retrospective autopsy series that do
refer to blood pressure differentials arbitrarily designate a
difference in systolic pressure between arms of 20 mm Hg
or 30 mm Hg as significant. 3,27 However, a convenience sample
of 610 patients without thoracic aortic dissection presenting
to an emergency department showed that 53% had
interarm differences of greater than 10 mm Hg and 19% had
differences greater than 20 mm Hg. 28 Nonetheless, a good-quality,
prospective, observational study did find that a blood
pressure differential of greater than 20 mm Hg was an independent
predictor of dissection. 29 Hence, a blood pressure
differential of at least 20 mm Hg ought to be present to be
considered significant.

Cardiac auscultation should focus on detecting the diastolic
murmur of aortic regurgitation. 30 A rapid neurologic
examination directed toward the detection of gross motor 
and sensory defects such as hemiplegia and paraplegia 
should ensue. 

Rarer clinical findings reported in the literature include
pulsatile sternoclavicular joint, hoarseness, dysphagia, superior
vena cava syndrome, Horner syndrome, bulbar palsies,
acute arterial occlusion, deep vein thrombosis, and bilateral
testicular tenderness. 31-37

A chest radiograph should be obtained and examined for
abnormalities of the aortic silhouette. This is best accomplished
with a standing anteroposterior projection. Unfortunately,
the majority of chest radiograph findings associated
with thoracic aortic dissection are subjective and not defined.
Criteria for radiographic features associated with traumatic
thoracic aortic dissection have been proposed but have not
been adopted or validated in radiologic studies of nontraumatic
dissections. 38 Radiographic abnormalities may include
wide mediastinum, widening of the aortic knob, difference
in diameter between the ascending and descending aorta, and
blurring of the aortic margin secondary to local extravasation
of blood. 39 The chest radiograph might also reveal unilateral
or bilateral pleural effusions. The calcium sign, consisting of
the separation of intimal calcification from the outer border
of the aortic knob by 1 cm or more, is highly suggestive of
dissection but present in a minority of cases. 37,40 Comparison
with previous chest radiographs of the same patient can help
the examiner detect suggestive new changes in the aortic
contour.

METHODS 

Literature Search and Selection 

A structured MEDLINE search including 1966 through 2000
was conducted to identify English-language articles examining
the accuracy of the clinical history, examination, and chest
radiograph in the detection of acute thoracic aortic dissection.
Key words used in the search included “physical examination,”
“medical history taking,” “professional competence,”
“reproducibility of results,” “observer variation,” “diagnostic tests,”
“decision support techniques,” “Bayes theorem,” “sensitivity,”
“specificity,” “thoracic aortic dissection,” “aortic aneurysm,”
and “dissecting aneurysm.” Articles focusing only on
electrocardiograms (ECGs) were not specifically sought because such
analyses document a variety of abnormalities seen with thoracic
aneurysm but lack the appropriate clinical information
for valid sensitivity and specificity estimates. When studies
reported the results of ECGs as part of the overall clinical
examination, however, these data were collated. Abstracts were
reviewed and the full texts of articles that might meet the
inclusion criteria were retrieved. The reference lists of reviewed
articles were searched to identify additional sources.

All potential articles were reviewed for explicit inclusion and
exclusion criteria. Articles were included if they were original
studies describing the clinical findings in a series of 18 or more
consecutive patients with confirmed dissection of the thoracic
aorta (Table 50-1). Acceptable means of confirmation of diagnosis
were surgical exploration, autopsy, aortogram, magnetic
resonance imaging, computed tomography, or transesophageal
echocardiography. The latter 4 imaging studies were
included as acceptable gold-standard investigations because of
their high sensitivity and specificity. 41,42 Articles were excluded if
more than 15% of their cohorts included trauma patients,
patients with chronic thoracic aortic dissection (defined as a
dissection presumed to have occurred more than 14 days
before presentation), or patients with abdominal aortic aneurysms
or if the study selectively included patients with only
proximal or distal dissections.

Retrieved studies were graded for quality using criteria
similar to those used in previous articles in this series but
modified to include only consecutive series. Level 1 studies
were defined as prospective, blinded examinations of a large
number (>100) of independently selected consecutive patients.
Level 2 studies met identical criteria but included fewer
than 100 patients. Level 3 studies were large, prospective
investigations but included nonindependently selected patients.
Level 4 studies were retrospective reviews of nonindependently
selected patients (see Table 1-7).

Table 50-1 Studies Assessing the Accuracy of Clinical Examination for Thoracic Aortic Dissection

Source, y | Clinical Setting, Study Dates | Design | No. of Patient Episodes | Age, y, Mean (Range) | Male, % | Type A, % a | Level of Quality b
Armstrong et al, 43 1998 | University hospital, 1992-1994 | Retrospective review of patients with clinically suspected TAD referred for TEE | 75 (34 With TAD) | 57 (20-80) | 74 | 91 | 4
Chan, 44 1991 | University hospital, 1987-1989 | Prospective evaluation of utility of transesophageal echocardiography in patients with clinically suspected TAD | 40 (18 With TAD) | 60 | 60 | c | 4
Enia et al, 45 1989 | Hospital, 1981-1987 | Prospective evaluation of transthoracic echocardiography in patients with clinically suspected TAD | 46 (35 With TAD) | 58 (34-82) | 91 | 66 | 4
Erb and Tullis, 46 1960 | University hospital, 1950-1960 | Retrospective chart review | 30 | 56 (36-85) | 67 | | 4
Hagan et al, 5 2000 | 12 Tertiary centers in 6 countries, 1996-1998 | Multinational prospective international registry; cases identified on admission or review of discharge/surgery/radiology records; 60% of cases referred | 464 | 63 | 65 | 62 | 4
Hume and Porter, 47 1963 | University hospital and medical examiner's office, 1950-1962 | Retrospective chart review d | 68 | 53 (10-79) | 79 | 81 | 4
Itzchak et al, 48 1975 | Hospital, 1960-1973 | Retrospective chart review | 24 | 57 (12-86) | 75 | 46 | 4
Jagannath et al, 40 1986 | University hospital, 1965-1977 | Retrospective review of radiographs e | 72 (36 With TAD) | 62 (17-85) | Not stated | 1/3 | 4
Levinson et al, 27 1950 | University hospital, 1935-1947 | Retrospective chart review of autopsy cases | 58 | 59 (22-90) | 72 | | 4
Lindsay and Hurst, 49 1967 | University hospital, 1949-1966 | Retrospective chart review | 62 | 57 (31-83) | 65 | 65 | 4
Luker et al, 50 1994 | Hospital, 1987-1993 | Retrospective review of radiologists’ initial chest radiograph readings in cases with subsequently confirmed TAD | 75 | 61 (24-77) | 49 | 47 | 4
Meszaros et al, 10 2000 | 3 Hungarian towns, 1972-1998 | Longitudinal, observational, population-based study f | 86 | 66 (36-97) | 61 | 86 | 4
Miller et al, 51 1979 | University hospital, 1963-1979 | Retrospective review of surgically managed cases | 73 | 57 (20-86) | 70 | 73 | 4
Nielsen, 52 1961 | 3 Danish hospitals, 1944-1958 | Retrospective chart review g | 40 | 66 (36-83) | 45 | | 4
Pate et al, 53 1976 | Memphis, TN, hospitals, dates not given | Retrospective chart review | 126 | Not reported | 79 | | 4
Pinet et al, 54 1984 | University hospital, 1970-1979 | Retrospective chart review | 191 | 58 (19-90) | 69 | 64 | 4
Slater and DeSanctis, 37 1976 | University hospital, 1963-1973 | Retrospective chart review | 124 | 59 (19-81) | 73 | 43 | 4
Strong et al, 55 1974 | University hospital and VA hospital, 1960-1973 | Retrospective chart review | 59 | 60 (26-86) | 78 | 46 | 4
Sullivan et al, 11 2000 | 3 University hospital EDs, 1992-1996 | Retrospective review of ED patients referred for thoracic imaging | 44 | 65 (36-89) | | 61 | 4

Viljanen, 12 

1986 

University hospital, 
1964-1985 

Retrospective review of surgically man¬ 
aged cases 

73 

51 

66 

64 

4 

Von Kodolitsch 
et al, 29 2000 

University hospital, 
1988-1996 

Prospective study of patients presenting to 
ED with history suggestive of TAD 

250 

(128 With TAD) 

53 

78 

61 

3 


Abbreviations: ED, emergency department; TAD, thoracic aortic dissection; TEE, transesophageal echocardiography; VA, Veterans Affairs. 
“Type A refers to aortic dissections involving the aorta proximal to the subclavian artery. 
b See Table 1 -7. 

“Ellipses indicate information not available. 

“Two cases not confirmed by surgery or autopsy. 

“Does not include data on the frequency of specific radiographic findings but does report interobserver agreement. 

'Eleven percent of cases were chronic. 

“Forty cases in which TAD was considered cause of death; also reports additional 18 cases in which TAD was incidental finding on autopsy. 
Study Characteristics

A total of 274 studies were identified by the search strategy, of
which 21 studies met inclusion criteria (Table 50-1). No level
1 or level 2 studies were located. One study met level 3 criteria;
the remaining 20 were level 4. One large series was
self-described as prospective in conception and definition of
clinical parameters. 5 An unknown percentage of its patients,
however, were identified by physician review of discharge
records, echocardiography, and surgical databases. This study
was consequently classified conservatively as level 4. 5
Approximately half the investigations, including the 1 level 3 study,
were specifically designed to elucidate the clinical presentation
of acute aortic dissection. The remaining reports were designed
either to test new imaging modalities or to study the outcomes
of medical or surgical management of patients with
thoracic aortic dissection. In each case, however, these studies
included data on patients' clinical findings at diagnosis. The
studies varied considerably in the number and detail of
components of the clinical history or examination that were
reported. Only the prospective level 3 study explicitly defined
the criteria used to establish whether a given clinical finding
was present or absent. 29

These studies assessed a total of 1848 patients aged 10 to 97
years. The major limitation of all the studies is that patients
were selected for inclusion either retrospectively after
confirmation of diagnosis by a reference standard study or
prospectively according to the presenting clinical picture.
Therefore, in all these studies the reference standard and
clinical examination were not applied independently of one
another. This biases the studies toward overestimating
the sensitivity of clinical findings because more obvious cases
are preferentially included in such series. In addition,
physicians performing the reference standard procedure were not
blinded to the results of the clinical examination and vice
versa. This too could lead to overestimation of sensitivity.

Only 4 studies included control groups. 29,43-45 Although
these investigations can be used to generate data for
specificity in addition to sensitivity, their estimations of specificity
are heavily influenced by their inclusion biases. The
specificities derived from these studies should be interpreted with
caution because they reflect only the specificity for a given
sign or symptom among patients similar to those included in
the studies (ie, those with a full clinical syndrome suggestive
of thoracic aortic dissection). These studies likely
overestimate sensitivity and underestimate specificity by selecting
patients for inclusion because of the presence of the
particular sign being considered, thereby creating cohorts with
artificially high prevalence of the finding.

Data Analysis 

Summary measures for the sensitivity of components of the
clinical examination for acute thoracic aortic dissection used
published raw data from the reported trials that met criteria.
Only 4 studies included specificity data that allowed
construction of likelihood ratios (LRs). A random-effects model was
used to generate conservative summary measures and
confidence intervals (CIs) for the sensitivity and LRs. 56 For LRs, a
summary measure is reported only when there are more than
2 studies. The uncertainty in these measures is reflected in the
broad CIs around the estimates. Interobserver agreement was
calculated and interpreted using the κ statistic of Landis and
Koch. 57 Fast Pro version 1.8 software was used for the
meta-analysis (Academic Press, San Diego, California).
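
The idea of random-effects pooling can be illustrated with a short sketch. It is not the procedure used in the original analysis, which relied on Fast Pro software and published raw data. The code assumes a DerSimonian-Laird estimator applied to log LRs, with standard errors back-calculated from the published 95% CIs; the function name pool_random_effects is hypothetical. Under these assumptions the output will not exactly reproduce the summary values in Table 50-3.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def pool_random_effects(estimates):
    """Pool likelihood ratios with a DerSimonian-Laird random-effects model.

    `estimates` is a list of (lr, ci_lower, ci_upper) tuples. Pooling on the
    log scale is an illustrative assumption, not the documented method of
    the original analysis.
    """
    logs = [math.log(lr) for lr, _, _ in estimates]
    # Back-calculate each within-study variance from the width of its 95% CI.
    variances = [((math.log(hi) - math.log(lo)) / (2 * Z_95)) ** 2
                 for _, lo, hi in estimates]

    # Fixed-effect (inverse-variance) step, needed for the heterogeneity statistic Q.
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    fixed_mean = sum(w * y for w, y in zip(weights, logs)) / total_w
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, logs))

    # Between-study variance (tau^2), truncated at zero.
    scale = total_w - sum(w * w for w in weights) / total_w
    tau2 = max(0.0, (q - (len(logs) - 1)) / scale) if scale > 0 else 0.0

    # Random-effects weights add tau^2 to every within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    total_re = sum(re_weights)
    pooled_log = sum(w * y for w, y in zip(re_weights, logs)) / total_re
    se = math.sqrt(1.0 / total_re)
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z_95 * se),
            math.exp(pooled_log + Z_95 * se))


# Study-level LR+ values for a history of hypertension, taken from Table 50-3.
hypertension = [(1.5, 0.8, 3.0), (1.1, 0.7, 1.6), (1.8, 1.4, 2.3)]
pooled, lower, upper = pool_random_effects(hypertension)
print(f"pooled LR+ {pooled:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

When the studies disagree more than chance alone would predict, tau^2 becomes positive, the weights even out across studies, and the pooled CI widens. This is the sense in which random-effects summaries are "conservative."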

RESULTS 

Accuracy of the Clinical History 

Risk Factors 

Sixteen studies examining 1553 patients report sensitivities
for various components of the clinical history in Table 50-2.
Most patients with dissection have a documented history of
hypertension (sensitivity, 64%); however, the LR+ of this
history is only 1.6 (95% CI, 1.2-2.0). The pooled prevalence of the
Marfan syndrome in this group of studies was 5% (95% CI,
4%-7%). Given that the Marfan syndrome afflicts only
0.02% to 0.03% of the general population, 58 the high
prevalence of the Marfan syndrome in these series is suggestive of a
markedly increased risk associated with this disorder, though
the frequency of the Marfan syndrome detected in these
series likely reflects the inclusion biases of these studies. The
one controlled study that assessed for the Marfan syndrome
generated an LR+ of 4.1. 29

Symptoms 

The majority of patients presented with pain (pooled sensitivity,
90%) of severe intensity (sensitivity, 90%) that occurred
suddenly (sensitivity, 84%). All other recorded clinical symptoms
were present in a low to moderate proportion of patients (Table
50-2). Patients were most likely to have anterior chest pain
(sensitivity, 57%); however, pain was frequently experienced
elsewhere, including the posterior chest (32%), back (32%), and
abdomen (23%). Likewise, migrating and ripping or tearing
pain was present in only 31% and 39% of patients, respectively.

The presence of pain of sudden onset is not diagnostic
(LR+, 1.6; 95% CI, 1.0-2.4). The absence of this history,
however, substantively decreases the probability of an acute
thoracic aortic dissection (LR-, 0.3; 95% CI, 0.2-0.5). Physicians
should be cautious about relying too heavily on the absence
of sudden pain to exclude aortic dissection because the
inclusion biases of these studies likely overestimate the sensitivity.
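
How much an LR- of 0.3 lowers the probability of dissection depends on where the clinician starts. The sketch below applies the odds form of Bayes theorem (post-test odds = pretest odds x LR) across a few pretest probabilities. The pretest values and the function name post_test_probability are hypothetical and chosen only for illustration; they are not figures from the studies.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability into a post-test probability via odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


LR_NEGATIVE_SUDDEN_PAIN = 0.3  # pooled LR- for absence of sudden-onset pain

# Hypothetical pretest probabilities, for illustration only.
for pretest in (0.05, 0.30, 0.50):
    post = post_test_probability(pretest, LR_NEGATIVE_SUDDEN_PAIN)
    print(f"pretest {pretest:.0%} -> post-test {post:.1%}")
```

With a pretest probability of 30%, for example, the absence of sudden-onset pain lowers the estimate to roughly 11%, which is lower but still far from negligible for a lethal condition.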

Pain of a tearing or ripping sensation may also be
diagnostically useful. Two studies found almost identical specificities of
94% and 95% for this historical feature. 29,43 Although the
reported specificities were almost identical, the LR+s generated
by these 2 studies differed considerably (1.2 vs 11; Table 50-3),
reflecting significant heterogeneity in the sensitivity for this
history reported by the 2 investigations. The retrospective study
found that only 7% of patients had noted tearing or ripping
pain. 43 By contrast, the better-quality, larger, prospective study,
in which physicians were asked to query predefined clinical
symptoms of each patient, reported a sensitivity of 62%. 29 This
figure is more consistent with the other large study with
prospectively defined clinical symptoms in this series 5 and with the
pooled sensitivity for this symptom (Table 50-2). Therefore, it
seems reasonable to suspect that the higher reported sensitivity
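
Table 50-2 Sensitivity of the Clinical History in the Diagnosis of Acute Thoracic Aortic Dissection

Sensitivity, %

Source, y | No. of Patients | History of Hypertension | Marfan Syndrome | Any Pain | Chest Pain | Anterior Chest Pain | Posterior Chest Pain / Back Pain | Abdominal Pain | Sudden-Onset Pain | Severe Pain | Ripping or Tearing Pain | Migrating Pain | Syncope
Armstrong et al, 43 1998 | 34 | ... a | ... | 94 | 74 | ... | 56 | 27 | 88 | 93 | 7 | ... | 6
Chan, 44 1991 | 18 | 56 | ... | 78 | ... | ... | ... | ... | 78 | ... | ... | 39 | ...
Enia et al, 45 1989 | 35 | 80 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
Erb and Tullis, 46 1960 | 30 | 53 | 7 | 70 | 40 | ... | ... | 17 | ... | ... | ... | ... | ...
Hagan et al, 5 2000 | 464 | 72 | 5 | 96 | 73 | 61 | 36 / 53 | 30 | 85 | 91 | 51 | 17 | 9
Hume and Porter, 47 1963 | 68 | 89 | 4 | 97 | 59 | 59 | 33 / 43 | 49 | ... | ... | ... | ... | ...
Levinson et al, 27 1950 | 58 | 59 | ... | 78 | 47 | ... | 9 / 36 | 40 | ... | ... | ... | ... | 14
Lindsay and Hurst, 49 1967 | 62 | ... | ... | 90 | ... | 61 | 14 / 13 | 11 | ... | ... | ... | ... | ...
Meszaros et al, 10 2000 | 72 | 67 | ... | 92 | ... | 64 | 10 | 10 | ... | ... | ... | ... | 14
Nielsen, 52 1961 | 40 | 18 | 3 | 65 | ... | 54 | 8 | 33 | 76 | ... | ... | ... | 16
Pate et al, 53 1976 | 126 | ... | ... | 88 | 63 | ... | 38 / 22 | ... | 88 | ... | ... | ... | 10
Pinet et al, 54 1984 | 191 | 53 | 7 | 96 | 63 | ... | 30 | ... | ... | 89 | ... | 6 | ...
Slater and DeSanctis, 37 1976 | 124 | 65 | 5 | 94 | 91 | 43 | 38 / 76 | 4 | 93 | 94 | ... | 71 | 5
Strong et al, 55 1974 | 59 | 75 | 3 | ... | ... | 32 | 25 | 27 | ... | ... | ... | ... | ...
Sullivan et al, 11 2000 | 44 | 70 | 0 | 98 | 66 | ... | ... | 34 | ... | ... | ... | ... | 2
Von Kodolitsch et al, 29 2000 | 128 | 77 | 7 | 100 b | ... | 76 | 50 c | 22 | 79 | 86 | 62 | 44 | 10
Summary sensitivity, % (95% CI) | NA | 64 (54-72) | 5 (4-7) | 90 (85-94) | 67 (56-77) | 57 (48-66) | 32 (24-40) / 32 (19-47) | 23 (16-31) | 84 (80-89) | 90 (88-92) | 39 (14-69) | 31 (12-55) | 9 (8-12)

Abbreviations: CI, confidence interval; NA, not applicable.
a Ellipses indicate data not available.
b Presence of pain inclusion criterion for study.
c Posterior chest or lower back pain.
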
Table 50-3 Accuracy of Clinical Findings for Thoracic Aortic Dissection in Consecutive Patients Preselected for High Clinical Suspicion of Dissection Referred for Advanced Imaging

Symptom or Sign | Source, y | LR+ (95% CI) | LR- (95% CI)
History of hypertension | Chan, 44 1991 a | 1.5 (0.8-3.0) | 0.7 (0.4-1.3)
History of hypertension | Enia et al, 45 1989 b | 1.1 (0.7-1.6) | 0.7 (0.4-2.4)
History of hypertension | Von Kodolitsch et al, 29 2000 c | 1.8 (1.4-2.3) | 0.4 (0.3-0.6)
History of hypertension | Summary | 1.6 (1.2-2.0) | 0.5 (0.3-0.7)
Sudden chest pain | Chan, 44 1991 a | 1.0 (0.7-1.4) | 0.98 (0.3-3.1)
Sudden chest pain | Armstrong et al, 43 1998 d | 1.5 (1.1-1.9) | 0.3 (0.1-0.8)
Sudden chest pain | Von Kodolitsch et al, 29 2000 c | 2.6 (2.0-3.5) | 0.3 (0.2-0.4)
Sudden chest pain | Summary | 1.6 (1.0-2.4) | 0.3 (0.2-0.5)
"Tearing" or "ripping" pain | Armstrong et al, 43 1998 d | 1.2 (0.2-8.1) | 0.99 (0.9-1.1)
"Tearing" or "ripping" pain | Von Kodolitsch et al, 29 2000 c | 11 (5.2-22) | 0.4 (0.3-0.5)
Migrating pain | Chan, 44 1991 a | 1.1 (0.5-2.4) | 0.97 (0.6-1.6)
Migrating pain | Von Kodolitsch et al, 29 2000 c | 7.6 (3.6-16) | 0.6 (0.5-0.7)
Pulse deficit | Armstrong et al, 43 1998 d | 2.4 (0.5-12) | 0.93 (0.8-1.1)
Pulse deficit | Enia et al, 45 1989 b | 2.7 (0.7-9.8) | 0.63 (0.4-1.0)
Pulse deficit | Von Kodolitsch et al, 29 2000 c | 47 (6.6-333) | 0.62 (0.5-0.7)
Pulse deficit | Summary | 5.7 (1.4-23) | 0.7 (0.6-0.9)
Focal neurologic deficit | Armstrong et al, 43 1998 d | 6.6 (1.6-28) | 0.71 (0.6-0.9)
Focal neurologic deficit | Von Kodolitsch et al, 29 2000 c | 33 (2.0-549) | 0.87 (0.8-0.9)
Diastolic murmur | Chan, 44 1991 a | 4.9 (0.6-40) | 0.8 (0.6-1.1)
Diastolic murmur | Armstrong et al, 43 1998 d | 1.2 (0.4-3.8) | 0.97 (0.8-1.2)
Diastolic murmur | Enia et al, 45 1989 b | 0.9 (0.5-1.7) | 1.1 (0.6-1.7)
Diastolic murmur | Von Kodolitsch et al, 29 2000 c | 1.7 (1.1-2.5) | 0.79 (0.6-0.9)
Diastolic murmur | Summary | 1.4 (1.0-2.0) | 0.9 (0.8-1.0)
Enlarged aorta or wide mediastinum | Chan, 44 1991 a | 1.6 (1.1-2.3) | 0.13 (0.02-1.0)
Enlarged aorta or wide mediastinum | Armstrong et al, 43 1998 d | 1.6 (1.1-2.2) | 0.42 (0.2-0.9)
Enlarged aorta or wide mediastinum | Von Kodolitsch et al, 29 2000 c | 3.4 (2.4-4.8) | 0.31 (0.2-0.4)
Enlarged aorta or wide mediastinum | Summary | 2.0 (1.4-3.1) | 0.3 (0.2-0.4)
Left ventricular hypertrophy on admission electrocardiogram | Chan, 44 1991 a | 0.2 (0.03-1.9) | 1.2 (0.9-1.6)
Left ventricular hypertrophy on admission electrocardiogram | Von Kodolitsch et al, 29 2000 c | 3.2 (1.5-6.8) | 0.84 (0.7-0.9)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.
a A total of 18 (n = 40) patients with thoracic aortic dissection.
b A total of 35 (n = 46) patients with thoracic aortic dissection.
c A total of 128 (n = 250) patients with thoracic aortic dissection.
d A total of 34 (n = 75) patients with thoracic aortic dissection.


and LR are the more accurate data. Migratory pain has
performance characteristics that are similar to those of tearing or
ripping pain. The LR+ for the presence of this quality was 7.6
(95% CI, 3.6-16) in one study 29 but only 1.1 (95% CI, 0.5-2.4)
in the other. 44 Additional studies of independently selected
patients that prospectively ask about the sensation of tearing or
ripping and migration of pain are needed to confirm the high LR
for these findings. Description of pain as sharp was slightly more
prevalent than tearing or ripping; however, this descriptor was
elicited in only 2 studies and had an LR+ near unity. 5,43
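
The divergent LR+ values for tearing or ripping pain follow directly from the definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below recomputes them from the rounded percentages quoted in the text. Which of the two specificities (94% or 95%) belongs to which study is not stated, so the pairing used here is an assumption, and rounding means the results only approximate the entries in Table 50-3. The function name is hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a finding with the given sensitivity and specificity."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    if not 0.0 < specificity < 1.0:
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded values from the text; the sensitivity/specificity pairing is assumed.
studies = {
    "retrospective study (sensitivity 7%)": (0.07, 0.94),
    "prospective study (sensitivity 62%)": (0.62, 0.95),
}
for label, (sens, spec) in studies.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{label}: LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
```

Because the specificities are nearly equal, the denominator of LR+ barely changes between the two studies; almost all of the difference in LR+ comes from the ninefold difference in sensitivity.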

Accuracy of the Physical Examination 

Physical examination findings classically associated with
thoracic aortic dissection are typically present in less than half of
all cases (Table 50-4). However, when present, signs of
thoracic aortic dissection can be helpful. Among the most useful
is a pulse differential between carotid, radial, or femoral
arteries. Although the pooled sensitivity for this sign is only
31%, a deficit in 1 of these pulses compared with the
contralateral side is strongly suggestive of dissection (LR+, 5.7; 95%
CI, 1.4-23). 29,43,45 Focal neurologic deficits, though present in
only 17% of cases, may also be helpful. Specificity for this
sign is high in the 2 studies in which it has been measured
(LR+, 6.6-33; Table 50-3). 29,43 The absence of a pulse deficit or
focal neurologic deficit does not appreciably alter the
likelihood of thoracic aortic dissection.

The presence or absence of a diastolic murmur is not
helpful. Fewer than one-third of patients with thoracic aortic
dissection have a diastolic murmur (sensitivity, 28%). The LR+ and LR-

Table 50-4 Sensitivity of the Physical Examination in the Diagnosis of Acute Thoracic Aortic Dissection

Sensitivity, %

Source, y | No. of Patients | Elevated BP | Diastolic Murmur | Pulse Deficit | Pericardial Rub | Congestive Heart Failure | Focal Neurologic Deficit | Shock | New MI on ECG
Armstrong et al, 43 1998 | 34 | ... a | 15 | 12 | ... | ... | 32 | 26 | 11
Chan, 44 1991 | 18 | ... | 22 | ... | ... | ... | ... | ... | ...
Enia et al, 45 1989 | 35 | ... | 49 | 49 | ... | ... | ... | ... | ...
Erb and Tullis, 46 1960 | 30 | ... | 27 | 72 | 0 | ... | 13 | ... | 25
Hagan et al, 5 2000 | 464 | 49 | 32 | 15 | ... | 7 | 5 | 16 | 3
Hume and Porter, 47 1963 | 68 | 68 | 4 | 34 | ... | ... | ... | 10 | ...
Itzchak et al, 48 1975 | 24 | ... | ... | 21 | ... | ... | 21 | ... | ...
Levinson et al, 27 1950 | 58 | 66 | 28 | 19 | 5 | ... | 16 | 22 | 32
Lindsay and Hurst, 49 1967 | 62 | 29 | 35 | 45 | ... | ... | 23 | 13 | ...
Meszaros et al, 10 2000 | 66 | 44 | 11 | 20 | 2 | ... | 41 | 36 | 9
Miller et al, 51 1979 | 73 | 58 | 64 | ... | ... | 29 | 12 | ... | ...
Nielsen, 52 1961 | 40 | ... | ... | ... | ... | ... | ... | 30 | 10
Pate et al, 53 1976 | 126 | 37 | 21 | 33 | ... | ... | 13 | 21 | ...
Pinet et al, 54 1984 | 191 | ... | 35 | 55 | 12 | ... | ... | 38 | ...
Slater and DeSanctis, 37 1976 | 124 | 36 | 32 | 31 | ... | ... | 19 | 10 | 3
Strong et al, 55 1974 | 59 | 66 | 20 | 34 | ... | ... | ... | 5 | ...
Sullivan et al, 11 2000 | 44 | ... | ... | 12 | ... | ... | 14 | ... | 2
Viljanen, 12 1986 | 73 | ... | 29 | 37 | ... | ... | 22 | 30 | ...
Von Kodolitsch et al, 29 2000 | 128 | 41 | 40 | 38 | ... | ... | 13 | 12 | 2
Summary sensitivity (95% CI) | NA | 49 (41-57) | 28 (21-36) | 31 (24-39) | 6 (3-13) | 15 (4-33) | 17 (12-23) | 19 (15-26) | 7 (4-14)

Abbreviations: BP, blood pressure; CI, confidence interval; ECG, electrocardiogram; MI, myocardial infarction; NA, not applicable.
a Ellipses indicate data not available.



(LR+, 1.4; 95% CI, 1.0-2.0; LR-, 0.9; 95% CI, 0.8-1.0) are close
to 1, suggesting that the presence or absence of a diastolic
murmur should not be considered helpful. 29,43-45 Unfortunately,
these studies do not comment on whether the diastolic
murmurs identified were known to be new or old. It is possible that
a diastolic murmur known to be new would have greater
diagnostic utility.

Patients’ blood pressure on presentation is not helpful.
Although approximately half of patients present with elevated
blood pressure (pooled sensitivity, 49%; 95% CI, 41%-57%),
an equal proportion are either hypotensive or normotensive.
Only 1 study permitted calculation of an LR for hypertension;
however, this study confirmed its low diagnostic yield (LR+,
1.3 for systolic blood pressure >150 mm Hg). 29 Pericardial rub
is rarely present (pooled sensitivity, 6%; 95% CI, 3%-13%).
Assessment for pulsus paradoxus and jugular venous
distention is not enumerated in any of the studies.

Electrocardiographic findings consistent with acute
myocardial infarction do not rule out aortic dissection. New Q waves
or ST-segment elevation were observed in 7% of admission
ECGs (Table 50-4). Similarly, normal ECG results were
documented in 8% to 31% (mean, 22%) of patients. 5,10,11,37,46,52 The
remaining ECGs had a variety of other abnormalities,
including left ventricular hypertrophy, atrial fibrillation, and
nonspecific ST-segment changes. As part of the clinical evaluation,
ECGs have not been studied well but seem to have little utility
for detecting or ruling out thoracic aortic dissection.


Accuracy of the Chest Radiograph 

Pooling of 13 studies permitted analysis of 1337 radiographs.
Only 3 studies commented on the proportion of portable vs
conventional radiographs. The proportions of portable
radiographs reported in these investigations were 24%, 61%, and
80%. 29,43,50 Radiographic findings classically associated with
thoracic aortic dissection are not reliably present (Table 50-5).
However, most patients with thoracic aortic dissection do tend
to have abnormal findings on chest radiographs (sensitivity,
90%) so that a completely normal radiograph result helps to
decrease the likelihood of the diagnosis. In particular, absence of
wide mediastinum and abnormal aortic contour decreases the
probability of disease (LR-, 0.3; 95% CI, 0.2-0.4; Table 50-3).

Interobserver and intraobserver agreement for physician
assessment of radiographs has been reported in 2 studies, both
using radiologists as participants. Agreement was generally
found to be fair (κ = 0.25 for intraobserver agreement on
suspicion for aortic dissection 50 ; κ = 0.23-0.33 for interobserver
agreement on presence of wide mediastinum, irregularities of
the aortic contour, and pleural effusion 40 ). These low rates of
interobserver agreement underscore the lack of validated
standards for defining the radiographic features of aortic dissection.
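
For readers unfamiliar with the κ statistic, the sketch below computes Cohen κ from a 2 x 2 table of paired readings and labels the result with the Landis and Koch categories. The counts are invented for illustration (the cited studies' raw agreement tables are not reproduced here), and the function names are hypothetical.

```python
def cohen_kappa(both_yes: int, a_yes_b_no: int, a_no_b_yes: int, both_no: int) -> float:
    """Chance-corrected agreement between two readers on a yes/no finding."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    if n == 0:
        raise ValueError("at least one paired reading is required")
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + a_yes_b_no) / n   # proportion reader A called positive
    b_yes = (both_yes + a_no_b_yes) / n   # proportion reader B called positive
    expected = a_yes * b_yes + (1.0 - a_yes) * (1.0 - b_yes)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa: float) -> str:
    """Descriptive category for kappa according to Landis and Koch."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Invented counts: two readers judging "wide mediastinum" on 75 radiographs.
kappa = cohen_kappa(both_yes=30, a_yes_b_no=12, a_no_b_yes=13, both_no=20)
print(f"kappa = {kappa:.2f} ({landis_koch_label(kappa)})")
```

In this invented example the readers agree on two-thirds of the radiographs, yet κ falls in the "fair" range because about half of that agreement would be expected by chance alone.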

Table 50-5 Sensitivity of the Chest Radiograph in the Diagnosis of Acute Thoracic Aortic Dissection

Sensitivity, % a

Source, y | No. of Patients | Abnormal Aortic Contour | Pleural Effusion | Displaced Intimal Calcification | Wide Mediastinum | Abnormal Chest Radiograph Findings
Armstrong et al, 43 1998 | 34 | ... | ... | ... | 86 | 100
Chan, 44 1991 | 18 | ... | ... | ... | 94 | ...
Earnest et al, 39 1979 | 74 | 66 | 27 | 7 | 11 | 93
Hagan et al, 5 2000 | 427 | 50 | 19 | 14 | 62 | 88
Itzchak et al, 48 1975 | 24 | 88 | 17 | 4 | 83 | ...
Luker et al, 50 1994 | 75 | 76 | ... | 8 | ... | 85
Pate et al, 53 1976 | 87 | ... | 10 | ... | 70 | 90
Pinet et al, 54 1984 | 191 | ... | ... | ... | 56 | ...
Slater and DeSanctis, 37 1976 | 116 | 96 | 9 | 9 | ... | 96
Strong et al, 55 1974 | 59 | 54 | ... | 2 | 34 | 95
Sullivan et al, 11 2000 | 31 | 42 | ... | ... | ... | 84
Viljanen, 12 1986 | 73 | ... | ... | ... | 75 | ...
Von Kodolitsch et al, 29 2000 | 128 | 76 b | 13 | ... | ... | ...
Summary sensitivity (95% CI) | NA | 71 (56-84) | 16 (12-21) | 9 (6-13) | 64 (44-80) | 90 (87-92)

Abbreviations: CI, confidence interval; NA, not applicable.
a Ellipses indicate data not available.
b Mediastinal or aortic widening.

Accuracy of Combinations of Findings

Most clinical findings associated with thoracic aortic
dissection are insensitive when considered in isolation.
Combinations of findings, though not often found, markedly
increase the accuracy of clinical assessment for thoracic
aortic dissection. The single level 3 study described
increasing accuracy of progressive combinations of findings (Table
50-6). 29 For example, aortic pain alone (pain of sudden
onset, tearing or ripping in character, or both) has an LR+
of 2.6; the presence of both aortic pain and pulse or blood
pressure differentials increases the LR+ to 10 (95% CI,
1.4-80). Further addition of mediastinal or aortic widening on
chest radiograph clinches the diagnosis with an LR+ of 66
(95% CI, 4.1-1062). Unfortunately, this diagnostically
valuable triad was present in only 27% of patients. Conversely,
patients without any findings from the triad (aortic pain,
pulse or blood pressure differential, and mediastinal
widening) are unlikely to have a thoracic aortic dissection, given an
LR- of 0.07 (95% CI, 0.03-0.17). However, 4% of patients in
this category, without any of the above signs, were
nonetheless ultimately diagnosed with aortic dissection. Given the
high morbidity of a missed diagnosis, even such a
pronounced LR- is insufficient to defer diagnostic imaging if
thoracic aortic dissection is still clinically suspected.
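
To see why even an LR- of 0.07 may not be reassuring enough, the LRs quoted above can be converted into post-test probabilities. The sketch below uses the proportion of enrolled patients with dissection in the level 3 study (128 of 250) as the pretest probability. Treating that study prevalence as a clinician's pretest estimate is an assumption made for illustration, and the helper name apply_lr is hypothetical.

```python
def apply_lr(pretest_probability: float, likelihood_ratio: float) -> float:
    """Post-test probability from a pretest probability and a likelihood ratio."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Assumed pretest probability: prevalence among patients enrolled in the level 3 study.
pretest = 128 / 250

# LRs as quoted in the text for progressive combinations of findings.
scenarios = {
    "aortic pain alone (LR+ 2.6)": 2.6,
    "aortic pain plus pulse or blood pressure differential (LR+ 10)": 10.0,
    "all 3 findings (LR+ 66)": 66.0,
    "none of the 3 findings (LR- 0.07)": 0.07,
}
for label, lr in scenarios.items():
    print(f"{label}: {apply_lr(pretest, lr):.1%}")
```

Under this assumption, the full triad pushes the probability of dissection close to certainty, whereas the absence of all 3 findings still leaves a residual probability of several percent, which is consistent with the caution expressed in the text.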

The improved accuracy of combinations of clinical
findings may further be inferred from a holistic view of the 4
studies that selected patients for inclusion on the basis of an
overall clinical picture suggestive of thoracic aortic
dissection. Despite the relative rarity of thoracic aortic dissection
compared with other acute causes of pain, approximately
half the patients selected for these studies turned out to have
thoracic aortic dissection (pooled prevalence, 52%). By
comparison, only 0.003% of patients presenting to an emergency
department with acute back, chest, or abdominal pain are
eventually diagnosed with dissection. 29 This implies that a
full clinical history, examination, and radiograph
substantially select for patients with acute dissection. Furthermore,
among patients referred for aortic imaging who turn out
not to have an acute dissection, approximately half to
three-quarters are diagnosed with alternative serious diseases that
can potentially be identified by imaging intended to confirm
the diagnosis of thoracic aortic dissection (Table 50-7). 29,33,43-45,59
The clinical syndrome suspicious for thoracic aortic
dissection, although far from pathognomonic for acute dissection,
does detect patients with serious disease who merit advanced
diagnostic imaging.

Table 50-6 Positive Likelihood Ratio of Aortic Dissection in Patients With Combinations of Findings a

No. of Findings | LR+ (95% CI)
3 | 66 (4.1-1062)
2 | 5.3 (3.0-9.4)
1 | 0.5 (0.3-0.8)
0 | 0.1 (0.0-0.2)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio.
a Data from Von Kodolitsch et al. 29 Findings include aortic pain (severe, sudden-onset tearing pain), blood pressure or pulse differential between arms, or wide mediastinum on chest radiograph.

Table 50-7 Final Diagnoses in Patients With Clinical Syndromes Suggestive of Thoracic Aortic Dissection but Without Thoracic Aortic Dissection on Further Study


Diagnosis


No. (%) of Patients a


Von Kodolitsch et al, 29 2000 Enia et al, 45 1989 Armstrong et al, 43 1998 Eagle et al, 33 1986
(n = 122) (n = 11) (n = 41) b (n = 51) c


Acute coronary syndrome 18 (15)

Chest wall syndrome 18 (15)

Mediastinal cyst or tumor

Neuroradicular syndrome 1 (0.8)

Pulmonary disease 1 (0.8)

Hypertensive crisis 11 (9)

Gastrointestinal disease (esophagitis, PUD, gastritis, pancreatitis) 12 (9.8)

Pneumothorax 2 (1.6)

Pulmonary embolism 6 (4.9)

Pleuritis 5 (4.0)

Pericarditis 7 (5.7)

Nondissecting aneurysm


Aortic plaque rupture or intramural hemorrhage
Valvular pathology
Arteriosclerotic emboli

No definitive diagnosis 4 (3.3)


2(18) 8(20) 12(24)


4(8)


2(4)


1 (9) _^

^_M2)_

4 (36) 3 (7) 3 (6)

T(9) 13(32) 4R8T

~ 9^22) ...

~ 400) 5(10)

1 ( 2 )

3(27) 14(34) 14(28)


Abbreviation: PUD, peptic ulcer disease.
a Ellipses indicate data not available.
b Some patients without thoracic aortic dissection were given multiple diagnoses.
c Included 55 patients with suspected thoracic aortic dissections but negative aortogram results; 4 patients were false-negative cases and later demonstrated to have thoracic aortic dissection.

THE BOTTOM LINE

Despite the large number of case series describing patients
with thoracic aortic dissection, the clinical examination for
thoracic aortic dissection has yet to be prospectively
scrutinized in an independent, blinded study. The extant data
permit estimation of the sensitivity of clinical history,
physical examination, and chest radiography but likely
overestimate the accuracy of the clinical examination by selectively
including more obvious cases. A small number of studies
have included control populations and may therefore
estimate the specificity of components of the clinical
examination; however, the accuracy of these data is again limited by
the lack of independence between the selection of patients
for study and clinical findings.

Given the high, rapid mortality associated with
undiagnosed thoracic aortic dissection, prospective, independent
studies of the clinical examination are needed to aid
physicians in determining which aspects of the clinical
examination ought to be relied on to refer patients rationally for
further diagnostic studies. Until then, the current literature
permits the following limited conclusions about the clinical
examination:

• Most patients with thoracic aortic dissection have severe
pain of abrupt onset. The absence of pain of sudden onset
substantively decreases the probability of dissection (LR-,
0.3; 95% CI, 0.2-0.5); however, the study design of the
reports included in this article precludes accurate
assessment of the sensitivity and specificity of these features. The
presence of tearing or ripping pain (LR+, 1.2-11) or pain
that migrates (LR+, 1.1-7.6) may prove useful, but
additional data are required to know whether they are reliable
features of the clinical history.

• Physical findings associated with thoracic aortic dissection tend
to be present in a third of cases or fewer; however, pulse deficits
(LR+, 5.7; 95% CI, 1.4-23) or focal neurologic deficits (LR+,
6.6-33) greatly increase the likelihood of thoracic aortic
dissection in the appropriate clinical setting. The presence or absence
of a diastolic murmur is not useful (LR+, 1.4; LR-, 0.9).

• A normal aorta and mediastinum on chest radiograph helps
exclude the diagnosis (LR-, 0.3; 95% CI, 0.2-0.4), but no
particular radiographic abnormality is dependably present.

• The presence of the above findings in combination increases
the LR+ for dissection, but even the absence of multiple
findings does not definitively exclude the diagnosis. Clinical
history, examination, and radiography can help rule in
aortic dissection but are not sufficiently accurate to rule out the
disease.


CLINICAL SCENARIOS—RESOLUTIONS 


CASE 1 The patient’s clinical history of sudden onset of
severe chest pain is worrisome. His history of hypertension
slightly increases his risk of a thoracic aortic dissection. The
absence of a diastolic murmur, blood pressure differential,
neurologic deficit, and widened mediastinum does not
reliably exclude the diagnosis of thoracic aortic dissection. Given
the high mortality of untreated or mistreated thoracic aortic
dissection, this patient merits further advanced imaging.

CASE 2 The presence of a neurologic deficit in a patient
with a clinical history consistent with thoracic aortic
dissection is a specific finding. This patient has a high
likelihood of having an acute thoracic aortic dissection and
ought to undergo urgent diagnostic imaging to locate and
delineate the suspected lesion.

UPDATE: Thoracic Aortic Dissection



Prepared by Michael Klompas, MD 
Reviewed by Frank Lederle, MD 


CLINICAL SCENARIO 


A 64-year-old man with history of hypertension is treated
in the emergency department for a chief complaint of
severe chest pain of recent onset, radiating to the
abdomen. His heart examination is remarkable for the presence
of an S4 but no murmurs. His electrocardiogram (ECG)
has changes consistent with acute inferior myocardial
infarction. You have drawn up a syringe full of tissue
plasminogen activator that you are about to inject when a
thought suddenly occurs to you: Could this patient have
an acute thoracic aortic dissection?

UPDATED SUMMARY ON THORACIC 
AORTIC DISSECTION 

Original Review 

Klompas M. Does this patient have an acute thoracic aortic 
dissection? JAMA. 2002;287(17):2262-2272. 

UPDATED LITERATURE SEARCH 

Additional aortic dissection studies were sought with the same
parent search criteria used for The Rational Clinical
Examination series, combined with the terms, “dissecting aneurysm,”
“aortic rupture,” “aortic aneurysm, thoracic,” “aneurysm,
dissecting,” “aortic diseases/diagnosis,” and the text word,
“thoracic aortic dissection.” The search was conducted for studies
published between 2000 and August 2004. In addition, articles
citing the original Rational Clinical Examination articles were
reviewed. The search strategy resulted in 468 articles. Titles
and abstracts were reviewed with the same limitation criteria
as in the original article to find large, consecutive series of
patients suspected to have aortic dissection, whose diagnosis
was confirmed with a reference standard investigation
(computed tomography [CT] angiography, magnetic resonance
imaging [MRI], transesophageal echocardiography [TEE],
aortogram, surgical exploration, or autopsy). As before, studies
limited to proximal or distal aortic dissection or abdominal
aortic dissection were excluded. One new study was identified.


NEW FINDINGS 

• Younger patients with thoracic dissection (<40 years old)
are more likely to have abrupt chest pain and Marfan syndrome
but less likely to have systolic hypertension compared
with older patients. 1

Details of the Update 

The only new investigation identified was an update 2 of the
International Registry of Acute Aortic Dissection (IRAD)
database report that figured prominently in the original
review. 3 This article, primarily directed at reporting the frequency
with which different diagnostic modalities were used
to make the diagnosis of aortic dissection, included a table of
the clinical features of 628 registry patients (vs 464 reported
in the original IRAD article). Because the registry includes
only patients with thoracic aortic dissection, the data can be
used to estimate the sensitivity of clinical findings but not
their specificity.

The registry also reports the results of imaging. Because
many patients had multiple imaging studies (66%), we can
use the results to estimate the sensitivity of the tests used as a
reference standard. There was no statistically significant
difference in the sensitivity of TEE, CT, MRI, or aortography
(though relatively few patients had the latter 2 studies). Overall,
these studies had a sensitivity of 0.91 (95% confidence interval
[CI], 0.87-0.94).


IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

The additional patients in the IRAD database confirmed the 
frequency of the clinical features of acute aortic dissection 
already described in The Rational Clinical Examination 
article. There were no substantial changes in the sensitivity 
from the original cohort. The reported frequency of clinical 
features varied by no more than a few percentage points 
between the first and second IRAD articles. These data 
allow us to refine our sensitivity estimates with narrower 
CIs. Additionally, the IRAD report shows the difference in 
sensitivity for patients younger than 40 years vs aged 40 
years or older. 





CHANGES IN THE REFERENCE STANDARD 

None. 

RESULTS OF LITERATURE REVIEW 

The abrupt onset of chest pain is the most sensitive finding
for a thoracic aortic dissection (Table 50-8).


Table 50-8 Sensitivity of Findings for Thoracic Aortic Dissection

Finding (No. of Studies)                Summary Sensitivity (95% CI)

History
  Hypertension (14)                     0.65 (0.57-0.73)
  Hypertension, age < 40 y (1)          0.34 (0.23-0.46)
  Marfan syndrome (10)                  0.04 (0.03-0.06)
  Marfan syndrome, age < 40 y (1)       0.50 (0.38-0.62)
Symptoms
  Abrupt onset (8)                      0.84 (0.81-0.86)
  Abrupt onset, age < 40 y (1)          0.96 (0.89-0.99)
  Chest pain (10)                       0.71 (0.58-0.83)
  Chest pain, age < 40 y (1)            1.0 (0.96-1.0)
  Back pain (11)                        0.30 (0.18-0.44)
Signs
  Pulse deficit (17)                    0.32 (0.26-0.39)
  Murmur of aortic insufficiency (17)   0.28 (0.22-0.35)
Chest Radiograph
  Widened mediastinum (10)              0.63 (0.44-0.80)

Abbreviation: CI, confidence interval.
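
As a rough illustration of how one sensitivity estimate and its 95% CI in Table 50-8 could be obtained from registry counts, the sketch below computes a Wilson score interval. The counts (34 of 68 patients younger than 40 years with Marfan syndrome) are inferred from the percentages reported for the young IRAD cohort, and the Wilson method is an assumption; the source does not state which interval method was used.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Return (estimate, lower, upper) for a proportion (Wilson score method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, center - half_width, center + half_width


# Assumed counts: 50% of the 68 young registry patients had Marfan syndrome.
sensitivity, lower, upper = wilson_interval(34, 68)
print(f"sensitivity {sensitivity:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

Under these assumptions the interval comes out at roughly 0.38 to 0.62, in line with the tabulated value for Marfan syndrome in patients younger than 40 years.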


EVIDENCE FROM GUIDELINES 

The American College of Radiology has published appropriateness
criteria to guide the choice of imaging modality for
diagnosing acute thoracic aortic dissection. 4 The guidelines
advocate that all patients suspected of having an aortic dissection
have a chest radiograph. Ironically, although the
guidelines recommend a chest radiograph for all patients,
much of their discussion of the radiograph notes its significant
limitations, including its lack of specificity, the subjectivity of
interpretation, and imperfect sensitivity. Experts, however,
recommend chest radiographs as a means of ruling out other
pathology (especially when a baseline comparison radiograph
is available).

The guidelines also discuss the appropriateness of reference
standard imaging modalities, including aortography,
CT, MRI, and TEE. All 4 modalities are highly sensitive and
specific. Computed tomography with contrast injection is
believed to be most appropriate, however, because it is safer
and less invasive than angiography or TEE, as well as being
faster, cheaper, and more readily available than all 3 other
modalities. Transesophageal echocardiography requires an
experienced physician available at short notice for providing
additional data for operative planning.


CLINICAL SCENARIO—RESOLUTION 


This clinical scenario underscores some of the particular
problems in the diagnosis and immediate management of
severe chest pain in the emergency department. In this
scenario, the clinician is faced with 2 realistic diagnostic
possibilities that are life threatening and yet have contradictory
treatments (thrombolysis can be deadly in a
patient with aortic dissection). Unfortunately, clinical
evaluation to distinguish between aortic dissection and
acute myocardial infarction is limited in the setting of a
patient with clear ECG changes yet with symptoms consistent
with aortic pain. No single aspect of clinical history,
physical examination, ECG, or chest radiography is
completely sensitive in the diagnosis of aortic dissection.

Nonetheless, some evidence-based options do exist to
aid the rapid treatment of this patient. A chest radiograph
ought to be obtained and compared, if possible, against a
previous study of the same patient. A completely normal
radiograph result would substantially decrease the probability
of aortic dissection, whereas the detection of a widened
mediastinum, change in the aortic contour, or
displacement of intimal calcification can be highly suggestive
of the diagnosis.

Ultimately, however, this patient needs a reference standard
study to exclude aortic dissection. The most favorable
options would be CT with contrast injection or TEE.
The advantage of the former is its rapid diagnostic yield
and ready availability. In this patient, where the possibility
exists that he has a proximal aortic dissection causing an
acute myocardial infarction, a TEE might be particularly
advantageous.
THORACIC AORTIC DISSECTION: MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

Although no current studies address the prior probability
of an acute aortic dissection, a recent population-based
epidemiologic study allows us to infer a 2% thoracic aortic
dissection prevalence among patients with chest pain. 5

POPULATION FOR WHOM A THORACIC AORTIC 
DISSECTION MIGHT BE CONSIDERED 

• Patients with acute chest pain, especially those with 
hypertension or a Marfanoid habitus 

DETECTING THE LIKELIHOOD OF A 
THORACIC AORTIC DISSECTION 

Although clinical history, physical examination, and chest
radiography can be suggestive of aortic dissection, none of these
elements alone is sufficiently sensitive or specific to independently
rule in or rule out this high-stakes diagnosis. Nonetheless,
certain findings on the clinical evaluation can be helpful in
suggesting the diagnosis and the need to perform a reference
standard investigation such as CT angiography or TEE (Table
50-9). Almost all patients have severe pain (pooled sensitivity,
90%) of sudden onset (pooled sensitivity, 84%). The presence
of a pulse or blood pressure differential from one side of the
body to the other in a patient with severe chest pain is not often
found in patients with dissection (sensitivity, 31%), but the
finding increases the likelihood of aortic dissection when
discovered (positive likelihood ratio [LR], 5.7). Similarly, a new
focal neurologic deficit occurs infrequently (sensitivity, 17%)
but also increases the likelihood of an aortic dissection when it
is present (positive LR, 6.6-33.0). A widened mediastinum on
chest radiograph is neither reliably present (pooled sensitivity,
64%) nor diagnostic of aortic dissection (positive LR, 2.0).
However, almost all chest radiographs from patients with
dissection will have some abnormality (pooled sensitivity, 90%),
so a completely normal chest radiograph result decreases the
probability of dissection being present (LR, 0.3).

New data suggest that the presenting features in young
patients with aortic dissection may differ from those of
older patients, but the accuracy of those findings has not
been studied. Despite the lack of data quantifying the accuracy,
young patients (<40 years old) with acute chest discomfort
and Marfanoid features may have a greatly
increased LR for aortic dissection compared with all other
patients with chest discomfort.


Table 50-9 Accuracy of Clinical Findings for Thoracic Aortic Dissection
in Consecutive Patients Preselected for High Clinical Suspicion of
Dissection Referred for Advanced Imaging^a

Symptom or Sign (Total No. of Patients)   References  LR+ (95% CI)     LR- (95% CI)

Focal neurologic deficit (325)            6,7         6.6-33           0.7-0.9
Pulse deficit (371)                       6-8         5.7 (1.4-23)     0.7 (0.6-0.9)
Enlarged aorta or wide mediastinum (365)  6,7,9       2.0 (1.4-3.1)    0.3 (0.2-0.4)
History of hypertension (336)             6,8,9       1.6 (1.2-2.0)    0.5 (0.3-0.7)
Sudden chest pain (365)                   6,7,9       1.6 (1.0-2.4)    0.3 (0.2-0.5)
“Tearing” or “ripping” quality (325)      6,7         1.2-11           0.4-1.0
Diastolic murmur (411)                    6-9         1.4 (1.0-2.0)    0.9 (0.8-1.0)
Migrating pain (290)                      6,9         1.1-7.6          0.6-1.0

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative
likelihood ratio.

^a 95% CI reported when 3 or more studies combined; otherwise, the LRs reported
represent the range.


Table 50-10 Positive Likelihood Ratio of Aortic Dissection in Patients
With Combinations of Findings 7

No. of Findings^a    LR (95% CI)

3                    66 (4.1-1062)
2                    5.3 (3.0-9.4)
1                    0.5 (0.3-0.8)
0                    0.1 (0-0.2)

Abbreviations: CI, confidence interval; LR, likelihood ratio.

^a Findings include aortic pain (sudden-onset severe tearing pain), blood pressure or
pulse differential between arms, and wide mediastinum on chest radiograph.

Consideration of combinations of findings can substantially
alter the posttest probability of the diagnosis (Table 50-10).
The absence of all 3 findings (aortic pain [new, severe,
“tearing” pain], blood pressure differential, and widened
mediastinum) substantially decreases the probability of aortic
dissection (LR, 0.1), whereas the presence of all 3 of these
findings is highly suggestive of the diagnosis (LR, 66).
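
The arithmetic that links a prior probability, an LR, and a posttest probability can be sketched as follows. This is a minimal illustration, assuming the 2% prior probability inferred in the text and the summary LRs from Table 50-10; the function and variable names are hypothetical. Because those LRs were derived in patients preselected for high clinical suspicion and have wide CIs, the printed probabilities are illustrative rather than clinical estimates.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    if not 0 <= pretest_probability < 1:
        raise ValueError("pretest probability must be in [0, 1)")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


PRETEST = 0.02  # prevalence among patients with chest pain, as inferred in the text
# Summary LRs by number of findings present (Table 50-10).
LR_BY_FINDINGS = {3: 66, 2: 5.3, 1: 0.5, 0: 0.1}

for n_findings, lr in LR_BY_FINDINGS.items():
    probability = posttest_probability(PRETEST, lr)
    print(f"{n_findings} findings: LR {lr} -> posttest probability {probability:.1%}")
```

With these inputs, all 3 findings raise a 2% prior probability to roughly 57%, whereas the absence of all 3 lowers it to about 0.2%.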

REFERENCE STANDARDS 

Computed tomography, aortography, MRI, or TEE. 
EVIDENCE TO SUPPORT THE UPDATE:

Thoracic Aortic Dissection 



TITLE Characterizing the Young Patient With Aortic 
Dissection: Results From the International Registry of 
Aortic Dissection (IRAD). 

AUTHORS Januzzi JL, Isselbacher EM, Fattori R, et al; 
International Registry of Aortic Dissection (IRAD). 

CITATION J Am Coll Cardiol. 2004;43(4):665-669.

QUESTION How do the presentation and prognosis of 
aortic dissection differ for younger vs older patients? 

DESIGN Retrospective case-control study using international
data registry.

SETTING Five US hospitals and 8 non-US hospitals
(Europe, Israel, Japan).

PATIENTS Nine hundred fifty-one patients enrolled 
from January 1996 to November 2001. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Patients were retrospectively identified at each center. Physician
reviewers used a common form to collect data from
patients’ medical records. The diagnostic standard was the
unified conclusion from the combination of medical history,
imaging studies, surgical visualization, or postmortem examination.
Data collected included demographic information,
details of the clinical presentation, and the results of imaging
studies. Results were stratified by age into 2 cohorts, younger
than 40 years or aged 40 years or older.

MAIN OUTCOME MEASURES 

Risk factors, clinical presentation, results of imaging studies, 
and mortality for aortic dissection patients younger than 40 
years compared with those aged 40 years or older. 


MAIN RESULTS 

Sixty-eight patients younger than 40 years were compared 
with 883 patients aged 40 years or older. Younger patients 
were less likely to have a history of hypertension (34% vs 
72%) but were more likely to have Marfan syndrome (50% vs 
2%), bicuspid aortic valve (9% vs 1%), or previous aortic 
valve replacement surgery (12% vs 5%). 

Young patients were even more likely to complain of pain 
of abrupt onset (96% vs 82%), but other symptoms of aortic 
dissection were similar between the 2 groups. Younger 
patients were less likely to be hypertensive (25% vs 45%). 
Mortality rates did not differ between the 2 groups (22% vs 
24%). 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 4. 

STRENGTHS Multicenter, multinational study with a large 
number of patients. 

LIMITATIONS Retrospectively collected data without any
attempt to capture data not recorded at original patient presentation.
Selection and evaluation of patients were done
without blinding to the ultimate clinical diagnosis or the
results of previous studies. The cohort of younger patients
was small relative to the sample size of older patients analyzed.
This is a descriptive rather than an interventional
study.

Younger patients with aortic dissection are substantially
more likely to have Marfan syndrome or bicuspid aortic valve
as their predisposing factors and less likely to have hypertension.
Otherwise, the clinical presentation and prognosis of
younger patients are similar to those of older patients.

Reviewed by Michael Klompas, MD 
TITLE Choice of Computed Tomography, Transesophageal
Echocardiography, Magnetic Resonance Imaging,
and Aortography in Acute Aortic Dissection: International
Registry of Acute Aortic Dissection (IRAD).

AUTHORS Moore AG, Eagle KA, Bruckman D, et al. 

CITATION Am J Cardiol. 2002;89(10):1235-1238.

QUESTIONS Which imaging modalities are currently 
being used to diagnose acute thoracic aortic dissection 
and what is their sensitivity? 

DESIGN International data registry. 

SETTING Five US hospitals and 8 non-US hospitals 
(Europe, Israel, Japan). 

PATIENTS Six hundred twenty-eight patients enrolled 
from January 1996 to December 1999. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Physician reviewers used a common form to collect data
retrospectively from medical records. The diagnostic standard
was the unified conclusion from the combination of medical
history, imaging studies, surgical visualization, or postmortem
examination. Data collected included demographic
information, details of the clinical presentation, imaging
modalities used and the order in which they were performed,
and the sensitivity of each imaging modality.

MAIN OUTCOME MEASURES 

First and second choice of imaging modalities chosen for 
each patient. Sensitivity of each imaging modality. 

MAIN RESULTS 

The study report includes 618 patients who had imaging. The
most commonly used imaging modality was computed
tomography (CT), used for 75% of patients; however, an
almost identical number (72%) received transesophageal
echocardiography (TEE). Two-thirds of patients (66%) had 2
imaging studies done. CT was performed first in 63% (n = 379);
TEE, in 32% (n = 193). Aortography (n = 24) and magnetic
resonance imaging (MRI) (n = 9) were infrequently used as
the initial study. The sensitivity of each imaging modality is
reported in Table 50-11.


Table 50-11 Sensitivity of Imaging Modalities to Diagnose Thoracic
Aortic Dissection^a

Imaging Procedure                    Sensitivity

Computed tomography                  0.93 (0.90-0.95)^b
Transesophageal echocardiography     0.88 (0.83-0.92)
Magnetic resonance imaging           1.0 (0.7-1.0)
Aortography                          0.88 (0.69-0.96)

^a The data represent the sensitivity for the initial study only.

^b Results are statistically identical for proximal vs distal dissection, though
transesophageal echocardiography appears to have a higher sensitivity for proximal than
distal dissections (0.90 vs 0.80; P = .06).

CONCLUSIONS 

LEVEL OF EVIDENCE Level 4. 

STRENGTHS Multicenter, multinational study describing 
actual clinical practice and the performance of CT and TEE 
under routine clinical conditions. 

LIMITATIONS Retrospective case series rather than a prospective
evaluation of the sensitivity and specificity of each
imaging modality. The series consequently reflects the biases
of each center in choosing various radiographic techniques
in accordance with local clinical culture and variable equipment
and operator availability. Likewise, the interpretation
of the radiographic studies was not necessarily done by
blinded, expert reviewers and hence might misestimate the
true sensitivity of the various tests. The sample size was small
for patients imaged with MRI and aortography. The data on
clinical presentation are particularly limited because the data
were collected retrospectively from medical records, without
any attempt to ascertain missing information. In addition,
the data were not abstracted by blinded clinicians, because
they had access to the patients’ final diagnoses.

Computed tomography is the most commonly used
modality to diagnose aortic dissection. Two-thirds of
patients, however, receive more than 1 imaging test. CT, TEE,
MRI, and aortography all have high sensitivity; however,
with the possible exception of MRI, they can all yield
false-negative results.

Reviewed by Michael Klompas, MD 
CASE 1 A 24-year-old healthy woman calls her primary
care physician, complaining of a burning pain when urinating
and increased urinary frequency for several hours.
She has had 2 previous urinary tract infections (UTIs),
and this episode seems “just like the other ones.” She is
sexually active with 1 partner and uses a condom with
spermicide. She denies fever, back pain, nausea, vomiting,
vaginal discharge, and hematuria.

CASE 2 A 20-year-old woman presents to your office, 
complaining of urinary frequency, burning on urination, 
and vaginal discharge. She has had occasional fevers and 
chills but denies nausea, vomiting, and back pain. She is 
sexually active with 1 partner, takes oral contraceptive 
pills, and intermittently they use condoms. Physical 
examination shows her to be in mild discomfort and 
febrile but without tenderness in her costovertebral areas. 
Pelvic examination demonstrates minimal white vaginal 
discharge, no vaginal lesions or rashes, and no cervicitis. 
Her dipstick urinalysis result is negative for leukocyte 
esterase, nitrite, and blood. 


WHY IS THIS AN IMPORTANT QUESTION TO 
ANSWER WITH A CLINICAL EXAMINATION? 


Acute uncomplicated UTIs are common in women, accounting
for more than 7 million office visits annually in the
United States 1 and affecting half of all women at least once
during their lifetimes. 2 A recent study of sexually active
young women found the incidence of cystitis to be 0.5% to
0.7% per year. 3 In aggregate, the direct costs of these infections
have been estimated to be $1.6 billion annually in the
United States. 4

One might anticipate that the management of acute
uncomplicated UTI would be relatively uniform because the
causative agents and in vitro susceptibilities are known, and
therapeutic responses to antimicrobials have been studied
carefully. 2,5-7 Unfortunately, the evaluation and treatment of
acute uncomplicated UTI in women vary substantially
among physicians, 8 likely reflecting the limitations of routine
diagnostic assessments. When done correctly, however, the
history taking and physical examination can be used in the
initial evaluation of patients suspected of having an acute
uncomplicated UTI and can guide the selection of additional
diagnostic and therapeutic strategies. 2,7

Definitions 

Several types of UTI are described by their location: urethritis,
cystitis, pyelonephritis, and perinephric abscess. The
usual reference standard for diagnosing UTI is the presence
of “significant” bacteria in a clean-catch or catheterized
urine specimen, most commonly defined as the isolation of
at least 10^5 colony-forming units (CFU) per milliliter of a
single uropathogen. 2 In women who present with symptoms
of cystitis or urethritis (lower UTI), it has been suggested
that the best diagnostic criterion for clean-catch urine is the
isolation of uropathogens in concentrations as low as
10^2 CFU/mL. 9

Uncomplicated UTIs occur in individuals who have a
normal urinary tract system. A UTI in an individual with a
functional or anatomic abnormality of the urinary tract
(including a history of polycystic renal disease, nephrolithiasis,
neurogenic bladder, diabetes mellitus, immunosuppression,
pregnancy, indwelling urinary catheter, or recent
urinary tract instrumentation) is considered complicated
and may have a higher risk of treatment failure. 10 Differentiating
between these types of UTIs is important because
uncomplicated infections are usually cured with simple
antimicrobial regimens. 10

The prevalence of asymptomatic bacteriuria (significant
bacteriuria without symptoms of UTI) in women of
reproductive age is approximately 5%. 11,12 This value represents
the pretest probability of disease (the probability
of UTI before any diagnostic tests are applied). Several
historical features, symptoms, and signs associated with
acute UTI may be useful for screening, allowing the clinician
to estimate the probability of UTI in a patient after
taking a medical history and performing a physical examination.
Historical features such as a history of UTI, recent
sexual activity, or contraceptive use identify individuals at
greater risk of developing a UTI. Symptoms of an acute
infection include burning or pain on urination (dysuria),
frequent voiding of small volumes of urine (frequency),
the urge to void immediately (urgency), and the presence
of blood in the urine (hematuria). Discomfort in the lower
abdominal area is also consistent with a UTI. In contrast,
patients who report vaginal discharge or irritation are less
likely to have a UTI and more likely to have vaginitis or
cervicitis. The presence of fever and suprapubic or costovertebral
angle tenderness may indicate infection of the
upper urinary tract.

Differential Diagnoses 

Vaginal infections (eg, Gardnerella, Candida albicans,
Trichomonas), sexually transmitted diseases that may lead to
pelvic inflammatory disease (eg, Chlamydia trachomatis,
Neisseria gonorrhoeae), and other sexually transmitted diseases
(eg, herpes simplex virus) that mimic symptoms of
UTI are considered separate from UTIs. These infections
are caused by different microbes, are limited to female genital
structures with a unique set of complications if untreated,
and require different forms of treatment. 13 Differentiating
between vaginal infections, sexually transmitted diseases,
and UTIs can be difficult because symptoms and signs
commonly overlap. 13


METHODS 

We searched the English-language medical literature to
determine the accuracy and precision of the clinical examination
in women suspected of having an acute UTI. We
searched MEDLINE for articles from 1966 through September
2001, with a search strategy similar to that used by other
authors in this series. 14 Search terms included “urinary tract
infection,” “diagnostic tests,” “physical examination,” and
“sensitivity and specificity.” This computerized search was
supplemented with a manual review of the bibliographies of
all identified articles, additional “core” articles (identified a
priori as articles used to develop a recent guideline for treating
acute uncomplicated UTI in women), 3 commonly used
clinical skills textbooks, 15-17 and contact with experts in the
field. One of the authors (B.K.N.) initially screened the titles
and abstracts of the search results. Two of the authors (S.B.
and B.K.N.) then independently reviewed and abstracted
data from articles identified as relevant.

We included studies in our review if they contained original
data on the accuracy or precision of the symptoms or signs in
diagnosing acute uncomplicated UTI in healthy women. Articles
were excluded if they evaluated infants, children or adolescents,
pregnant women, nursing home patients, or patients
with complicated UTI or contained insufficient or incomplete
data to allow calculation of likelihood ratios (LRs) for signs or
symptoms of acute UTI.

We also chose to include articles on the dipstick test in this
analysis because it is commonly used in the clinical setting
and provides an immediate result that can be incorporated
with other elements of the initial clinical assessment. During
our search, we discovered that a previous systematic review
evaluated the diagnostic accuracy of the dipstick test. 18
Because this was a high-quality review (meeting all 6 criteria
of a previously published guideline for evaluating systematic
reviews), 19 we chose to use the information about the accuracy
of the dipstick test synthesized in that article.

Quality Assessment of Included Articles 

The methodological quality of the included articles was
assessed independently by 2 authors (S.B. and B.K.N.), using
criteria adapted from other authors in this series. 14,20 Disagreements
were resolved by a third author (S.S.). Level 1 studies
included those with an independent blind comparison of signs
or symptoms with a gold standard among a large number
(>50) of consecutive patients suspected of having a UTI. Level
2 studies were similar to those in level 1 but involved a smaller
number of patients (<50). The remaining levels are described
in Table 1-7.

Data Analysis 

We used published raw data from the studies that met our
criteria to calculate summary measures for the LRs for
components of the clinical examination for UTI. LRs are
related to sensitivity and specificity (positive likelihood
ratio [LR+] = sensitivity/[1 - specificity]; negative likelihood
ratio [LR-] = [1 - sensitivity]/specificity) but are
more clinically useful because they can be used to generate
posttest probabilities. 21 A random-effects model was used to
generate conservative summary measures and confidence
intervals (CIs) for the LRs and estimates of disease prevalence.
22,23 Uncertainty in these measures is reflected in the
broad CIs around the estimates. When a summary LR
included studies of lower quality, we conducted sensitivity
analyses to examine the influence of excluding lower-quality
studies on the summary LR and the effectiveness score, a
measure of the discriminatory power of a diagnostic test. 24
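
The two LR definitions given above translate directly into code. The sketch below is only an illustration of those formulas; the function name and the example sensitivity and specificity values are hypothetical and are not drawn from any of the included studies.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a finding with the given sensitivity and specificity."""
    if not 0 <= sensitivity <= 1:
        raise ValueError("sensitivity must be in [0, 1]")
    if not 0 < specificity < 1:
        raise ValueError("specificity must be in (0, 1) for both LRs to be defined")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical finding: sensitivity 0.80, specificity 0.60 (illustrative values only).
lr_pos, lr_neg = likelihood_ratios(0.80, 0.60)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

An LR+ above 1 raises the probability of disease when the finding is present, and an LR- below 1 lowers it when the finding is absent; values near 1 change the probability very little.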

RESULTS 

Study Characteristics 

We found 9 studies of the 464 identified by the search that
satisfied all inclusion criteria (Table 51-1). Six studies 25-30
reported the accuracy of 1 or more symptoms in the diagnosis
of UTI, 2 studies 31,32 reported the accuracy of symptoms
and physical examination signs, and 1 study reported the
accuracy of self-diagnosis. 33


The studies were published between 1965 and 2001 and
generally involved patients with 1 or more symptoms of a
UTI who presented to outpatient clinics. The summary prevalence
of UTI in the 5 studies that included only symptomatic
patients and used an appropriate gold standard was
48% (95% CI, 41%-55%), 25-28,30 indicating a high probability
of disease for women who met the studies’ inclusion criteria.
In all of the included studies, UTI was defined by the presence
of at least 10000 or 100000 CFU/mL of a single uropathogen,
except for the most recent study, which used a
cutoff of at least 100 CFU/mL. 33

Five of the 8 studies describing the accuracy of
symptoms 25-28,30 were of high quality (level 1). Both studies 31,32
describing the accuracy of the physical examination were
of lower quality (levels 3 and 4), as was the study examining
self-diagnosis (level 5). 33 Reasons for quality scores
lower than level 1 are shown in Table 51-1. Two of the
lower-quality studies 29,31 included patients with vaginal
discharge but without symptoms of UTI and therefore did
not specifically address the diagnostic accuracy of signs
and symptoms exclusively in women suspected of having a
UTI.


Table 51-1 Studies Used to Determine the Accuracy of Clinical History and Physical Examination in Women
Suspected of Having Urinary Tract Infection

Source, y                   Methodologic Quality^a    Inclusion Criteria                  No. of Patients  Mean Age,  Incidence of  Setting and Country
                                                                                                           y^b        UTI, %

Symptoms

Gallagher et al, 25 1965    Level 1                   Women with symptoms of UTI          130              ...        59            Urban clinics in New Zealand

Mond et al, 26 1965         Level 1                   Women with symptoms of UTI          83               ...        45            General practice in the United Kingdom

Lawson et al, 27 1973       Level 1                   Women aged 15-55 y with symptoms    343              ...        47            Two general practices in the United Kingdom
                                                      of UTI

Dans and Klaus, 28 1976     Level 1                   Women reporting dysuria             84               26         46            US adult walk-in clinic

Komaroff et al, 29 1978     Level 4 (including women  Women with symptoms suggestive of   821              24         12            US ambulatory care facility
                            without symptoms          urinary or vaginal infection
                            suggestive of UTI)

Nazareth and King, 30 1993  Level 1                   Women aged 16-45 y presenting with  54               29         28            Two general practices in suburban London
                                                      frequency or dysuria

Self-diagnosis

Gupta et al, 33 2001        Level 5 (no urine culture Women >18 y with a history of       172              23         NA            US university-based clinic
                            in women without          recurrent UTI
                            symptoms)

Symptoms and Physical Examination Findings

Wong et al, 31 1984         Level 4 (including        Women with symptoms of UTI or with  53 cases,        ...        NA            US STD clinic
                            patients without symptoms both UTI and vaginal complaints     139 controls
                            suggestive of UTI)        and random selection of women
                                                      with vaginitis or STD

Wigton et al, 32 1985       Level 3 (retrospective    Retrospective review of patients    216 in training  ...        NA            US emergency department
                            chart review)             who had urine culture in            set, 236 in
                                                      emergency department                validation set

Abbreviations: NA, not applicable; STD, sexually transmitted disease; UTI, urinary tract infection.

^a Methodologic quality criteria are described in the “Methods” section (see also Table 1-7). Reasons for methodologic quality scores lower than level 1 are shown in parentheses.
^b Ellipses indicate not mentioned.
Precision

The precision of a symptom or sign refers to the degree to
which different examiners report the same finding (eg, dysuria
present or absent) when interviewing or examining the
same patient. None of the identified studies described the
precision of the medical history or physical examination in
the diagnosis of UTI, possibly because the questions and
examination procedures were considered to be unambiguous.
For example, most of the historical items consist of asking
yes or no questions such as, Are you having burning or
pain with urination? Variations in interview style and the
phrasing of questions may affect results, but there is no information
from the identified studies to suggest particular
wording of questions or specific ways to examine patients for
the 2 relevant physical examination signs (costovertebral
angle tenderness and vaginal discharge).

Accuracy 

Symptoms 

Eight studies 25-32 examined the accuracy of 9 symptoms for
predicting the presence of UTI. These symptoms and the
corresponding LR+ and LR- from each study are shown in
Table 51-2. Three of the symptoms (flank pain, abdominal
pain, fever) had both summary LR+ and summary LR- with
CIs overlapping 1.0 and are therefore not useful as diagnostic
tests.

Four symptoms significantly increased the probability of 
UTI: dysuria, frequency, hematuria, and back pain. Four 
symptoms significantly decreased the probability of UTI: 
absence of dysuria, absence of back pain, a history of vaginal 
discharge, and a history of vaginal irritation. The symptoms 
with the greatest diagnostic power were a history of vaginal 
discharge (LR, 0.34) and a history of vaginal irritation (LR, 
0.24); both of these symptoms substantially reduced the 
probability of UTI. 

Self-diagnosis 

One study examined the accuracy of self-diagnosis and
included 172 women in a university-based practice with recurrent
UTI (more than 2 UTIs in the past year). 33 During the
study period, 88 of the women reported 172 episodes of
self-diagnosed UTI; 144 of these episodes (84%; 95% CI, 77%-90%)
were found to have positive urine culture results. Additionally,
64 women reported mild symptoms that they did not
self-diagnose as UTI and another 20 women never had symptoms.
In this population of patients, the positive predictive
value of self-diagnosis was high (84%). LRs for self-diagnosis
can be calculated assuming that the women with mild symptoms
or no symptoms correctly self-diagnosed with no infection
(these women did not have a urine culture, but all
symptoms resolved spontaneously). If this assumption is true,
the LR for a positive self-diagnosis is 4.0, whereas the LR for a
negative self-diagnosis is 0 (Table 51-2).
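The arithmetic behind those two LRs can be sketched as follows. This is only an illustration: pooling the 172 self-diagnosed episodes with the 84 women who did not self-diagnose into a single 2x2 table is an assumption about how the published figures were obtained, and the function name `likelihood_ratios` is hypothetical.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR+, LR-) from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Counts quoted in the text; combining them in one table is an assumption.
tp = 144        # self-diagnosed episodes with a positive urine culture
fp = 172 - 144  # self-diagnosed episodes with a negative urine culture
fn = 0          # assumed: no infection among women who did not self-diagnose
tn = 64 + 20    # women with mild symptoms or no symptoms, assumed uninfected

lr_pos, lr_neg = likelihood_ratios(tp, fp, fn, tn)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.1f}")
```

Under these assumptions the sensitivity is 100% and the specificity is 84/112 = 75%, which reproduces the LR+ of 4.0 and LR- of 0 given above.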

Combinations of Symptoms 

One study 29 provided information to calculate the LRs for
combinations of symptoms in the diagnosis of UTI (Table
51-3). In this study, the presence of dysuria and frequency
without vaginal discharge or irritation was associated with a
high LR (25). Conversely, the LR for the combination of vaginal
discharge or irritation without dysuria was low (0.3).
Although the LRs from this study must be interpreted with caution
because of the study’s low quality score (level 4), the
observed LRs were similar to those calculated by combining the
individual summary LRs from the other studies (Table 51-3).
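Combining individual summary LRs amounts to multiplying them. The sketch below does this for the three symptom combinations, using the rounded LRs for findings below 1 (0.5, 0.3, 0.2). Where a combination says "or", each alternative is tried and the range is reported. The helper name `combined_lr_range` is hypothetical, and multiplication assumes the findings are independent, which the source does not test.

```python
from itertools import product
from math import prod


def combined_lr_range(findings):
    """findings: one tuple of alternative LRs per finding.

    Returns (lowest, highest) product over every choice of alternatives.
    """
    totals = [prod(choice) for choice in product(*findings)]
    return min(totals), max(totals)


# Dysuria, frequency, no vaginal discharge, no vaginal irritation
typical = combined_lr_range([(1.5,), (1.8,), (3.1,), (2.7,)])

# No dysuria, with vaginal discharge (0.3) or irritation (0.2)
vaginal_only = combined_lr_range([(0.5,), (0.3, 0.2)])

# Dysuria (1.5) or frequency (1.8), with vaginal discharge or irritation
mixed = combined_lr_range([(1.5, 1.8), (0.3, 0.2)])

print(round(typical[0]))  # single value because no alternatives exist
print(vaginal_only)
print(mixed)
```

The first product is about 22.6, which rounds to the overall LR of 23 in Table 51-3. The other two ranges (roughly 0.10 to 0.15 and 0.30 to 0.54) correspond to the tabulated 0.1-0.2 and 0.3-0.5.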

Physical Examination 

Two studies 31,32 reported the accuracy of 2 physical examination
signs for the presence of UTI. Both studies were of relatively
low quality, and therefore the summary data do not
represent strong evidence of the true accuracy of these signs 
(Table 51-2). The presence of costovertebral angle tenderness 
increases the likelihood of infection, but the LR is only 
weakly predictive and similar in magnitude to the related 
symptom of back pain. The presence of vaginal discharge on 
examination decreases the likelihood of UTI (LR, 0.69), 
although it is less powerful than the LR for the symptom of 
vaginal discharge reported by the patient (0.34). 

Dipstick Urinalysis 

Because a high-quality systematic review examining the 
accuracy of the dipstick urinalysis for the prediction of UTI 
exists, we used the data synthesized in the report by Hurlbut 
and Littenberg. 18 Those authors identified and summarized 
51 studies and generated summary receiver operating characteristic
(ROC) curves for combinations of the nitrite and
leukocyte esterase dipstick tests. They found that the
nitrite-positive or leukocyte-esterase-positive combination had the
greatest area under the ROC curve. The point on the summary
ROC curve with the best accuracy represents a sensitivity
of 75% and a specificity of 82%. With these values, the
LR+ for a urinalysis is 4.2 and the LR- is 0.3 (Table 51-2). A 
range of similar points on the ROC curve that was supported 
by the largest number of studies was also examined, and the 
resulting LRs were similar in magnitude. Although other 
combinations of the nitrite and leukocyte esterase test will 
increase either sensitivity or specificity (eg, requiring both to 
be positive will decrease sensitivity and increase specificity), 
the nitrite- or leukocyte-esterase-positive combination was 
the most accurate test. 18 
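The conversion from sensitivity and specificity to LRs uses the standard definitions, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. A minimal sketch with the dipstick values quoted above (the function name `lrs_from_sens_spec` is hypothetical):

```python
def lrs_from_sens_spec(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Best-accuracy point on the summary ROC curve, as quoted in the text
dipstick_lr_pos, dipstick_lr_neg = lrs_from_sens_spec(0.75, 0.82)
print(round(dipstick_lr_pos, 1), round(dipstick_lr_neg, 1))
```

Rounded to 1 decimal place, this gives the 4.2 and 0.3 reported in Table 51-2.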

Sensitivity Analysis 

Because the largest study to examine the accuracy of symptoms
was also of lower quality, 29 we performed a sensitivity
analysis to determine the effect of this study on the summary
LRs. Inclusion of this study always made the symptoms (dysuria,
frequency, vaginal irritation, and vaginal discharge)
appear to be more powerful diagnostic tests. However, in no
case did inclusion of this study improve a test with marginal
discriminatory power into the highly effective range (effectiveness
score > 3.0). 24 The LR+ and LR- for dysuria and frequency
excluded 1.0, whether or not the study was included,
with one exception. The LR+ for increased urinary frequency
was 1.8 (95% CI, 1.1-3.0) when all studies were included vs 1.4
(95% CI, 1.0-1.9) when the study was excluded. That study 29
has a larger effect on the diagnostic value of vaginal symptoms

Table 51-2 Clinical Signs and Symptoms in the Prediction of Urinary Tract Infection a

Study                                 LR+ (95% CI)        LR- (95% CI)

Dysuria
  Gallagher et al 25                  1.3 (1.1-1.6)       0.28 (0.12-0.67)
  Mond et al 26                       1.4 (1.1-1.8)       0.22 (0.07-0.70)
  Lawson et al 27                     1.2 (1.0-1.5)       0.77 (0.60-0.99)
  Nazareth and King 30                1.1 (0.87-1.5)      0.58 (0.14-2.4)
  Komaroff et al 29                   3.2 (2.7-3.7)       0.16 (0.09-0.27)
  Wong et al 31                       3.0 (2.0-4.6)       0.53 (0.39-0.73)
  Wigton et al 32 (training set)      1.4 (1.1-1.8)       0.69 (0.52-0.92)
  Wigton et al 32 (validation set)    1.1 (0.81-1.4)      0.94 (0.72-1.2)
  Summary                             1.5 (1.2-2.0)       0.48 (0.31-0.74)

Frequency
  Gallagher et al 25                  0.96 (0.87-1.1)     1.6 (0.44-6.0)
  Mond et al 26                       0.99 (0.90-1.1)     1.2 (0.17-8.0)
  Lawson et al 27                     1.1 (1.0-1.3)       0.65 (0.43-0.97)
  Dans and Klaus 28                   1.4 (1.0-2.1)       0.63 (0.37-1.1)
  Nazareth and King 30                1.0 (0.80-1.3)      0.87 (0.20-3.8)
  Komaroff et al 29                   10 (7.8-13)         0.07 (0.04-0.16)
  Wong et al 31                       5.2 (3.1-8.7)       0.45 (0.32-0.63)
  Wigton et al 32 (training set)      1.8 (1.0-3.5)       0.87 (0.75-1.0)
  Wigton et al 32 (validation set)    1.3 (0.80-2.0)      0.93 (0.80-1.1)
  Summary                             1.8 (1.1-3.0)       0.59 (0.35-1.0)

Hematuria
  Gallagher et al 25                  1.8 (0.80-3.9)      0.88 (0.75-1.0)
  Mond et al 26                       2.9 (1.0-8.6)       0.81 (0.66-1.0)
  Nazareth and King 30                6.5 (1.4-30)        0.70 (0.49-1.0)
  Wigton et al 32 (training set)      1.6 (0.82-3.3)      0.92 (0.82-1.0)
  Wigton et al 32 (validation set)    1.4 (0.60-3.4)      0.96 (0.88-1.1)
  Summary                             2.0 (1.3-2.9)       0.92 (0.86-0.98)

Fever
  Gallagher et al 25                  2.4 (1.2-4.9)       0.75 (0.61-0.92)
  Mond et al 26                       2.8 (0.77-9.9)      0.87 (0.73-1.0)
  Lawson et al 27                     0.65 (0.32-1.3)     1.0 (0.97-1.1)
  Nazareth and King 30                0 (0-175)           0.92 (0.78-1.1)
  Wigton et al 32 (training set)      1.5 (0.74-3.0)      0.94 (0.84-1.0)
  Wigton et al 32 (validation set)    2.1 (1.0-4.6)       0.89 (0.80-0.99)
  Summary                             1.6 (1.0-2.6)       0.9 (0.9-1.0)

Flank Pain
  Gallagher et al 25                  1.1 (0.64-1.7)      0.98 (0.77-1.2)
  Mond et al 26                       1.1 (0.54-2.2)      0.97 (0.74-1.3)
  Lawson et al 27                     1.1 (0.87-1.4)      0.92 (0.77-1.1)
  Summary                             1.1 (0.90-1.4)      0.84 (0.82-1.1)

Lower Abdominal Pain
  Gallagher et al 25                  0.99 (0.76-1.3)     1.0 (0.63-1.6)
  Mond et al 26                       1.2 (0.67-2.1)      0.91 (0.65-1.3)
  Wong et al 31                       1.5 (0.90-2.4)      0.87 (0.71-1.1)
  Summary                             1.1 (0.90-1.4)      0.89 (0.75-1.0)

Vaginal Discharge
  Dans and Klaus 28                   0.80 (0.53-1.2)     1.3 (0.82-2.0)
  Komaroff et al 29                   0.11 (0.06-0.19)    12 (8.9-16)
  Wong et al 31                       0.43 (0.27-0.69)    1.9 (1.4-2.5)
  Summary                             0.34 (0.14-0.86)    3.1 (1.0-9.3)

Vaginal Irritation
  Komaroff et al 29                   0.09 (0.05-0.18)    6.2 (5.0-7.6)
  Wong et al 31                       0.63 (0.37-1.1)     1.2 (1.0-1.5)
  Summary                             0.24 (0.06-0.93)    2.7 (0.88-8.5)

Back Pain
  Wigton et al 32 (training set)      1.7 (1.1-2.6)       0.80 (0.67-0.96)
  Wigton et al 32 (validation set)    1.6 (1.1-2.5)       0.81 (0.68-0.97)
  Nazareth and King 30                0.78 (0.25-2.4)     1.1 (0.79-1.5)
  Summary                             1.6 (1.2-2.1)       0.83 (0.74-0.94)

Self-diagnosis
  Gupta et al 33                      4.0 (2.9-5.5)       0 (0-0.08)

Vaginal Discharge on Physical Examination
  Wong et al 31                       0.81 (0.66-0.99)    1.9 (1.1-3.3)
  Wigton et al 32 (training set)      0.32 (0.12-0.89)    1.1 (1.0-1.2)
  Wigton et al 32 (validation set)    0.44 (0.19-1.0)     1.1 (1.0-1.2)
  Summary                             0.69 (0.50-0.94)    1.1 (1.0-1.2)

Costovertebral Angle Tenderness on Physical Examination
  Wigton et al 32 (training set)      2.0 (1.2-3.4)       0.82 (0.71-0.95)
  Wigton et al 32 (validation set)    1.4 (0.8-2.4)       0.91 (0.79-1.0)
  Summary                             1.7 (1.1-2.5)       0.86 (0.78-0.96)

Dipstick Urinalysis b
  Hurlbut and Littenberg 18           4.2                 0.3

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

a The study by Wigton et al 32 included 2 separate sets of patients evaluated by retrospective chart review: a training set and a validation set. Likelihood ratios in bold are significant.
b A positive result was defined as leukocyte esterase positive or nitrite positive; a negative result was defined as both negative. The values were taken from a receiver operating characteristic curve, so no CI could be calculated.


because fewer studies were involved. The absence of vaginal
discharge, a feature reported in only 3 studies, makes a UTI
more likely whether or not this study 29 is included (LR, 3.1
[95% CI, 1.0-9.3] for all studies vs LR, 1.7 [95% CI, 1.3-2.2]
when excluded). The presence of vaginal discharge still
decreases the likelihood of a UTI whether or not the study by
Komaroff et al 29 is included (LR, 0.34 [95% CI, 0.14-0.86] for
all studies vs LR, 0.60 [95% CI, 0.39-0.91] when the study is
excluded).

COMMENT 

Symptoms suggestive of UTI are common complaints of 
young women seeking urgent medical care. Although textbooks
of clinical medicine 15-17 routinely mention many of the
symptoms and signs of UTI, the overall accuracy of these 
symptoms and signs has not previously been critically and 
systematically evaluated. A clear understanding of the value 
of each of these diagnostic tests may enable physicians to 
make more informed decisions about the choice of specific 
tests and management options. 


Rule Out Complicated Urinary Tract Infection 

The initial step is to be certain that the patient does not have 
a complicated UTI as defined by the factors listed earlier (see 
“Definitions” section). The probability of UTI in patients 
with risk factors for a complicated infection is not known 
because these patients were not included in the studies identified
by our search. Such patients may be at greater risk of
treatment failure, 10 and clinicians may want to consider early 
urine culture and empirical treatment as shown at the top of 
the proposed algorithm (Figure 51-1). 

Pretest Probability and the Diagnostic Value of 
Presenting to a Clinician 

With a standard evidence-based technique, 21 a clinical
encounter begins with an estimation of the pretest probability
of disease, followed by the application of 1 or more diagnostic
tests to determine the posttest probability of disease.
We consider the pretest probability of UTI to be equal to the
prevalence observed in studies of asymptomatic bacteriuria,
or approximately 5%. 11,12 In this review, 5 studies reported
the prevalence of UTI in patients presenting with 1 or more
symptoms of acute UTI, and the summary prevalence was
48% (95% CI, 41%-55%).

The probability of UTI changes substantially when a 
patient presents to a clinician, increasing from 5% (in historical
controls without symptoms) to approximately 50% (in
patients in the included studies who presented with 1 or
more symptoms). This change in probability corresponds to
an LR of 19, representing a powerful “diagnostic test.” Clinically,
it is useful to know that patients who present with 1 or
more symptoms of UTI have a high probability of infection. 
Because all of the studies included in this review evaluated 
the diagnostic value of symptoms and signs after patients 
presented to a clinician, the relevant pretest probability for 
these tests is 50%. 
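The LR of 19 follows from converting both probabilities to odds and taking their ratio. A brief sketch (the function names `odds` and `implied_lr` are hypothetical):

```python
def odds(probability):
    """Convert a probability to odds."""
    return probability / (1 - probability)


def implied_lr(pretest, posttest):
    """LR that moves a pretest probability to a given posttest probability."""
    return odds(posttest) / odds(pretest)


lr_rounded_prevalence = implied_lr(0.05, 0.50)  # 5% to "approximately 50%"
lr_summary_prevalence = implied_lr(0.05, 0.48)  # 5% to the summary prevalence of 48%

print(f"{lr_rounded_prevalence:.1f}")
print(f"{lr_summary_prevalence:.1f}")
```

With the rounded prevalence of 50% the ratio is 19; with the unrounded summary prevalence of 48% it would be about 17.5, so the conclusion that presentation itself is a powerful "test" does not depend on the rounding.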

Although the pretest probability of UTI in the average
patient who presents with 1 or more symptoms is approximately
50%, this varies considerably according to the individual’s
risk profile. There are 3 well-established risk factors for
acute UTI in young women: recent sexual intercourse, 3,34-38 use
of spermicide (on condoms or with diaphragms) during
sexual intercourse, 3,34-36,39,40 and history of UTI. 3,36 Other risk
factors, including a maternal history of UTI, 34 a history of
childhood onset of UTI, 34 and the presence of bacterial
vaginosis, 41 also have been found to be associated with UTI.
The presence of any of these risk factors increases the pretest
probability of UTI and should be considered when
evaluating patients. Unfortunately, the diagnostic power of
these risk factors (sensitivity, specificity, or LRs) is not
known, because the majority of studies assessing these risk

Table 51-3 Likelihood Ratios for Combinations of Symptoms

                                        Overall LR Using      Based on Data From Komaroff et al 29
                                        Combinations of       ------------------------------------
                                        Individual            Posttest Probability    Summary
Symptom Combinations                    Symptoms a            of UTI, % b             LR c

Dysuria present                         1.5
Frequency present                       1.8
Vaginal discharge absent                3.1
Vaginal irritation absent               2.7
  Overall d                             23                    77                      25

Dysuria absent                          0.5
Vaginal discharge or irritation present 0.3 or 0.2
  Overall                               0.1-0.2               4                       0.3

Dysuria or frequency present            1.5 or 1.8
Vaginal discharge or irritation present 0.3 or 0.2
  Overall                               0.3-0.5               9                       0.7

Abbreviations: LR, likelihood ratio; UTI, urinary tract infection.
a The overall LR was calculated by multiplying the summary LRs from Table 51-2 for each of the findings in each set of symptom combinations. LRs < 1 are rounded off to make computation easier when combining findings.
b The pretest probability of UTI in the study by Komaroff et al 29 was 12% (the prevalence of UTI in the study).
c Likelihood ratios were calculated from the observed change in the pretest and posttest probability of UTI; confidence intervals cannot be calculated because the raw data were not available.
d Values are rounded to nearest integer.


Figure 51-1 Proposed Algorithm for Evaluating Women With Symptoms of Acute Urinary Tract Infection

Abbreviation: UTI, urinary tract infection.
a In women who have risk factors for sexually transmitted diseases, consider testing for chlamydia. The US Preventive Services Task Force recommends screening for chlamydia for all women aged 25 years or younger and women of any age with more than 1 sexual partner, a history of sexually transmitted disease, or inconsistent use of condoms. 52
b For a definition of complicated UTI, see the “Definitions” section of the text.
c The only physical examination finding that increases the likelihood of UTI is costovertebral angle tenderness, and clinicians may consider not performing this test in patients with typical symptoms of acute uncomplicated UTI (as in telephone management).

factors used a case-control design or did not present sufficient
data to calculate LRs. 3,4,35-39,42 Further research is
needed to determine the diagnostic power of these risk factors
so that the information can be used during the clinical
encounter to estimate the pretest probability of disease.

Refining Probability With the Medical History 
and Physical Examination 

In the included studies, all diagnostic tests were evaluated by 
their ability to change the already high (50%) probability of 
UTI in the study population. Because these patients initially 
presented with at least 1 symptom, some of the power of each 
symptom was already “used up” by the time the patient presented
to a clinician (and the probability of UTI increased
from 5% to 50%). In a sense, the diagnostic power of the 
symptom is being “used” twice. Initially, the presenting 
symptom (most commonly dysuria or frequency) caused the 
patient to present to a clinician and was at least partially 
responsible for raising the probability of UTI from 5% to 
50%. Subsequently, the value of the presenting symptom and 
all other potentially relevant symptoms was assessed after 
presentation to a clinician. 

It is therefore not surprising that most of the individual 
symptoms and signs have LRs relatively close to 1.0 and 
therefore do not have great additional diagnostic power after 
presentation. The main exception to this finding is the history
of vaginal discharge or vaginal irritation, which reduces
the probability of UTI. 

One study found that back pain and costovertebral angle 
tenderness were useful for predicting the presence of UTI. 32 
This study was a retrospective chart review of patients who 
had a urine culture in an emergency department, and it is 
possible that back pain and costovertebral angle tenderness 
were predictive of upper UTI (pyelonephritis). However, 
because none of the included studies performed a gold standard
test for upper UTI, we were unable to determine
whether individual symptoms and signs were more predictive
of upper vs lower UTI. Most patients with symptoms
suggestive of UTI and features classically associated with
upper UTI (back pain, fever) are evaluated and treated for
presumed pyelonephritis (Figure 51-1), even though the
diagnostic accuracy of these signs and symptoms for predicting
upper UTI is not known. Because most patients in the
included studies did not have back pain and fever, we believe 
that the other symptoms evaluated in our review are most 
useful for predicting lower UTI (cystitis). 

In contrast to the value of individual tests, certain combinations
of symptoms result in large changes in the probability
of UTI and represent powerful diagnostic tests. The
combination of dysuria and frequency without vaginal discharge
or irritation corresponds to an LR of 25. Although the
combined LRs were generated from only 1 study of lower
quality, 29 these LRs were similar to those found when multiplying
the summary LRs for the individual symptoms, suggesting
that they are reasonable estimates of the true
diagnostic power of these combinations. In addition, another
study 43 that was excluded from our analysis (because it
included an unknown number of asymptomatic patients)
used the same combinations of symptoms and found similar
positive predictive values and LRs.

Although evaluated in only 1 study, 33 self-diagnosis appears
to be a useful diagnostic test (LR, 4.0) in women with recurrent
UTI. Because this study did not perform urine cultures
for women with mild or no symptoms, there is some uncertainty
in the LR estimates. Similarly, the study population
consisted of mostly highly educated single white women, and
it is not clear whether the results apply to other groups of
women. Nonetheless, these findings suggest that women
learn to recognize the symptoms of UTI and are able to accurately
diagnose a new infection, a finding that deserves further
study and may have important implications for
treatment of this large group of patients.

Refining Probability Using Dipstick Urinalysis 

Dipstick urinalysis alone is a moderately powerful diagnostic 
test (Table 51-2). If the dipstick is used alone, the posttest 
probabilities for women with symptoms of a UTI are 81% 
(positive result) and 23% (negative result). 
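These posttest probabilities come from the usual odds form of Bayes' theorem: posttest odds = pretest odds x LR. A short sketch using the 50% pretest probability and the dipstick LRs from Table 51-2 (the function name `posttest_probability` is hypothetical):

```python
def posttest_probability(pretest, lr):
    """Apply one likelihood ratio to a pretest probability."""
    pretest_odds = pretest / (1 - pretest)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


PRETEST = 0.50  # women presenting with 1 or more symptoms of UTI

after_positive = posttest_probability(PRETEST, 4.2)
after_negative = posttest_probability(PRETEST, 0.3)

print(f"positive dipstick: {after_positive:.0%}")
print(f"negative dipstick: {after_negative:.0%}")
```

The results, about 81% and 23%, match the figures above.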

A Diagnostic Algorithm for Evaluating 
Patients With Symptoms of Urinary Tract Infection 

Figure 51-1 shows a proposed algorithm for evaluating
patients with symptoms of UTI. Although the algorithm
itself has not been prospectively studied, the recommendations
are based on the posttest probabilities of UTI generated
from the summary LRs in the current analysis (Table 51-2).
In women with risk factors for a complicated UTI or with
back pain, fever, or malaise (suggesting possible pyelonephritis),
a urine culture with initial empirical treatment is recommended.
If a woman reports a history of vaginal discharge,
the posttest probability of UTI from this single historical
item is reduced to 23%, and a pelvic examination to rule out
a vaginal infection should be considered in addition to a dipstick
urinalysis and urine culture.

The algorithm highlights the finding that the medical history
and physical examination alone can substantially
increase the posttest probability of UTI, effectively “ruling
in” the diagnosis. Because the only physical examination
finding that increases the probability of UTI is costovertebral
angle tenderness, the physical examination may be omitted
without a substantial loss of diagnostic power in patients
without a history of vaginal discharge or irritation. With
individual summary LRs, a patient with dysuria, frequency,
and hematuria (but no back pain at this point in the algorithm)
has a posttest probability of UTI of 81%; with the
combined LR estimate of dysuria and frequency without vaginal
discharge (LR, 25), the posttest probability of UTI is
96%. Given these high probabilities of UTI, clinicians should
consider empirical treatment without urine culture or dipstick
urinalysis.
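The two "rule-in" figures can be reproduced as follows. The sketch assumes a 50% pretest probability, that the individual summary LRs may simply be multiplied (ie, that the findings are independent), and that the 81% figure includes the rounded LR of 0.8 for absent back pain; that last point is an inference from the parenthetical in the text, not something the source states explicitly. The function name is hypothetical.

```python
from math import prod


def posttest_from_lrs(pretest, lrs):
    """Multiply the pretest odds by every LR and convert back to a probability."""
    posttest_odds = pretest / (1 - pretest) * prod(lrs)
    return posttest_odds / (1 + posttest_odds)


PRETEST = 0.50

# Dysuria (1.5), frequency (1.8), hematuria (2.0), no back pain (0.8, assumed)
individual = posttest_from_lrs(PRETEST, [1.5, 1.8, 2.0, 0.8])

# Combined estimate for dysuria and frequency without vaginal discharge
combined = posttest_from_lrs(PRETEST, [25])

print(f"individual summary LRs: {individual:.0%}")
print(f"combined LR of 25:      {combined:.0%}")
```

Under these assumptions the results are about 81% and 96%.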

Conversely, even mostly negative history responses, physical
examination findings, and dipstick urinalysis results cannot
reliably rule out the diagnosis of UTI in women without a history
of vaginal discharge or irritation. For example, to generate
the lowest possible posttest probability of disease, a woman
must still present with at least 1 symptom. If she presents with
frequency (LR, 1.8) with no dysuria (LR, 0.5) and no back pain
(LR, 0.8) (the only 2 negative symptoms other than vaginal
symptoms), a negative dipstick result (LR, 0.3), and no other
positive symptoms, her posttest probability of disease is still
18%, which is considerably higher than the prevalence of
asymptomatic bacteriuria in the population (5%). Although
we do not address the optimum treatment of such patients, we
believe that the relatively high probability of UTI (~20%) warrants
a urine culture (Figure 51-1), an approach that has been
supported by others. 10 Clinicians may also want to consider
performing a pelvic examination, especially in patients at high
risk for sexually transmitted disease or if the urine culture
result is negative and symptoms persist. As noted, it is theoretically
possible to rule out UTI in women who present with
vaginal discharge, for whom the lowest possible posttest probability
of disease is 6% (if they also have no dysuria, no back
pain, a negative dipstick result, and no other positive symptoms).
We recommend that clinicians consider obtaining a
urine culture in patients with at least 1 urinary symptom and
vaginal discharge because the posttest probability of disease
will only rarely reach this lowest possible 6%.
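The "lowest possible" probabilities in this paragraph can be checked the same way. The sketch assumes a 50% pretest probability, independence of the findings, and the rounded LR of 0.3 for a history of vaginal discharge; the function name is hypothetical.

```python
from math import prod


def posttest_from_lrs(pretest, lrs):
    """Multiply the pretest odds by every LR and convert back to a probability."""
    posttest_odds = pretest / (1 - pretest) * prod(lrs)
    return posttest_odds / (1 + posttest_odds)


PRETEST = 0.50

# Frequency present, no dysuria, no back pain, negative dipstick
mostly_negative = [1.8, 0.5, 0.8, 0.3]
without_discharge = posttest_from_lrs(PRETEST, mostly_negative)

# Same findings plus a history of vaginal discharge (rounded LR, 0.3)
with_discharge = posttest_from_lrs(PRETEST, mostly_negative + [0.3])

print(f"no vaginal discharge: {without_discharge:.0%}")
print(f"vaginal discharge:    {with_discharge:.0%}")
```

The results are about 18% and 6%, which is why the text treats a urine culture as warranted even when most findings are negative.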

If the medical history and physical examination are neither
strongly positive nor negative, a positive dipstick
result still results in a high posttest probability of disease
(approximately 80%), and empirical therapy should again
be considered without urine culture. In all of the scenarios
in the algorithm, urine culture may be indicated, without
regard to the posttest probabilities, if the patient has experienced
recurrent infection and antibiotic resistance is
suspected.

Older guidelines for the evaluation of patients with suspected
UTI recommend urine culture in all patients, even in
those found to have a high probability of UTI after the medical
history and physical examination. 29,44 More recent reviews
and management strategies suggest that a diagnosis of UTI
can be established in women who present with typical symptoms
and are found to have a positive dipstick or urinalysis
result (without obtaining a urine culture). 10,45-48

Unlike these treatment recommendations, our proposed
algorithm (Figure 51-1) suggests that, in selected patients
with mostly positive symptoms, the probability of UTI is so
high (~90%) that empirical treatment may be considered
without dipstick testing or urinalysis. A similar strategy was
recently evaluated in a randomized trial comparing management
via telephone with office evaluation in 72 women with
suspected UTI. 49 The investigators found no difference in
symptom scores or patient satisfaction with the 2 strategies.
Previous studies examining the effect of symptom-based
treatment of patients with suspected UTI (after a telephone
call or office visit to a health care provider) have shown that
empirical therapy decreases costs without increasing adverse
outcomes. 50,51 However, the main purposes of the current
algorithm are to define the posttest probabilities of disease
from specific clinical scenarios and to allow clinicians to make
informed testing and treatment decisions based on their clinical
judgment. Further research is needed to determine clinical
outcomes, costs, and patient satisfaction associated with different
testing and treatment strategies for patients
who present with specific constellations of symptoms of UTI.


CLINICAL SCENARIOS—RESOLUTIONS 


In the first case, the woman has 2 symptoms of UTI (dysuria
and frequency), has no vaginal discharge, and
believes that her current symptoms are similar to those of
previous episodes. These features all increase her probability
of UTI, which is greater than 90%. Her sexual history
does not suggest that she is at high risk for a sexually
transmitted disease. With the algorithmic approach, the
patient should be asked about risk factors for complicated
infection, as well as symptoms classically associated with
pyelonephritis (fever, back pain, nausea, vomiting). As has
been shown, telephone evaluation and treatment of similar
patients may be an appropriate strategy. 49,50 In this
patient, a positive dipstick urinalysis result would further
increase the probability of UTI, whereas a negative result
would not rule out infection.

In the second case, the woman has 2 symptoms of UTI
(dysuria and frequency), as well as vaginal discharge
(which decreases the probability of UTI and increases the
probability of vaginal infection). A pelvic examination
does not suggest a specific diagnosis and the dipstick urinalysis
result is negative. The posttest probability of UTI is
approximately 20%, illustrating that even a negative physical
examination result and dipstick test result are insufficient
to rule out UTI in a patient with 1 or more
symptoms. A urine culture will help determine the need
for treatment, and cervical cultures are indicated to rule
out chlamydia and gonorrhea and help determine the
cause of her symptoms.


THE BOTTOM LINE 

In a woman who presents with 1 or more symptoms of UTI,
the probability of infection is high (approximately 50%).
Four symptoms (dysuria, frequency, hematuria, and back
pain) and 1 sign (costovertebral angle tenderness) increase
the probability of UTI when present. Combinations of symptoms
can substantially increase the likelihood of UTI, effectively
ruling in the disease according to the medical history
alone. Patients with recurrent infection may be able to accurately
self-diagnose UTI.

In contrast, the medical history and physical examination
cannot reliably rule out UTI in women who present with urinary
symptoms. Although 4 symptoms (absence of dysuria,
absence of back pain, and a history of vaginal discharge or
vaginal irritation) and 1 sign (vaginal discharge) decrease the
probability of UTI, even combinations of symptoms, signs,
and a negative dipstick result rarely decrease the probability
of UTI below 20%. A urine culture and pelvic examination
should be considered in patients who present with some
symptoms of UTI but with mostly negative history responses
and physical examination findings.

Dipstick urinalysis, which is a simple and inexpensive test,
is moderately powerful and should be considered in women
with appropriate urinary tract symptoms. If the dipstick
result is positive, the probability of UTI is high, especially
when combined with other positive findings from the medical
history and physical examination. If the dipstick result is
negative, the probability of disease is still relatively high
(23%) and a urine culture should be considered to rule out
infection.

Care should be taken to identify women with vaginal discharge
or vaginal symptoms. If either is present, a pelvic
examination and cervical culture are indicated to rule out
infection caused by chlamydia 52 or gonorrhea, as well as
other vaginal infections that require definitive therapy. Similarly,
in women with back pain, fever, or significant malaise,
an office examination, combined with dipstick urinalysis and
urine culture, may aid in the diagnosis of pyelonephritis,
although the accuracy of individual tests for establishing
upper UTI is not known.

Knowledge of the LRs for specific symptoms, signs, and
diagnostic tests used to evaluate patients with suspected UTI
may improve the ability of clinicians to more accurately predict
the probability of infection in individual patients. It
seems reasonable to offer empirical treatment when the
probability of infection is high and to pursue additional diagnostic
testing (eg, urine culture, pelvic examination, and cervical
cultures) when the probability of UTI is low or
intermediate. However, the actual cost-effectiveness of specific
testing and treatment strategies is not clearly established,
and prospective studies examining clinical benefits,
adverse effects, costs, and patient satisfaction with specific
approaches are needed.

CLINICAL SCENARIO


A 20-year-old healthy woman calls her student health 
clinic to report 1 day of dysuria with increased frequency. 
There has been no vaginal discharge or irritation, fever, 
back pain, nausea, or vomiting. She is sexually active with 
1 partner and they use condoms. The symptoms seem 
similar to those of a previous urinary tract infection 
(UTI). What is the patient’s probability of UTI based 
solely on the information from the medical history? 
Should she come in for a physical examination or a dipstick
urinalysis to provide additional evidence that she has
a UTI? Can UTI be ruled out without a urine culture? 


UPDATED SUMMARY ON URINARY TRACT 
INFECTION IN ADULT WOMEN 

Original Review 

Bent S, Nallamothu BK, Simel DL, Fihn SD, Saint S. Does this 
woman have an acute uncomplicated urinary tract infection? 
JAMA. 2002;287(20):2701-2710. 

UPDATED LITERATURE SEARCH 

We searched MEDLINE from September 2001 through July 
2004, using the same strategy as in our original publication. 
Search terms included “urinary tract infection,” “diagnostic
tests,” “physical examination,” and “sensitivity and specificity.”
We also manually reviewed the bibliographies of all identified
articles and contacted experts in the field to identify
other relevant articles. The search identified 35 titles that
were reviewed by 2 investigators. Four articles were deemed 
potentially relevant, although none addressed the clinical 
examination. 

NEW FINDINGS 

• There are no new data from high-quality studies that 
change our previous estimates of the diagnostic accuracy of 
signs and symptoms for predicting UTI. 


• One new study reports that the probability of UTI after a 
negative dipstick result is approximately 20%, agreeing 
with our original estimate and indicating that it is difficult 
to rule out UTI with the clinical examination and dipstick 
urinalysis testing. 

• “Telephone diagnosis” may be a reasonable option for
patients without risk factors for complicated UTI who
call with dysuria or urinary frequency, although current
studies lack power to determine whether telephone
diagnosis leads to an increase in pyelonephritis or sexually
transmitted diseases.

Details of the Update 

In women who present with 1 or more urinary tract
symptoms compatible with UTI, the pretest probability is
estimated to be 48%. A study of the diagnostic accuracy of
dipstick urinalysis assessed 277 consecutive women
presenting with symptoms suggestive of UTI.¹ In this study, all
women received a urine culture, and the culture result was
positive in 168 patients (incidence of UTI, 168/277 = 61%).
For a positive dipstick urinalysis result, the likelihood ratio
(LR) was 1.5 (95% confidence interval [CI], 1.3-1.8), whereas
the LR for a normal dipstick result (negative LR) was 0.19
(95% CI, 0.10-0.36). In this study, a positive dipstick result
was a less powerful predictor of UTI than the summary
estimate from a previously published systematic review of 51
studies.² However, these findings agree with our original
assessment that a normal dipstick urinalysis result does not
lower the probability of UTI enough to rule out infection.
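
The arithmetic behind this conclusion can be sketched in a few lines. The example below is only an illustration: the function name `posttest_probability` is hypothetical, and the inputs are the prevalence (168/277) and negative LR (0.19) reported for the dipstick study. It converts the pretest probability to odds, multiplies by the LR, and converts back to a probability.

```python
def posttest_probability(pretest: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability via odds."""
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Figures reported for the dipstick study described above.
study_prevalence = 168 / 277   # culture-positive women / women tested
negative_lr = 0.19             # LR for a normal dipstick result

after_normal_dipstick = posttest_probability(study_prevalence, negative_lr)
print(f"Probability of UTI after a normal dipstick: {after_normal_dipstick:.0%}")
```

Under these inputs the probability of UTI after a normal dipstick result remains a little above 20%, which is why a normal result alone does not rule out infection.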

Two articles³,⁴ located by our search were previously
discussed in a letter to the editor following the original Rational
Clinical Examination article. Both articles involved
prospective recruitment of patients with symptoms suggestive of
UTI, and both examined the diagnostic accuracy of signs and
symptoms for predicting UTI. However, neither article used
an acceptable gold standard in all patients. One article³ tested
all patients with a dipstick urinalysis and sent cultures only
when the dipstick result was positive. The other article⁴ did
not state how the decision to perform cultures for patients
was made, and only 63% of patients received a urine culture.
Because both of these studies were subject to verification bias
(gold standard applied only when a preliminary test result is
positive), we chose not to add the results to the summary
estimates generated in our original review that came from 5
level 1 studies (prospective, independent blind comparison
of signs or symptoms to a gold standard among a large
number [>50] of consecutive patients suspected of having a UTI).
Because the reference standard test was not applied to all
patients, the prevalence of UTI among women presenting
with symptoms in these studies (25%³ and 36%⁴) may
underestimate the true prevalence.

Two studies⁵,⁶ examined the use of telephone diagnosis and
management for selected patients who present with symptoms
of UTI but who are at low risk for complicated UTI (ie, no
diabetes, pregnancy, immunosuppression, or known renal
disease). These studies evaluated the treatment of patients after a
presumed diagnosis was made according to the symptoms
elicited from the patient during a telephone call. The first
study⁵ was a population-based, before-and-after study, with
concurrent control groups of women calling to report their
symptoms of dysuria or urinary frequency. Among 3889
patients with presumed acute, uncomplicated UTI, use of the
telephone guideline decreased office visits by 33% and led to a
nearly 3-fold increase in the use of a guideline-recommended
antibiotic. The authors found a nonsignificant increase in
return visits for evaluation of a possible sexually transmitted
disease after guideline implementation but cautioned that
their study was not adequately powered to detect small
increases in outcomes such as pyelonephritis or sexually
transmitted diseases. A second study⁶ randomly assigned a similar
population of 72 women without risk factors for complicated
UTI to either a telephone management protocol or an office
visit. All women received a urine culture and all were
contacted at 3 and 7 days to determine symptom severity. The
authors found that 64% of enrolled patients had positive urine
culture results. All patients were treated with antibiotics in the
telephone group, whereas 32 of 36 patients were treated in the
clinic-visit group. There was no difference in the change in
symptom scores or the rate of treatment failure between
groups, likely because almost all patients received antibiotic
treatment. These authors also observed that the sample size
was inadequate to detect differences in adverse events between
groups.

IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

We revised Table 51-2 so that it now shows a similar number
of significant digits for LRs < 1 and 1 to 10. We identified no
data to suggest changes in our original estimates of the
diagnostic accuracy for signs, symptoms, or dipstick urinalysis
for predicting UTI in women. Although we believe that the
best estimate of the prevalence of UTI among patients with
suggestive symptoms comes from the level 1 studies in our
original report (48%; 95% CI, 41%-55%), we believe that
there may be significant variability in this estimate according
to differences in clinical setting, patient characteristics, or
geographic location.


CHANGES IN THE REFERENCE STANDARD 

The reference standard remains an appropriately obtained 
urine specimen for culture. 

RESULTS OF LITERATURE REVIEW 

Individual findings do not have great diagnostic power to
change the high pretest probability of UTI in women
(~50%). One study from our original review suggests that
multiplying LRs from individual symptoms generates a
multivariate LR that is a reasonable estimate of the diagnostic
accuracy of combined symptoms.⁷

EVIDENCE FROM GUIDELINES 

The US Preventive Services Task Force⁸ recommends against
screening for asymptomatic bacteriuria other than during
pregnancy. No US federal or Canadian guidelines address the
evaluation of women in primary care who have symptoms
compatible with UTIs.

Many experts previously recommended urine culture in all
patients with suspected UTI, even in those found to have a
high probability of UTI after the medical history and physical
examination.⁷,⁹ More recent reviews and management
strategies suggest that a diagnosis of UTI can be established in
women who present with typical symptoms and are found to
have a positive dipstick or urinalysis result (without
obtaining urine culture).¹⁰⁻¹⁴


CLINICAL SCENARIO—RESOLUTION 


Although the pretest probability of UTI in the average
patient who presents with symptoms is approximately
50%, this patient also has dysuria, frequency, and no
vaginal discharge or irritation. Her posttest probability of UTI
is greater than 90%. The history-taking should include
questions about risk factors for complicated UTI
(diabetes, immunosuppression, pregnancy, known renal
disease). In patients without these risk factors who have a
high probability of UTI, 2 studies⁵,⁶ suggest that telephone
diagnosis and management may be appropriate, although
it is not clear whether such strategies increase the risk of
adverse events because of untreated pyelonephritis or
sexually transmitted disease.

In a patient who presents with an isolated symptom of
UTI (such as dysuria), an office visit with a negative
dipstick result decreases the probability of UTI to
approximately 20%. Because many clinicians will think that this
probability is still too high, they might choose a strategy
of urine culture or close clinical follow-up and consider
performing a pelvic examination to assess for other
conditions. All patients who have risk factors for complicated
UTI, as well as a report of back pain, fever, or vaginal
discharge, require further evaluation.

URINARY TRACT INFECTION, WOMEN—MAKE THE DIAGNOSIS


PRIOR PROBABILITY 

The pretest probability of UTI among women with
compatible symptoms is 48% (95% CI, 41%-55%).

POPULATION FOR WHOM URINARY TRACT 
INFECTION SHOULD BE CONSIDERED 

Urinary tract infection should be considered in all adult
women who present with 1 or more suggestive symptoms
(frequency, dysuria, hematuria, fever, or flank or abdominal
pain). Women with complicated UTI from a functional or
anatomical abnormality of the urinary tract may present
differently.

DETECTING URINARY TRACT 
INFECTION IN ADULT WOMEN 

Combinations of symptoms (Table 51-4) can substantially
increase the probability of UTI, effectively ruling in the
diagnosis according to the medical history alone. In contrast, the
history and physical examination cannot reliably exclude the
diagnosis of UTI in women who present with urinary
symptoms. A urine culture and pelvic examination should be
considered in patients who present with some symptoms of
UTI but otherwise a mostly negative history for UTI, a
normal physical examination result, and a normal dipstick
urinalysis result.


Table 51-4 Univariate Findings and Multivariate Approach for
Diagnosing Urinary Tract Infection in Adult Women

Univariate Findings | LR (95% CI)ᵃ, Present | LR (95% CI)ᵃ, Absent
Dysuria | 1.5 (1.2-2.0) | 0.5 (0.3-0.7)
Frequency | 1.8 (1.1-3.0) | 0.5 (0.4-1.0)
Vaginal discharge | 0.3 (0.1-0.9) | 3.1 (1.0-9.3)
Vaginal irritation | 0.2 (0.1-0.9) | 2.7 (0.9-8.5)

Univariate Findings | LR, Abnormal | LR, Normal
Dipstick resultᵇ | 4.2 | 0.3

Multivariate Approach

Multiply the above individual LRs for combinations of findings (eg, dysuria
present and vaginal discharge absent yields a combined LR = 4.7; dysuria
absent and vaginal discharge present yields a combined LR = 0.15).

Abbreviations: CI, confidence interval; LR, likelihood ratio.

ᵃLRs < 1 are rounded off to make computation easier when combining findings.

ᵇThe dipstick values were selected from visual inspection of a summary receiver
operating characteristic curve to maximize the accuracy, so CIs could not be determined.²
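
The multivariate approach in Table 51-4 can be illustrated with a short sketch. The code below is an assumption-laden example, not a validated decision rule: it treats the findings as independent (the premise behind multiplying LRs), uses the 48% pretest probability, and applies the LRs from the table to the woman in the clinical scenario (dysuria and frequency present, vaginal discharge and irritation absent). All variable names are hypothetical.

```python
from math import prod

PRETEST_PROBABILITY = 0.48  # prior probability of UTI among symptomatic women

# LRs taken from Table 51-4 for the findings in the clinical scenario.
scenario_findings = {
    "dysuria present": 1.5,
    "frequency present": 1.8,
    "vaginal discharge absent": 3.1,
    "vaginal irritation absent": 2.7,
}

# Multiply the individual LRs (assumes the findings are independent).
combined_lr = prod(scenario_findings.values())

# Probability -> odds, apply the combined LR, then odds -> probability.
pretest_odds = PRETEST_PROBABILITY / (1 - PRETEST_PROBABILITY)
posttest_odds = pretest_odds * combined_lr
posttest_probability = posttest_odds / (1 + posttest_odds)

# The two combinations quoted in the table's multivariate example.
dysuria_present_discharge_absent = 1.5 * 3.1
dysuria_absent_discharge_present = 0.5 * 0.3

print(f"Combined LR: {combined_lr:.1f}")
print(f"Posttest probability of UTI: {posttest_probability:.0%}")
```

With these inputs the posttest probability exceeds 90%, in line with the scenario resolution, and the two-finding products agree with the table's examples after rounding.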

REFERENCE STANDARD TESTS 

The reference standard remains an appropriately obtained
urine specimen for culture.


CHAPTER

What Is Causing This Patient’s Vaginal Symptoms?

Matthew R. Anderson, MD, MS
Kathleen Klink, MD
Andreas Cohrssen, MD


CLINICAL SCENARIOS


CASE 1 An otherwise healthy 33-year-old woman
presents with a complaint of foul-smelling vaginal discharge.
She is sexually active with 1 male partner. This is the first
time she has had this symptom and is worried that it may
represent a serious health problem. What diagnostic
maneuvers (medical history, physical examination, and
office laboratory tests) will allow the clinician to
determine the cause of her symptoms?

CASE 2 A 35-year-old woman with 2 sexual partners in
the last year complains of an itchy, smelly discharge. The
pelvic examination reveals no vulvar or vaginal
inflammation; a foamy, thin discharge with a pH of 5.0; and some
bleeding at the cervix. The wet preparation reveals 2 clue
cells per high-power field and, after thorough review of
the slide, no motile organisms are seen. What is the
chance that this patient has vaginal candidiasis, bacterial
vaginosis, or vaginal trichomoniasis?


WHY IS THE CLINICAL 
EXAMINATION IMPORTANT? 


Vaginal complaints are common in primary care. They are
the most common reason for gynecologic consultation and
account for approximately 10 million office visits annually.¹
Current recommendations for the diagnosis of vaginal
complaints in premenopausal women involve a vaginal
examination and microscopy. The evaluation has traditionally been
oriented toward the detection of vaginal candidiasis,
bacterial vaginosis, and trichomoniasis, which are the 3 most
common causes of vaginitis in this age group.²⁻⁴

Prevalence of these 3 conditions will vary, depending on
the clinical setting. National figures show that 40% to 50% of
patients with vaginal symptoms have bacterial vaginosis;
20% to 25% have vaginal candidiasis; and 15% to 20% have
trichomoniasis.⁵ In the studies surveyed for this review,
which involved symptomatic women presenting in primary
care, the prevalence of vaginal candidiasis ranged from 17%
to 39%⁶,⁷; bacterial vaginosis, 22% to 50%⁸,⁹; and
trichomoniasis, 4% to 35%.¹⁰,¹¹ The number of undiagnosed patients
ranged from 7% to 72%.⁶,¹²

Women who present with vaginal complaints often receive
tests for gonorrhea or chlamydia, though the association
between gonorrhea, chlamydia, and vaginal discharge is not
confirmed.¹³,¹⁴ It would be prudent, however, to test for
gonorrhea and chlamydia in sexually active patients who are younger
than 25 years and in all patients who have fever, lower
abdominal pain, a symptomatic sexual partner, a new sexual partner,
or more than 1 sexual partner.¹⁴ Additional less common
causes of vulvovaginal symptoms are infection with herpes
simplex¹⁵; allergic reactions to chemical irritants, latex,¹⁶ or
semen¹⁷; mechanical irritation caused by lack of lubrication;
and atrophic vaginitis in postmenopausal women.¹⁸

About 30% of women with vaginal complaints go without a
diagnosis even after a complete evaluation using techniques
more comprehensive than those usually available.⁸,¹⁹,²⁰ Perhaps
this explains why many clinicians appear to treat patients
without performing a pH examination of the discharge or
microscopy.²¹ In actual clinical practice, diagnoses of vaginal complaints
do not show good agreement with diagnoses based on
cultures.²² These concerns led us to evaluate the role of the clinical
examination in the diagnosis of vaginal complaints.

Point-of-care testing for vaginal complaints is a new and
rapidly evolving field. A number of commercially available
office kits use a vaginal discharge sample to diagnose
bacterial vaginosis,²³ trichomoniasis,²³ and vaginal candidiasis.⁶ A
systematic review of these diagnostic kits is, however, beyond
the scope of this article.

How to Elicit Symptoms and Signs 

Elicitation of Symptoms 

Patients who have vaginitis generally complain of some
combination of discharge, odor, irritation, or itch. Discharges are
characterized by color (clear, white, green, gray, yellow),
consistency (thin, thick, curdlike), and amount (more or less
than usual). We could locate no scale that allows the patient
to quantify precisely the amount of her discharge.

Signs 

Patients may have irritation manifested as erythema,
excoriation, or discharge on the perineum or introitus. The
discharge is sampled during a speculum examination with a
swab from the posterior fornix or picked up on the
speculum. Some clinicians ask patients to provide a self-collected
sample of their vaginal discharge.²⁴

The sample can be tested for pH with phenaphthazine
paper. When gel is used on the speculum, care must be taken
not to contaminate the sample because the pH may become
altered. In addition, semen, douches, and intravaginal
medication can all make the vaginal pH more basic.

Characteristic findings on the wet mount are shown in
Figure 52-1. Microscopy is performed by placing a drop of
vaginal fluid on 2 slides. A drop of saline is mixed with the
discharge on one slide, whereas a drop of 10% potassium
hydroxide is placed on the second slide. The examiner then
“whiffs” the potassium hydroxide slide to determine the
presence of the characteristic fishy (amine) odor of
bacterial vaginosis. The potassium hydroxide slide is set aside or
put on a warmer. The other vaginal sample is examined
under ×400 power for trichomonads, clue cells, yeasts,
presence or absence of lactobacilli (long rods²⁵), and the
presence of leukocytes. Clue cells are epithelial cells with a finely
granulated cytoplasm and indistinct borders,²⁶ which appear
to have been coated with sand. The potassium hydroxide
slide is examined for yeast. Yeast may be seen on the saline
preparation, obviating the need to perform the potassium
hydroxide microscopic examination.

Two excellent resources exist for learning how to perform
the wet mount examination and whiff test. The Seattle
STD/HIV Prevention Training Center has produced a short,
downloadable instructional video.²⁷ The video illustrates
the technique of the wet mount examination and includes
clips of common findings such as yeast, clue cells, and
motile trichomonads. For those more comfortable with
paper materials, the Association of Professors of
Gynecology and Obstetrics’ pamphlet on the diagnosis of vaginitis²⁸
contains photographs of the methods and findings of the
wet mount examination.

Figure 52-1 Microscopic Examination of
Vaginal Samples

A, Normal saline wet mount showing a clump of 3
normal vaginal epithelial cells (original magnification,
×600). Reproduced with permission from William L.
Thelmo, MD. B, Normal saline wet mount showing 2
clue cells (original magnification, ×400). Inset, Gram
stain demonstrating how coccobacilli on the surface of
vaginal epithelial cells create the characteristic granular
appearance and indistinct borders of clue cells (original
magnification, ×1000). Reproduced with permission
from Lorna Rabe, Magee-Womens Research Institute,
Pittsburgh, Pennsylvania. C, Normal saline wet mount
showing numerous Candida hyphae and buds (original
magnification, ×400). Reproduced with permission
from Lorna Rabe. D, Normal saline wet mount showing
4 trichomonads. Trichomonads can often be identified
easily because of their characteristic jerky motility
(original magnification, ×600). Reproduced with permission
from the Medical Laboratory Evaluation proficiency
testing program of the American College of Physicians
Services Inc.

Under the Clinical Laboratory Improvement Act, the wet
mount examination is considered a moderately complex test,
and the practitioner’s laboratory must obtain a Certificate of
Provider-Performed Microscopy Procedures from the local
state health department.²⁹

METHODS 

Search Strategy 

We undertook a MEDLINE review of the literature from
1966 through April 2003, combining the term “diagnosis”
with the terms “vaginitis,” “vaginal discharge,”
“candidiasis,” “bacterial vaginosis,” and “trichomoniasis.” We reviewed
more than 500 abstracts and obtained a copy of articles
(>100) that appeared likely to meet our review criteria.
We also examined all articles mentioned in the most
recent American College of Obstetricians and
Gynecologists Technical Bulletin.³ Each article was reviewed by at
least 1 author and in ambiguous cases by all 3. Included
articles and review articles were culled for further
references. We attempted to contact the authors of all articles
included in this review and to request additional
references. We received replies from 7 authors, but no
additional references were produced.

Inclusion and Exclusion Criteria 

Articles were included if they (1) involved original research
performed on symptomatic patients in a primary care setting
(including sexually transmitted disease clinics), (2)
compared a diagnostic test with a recognized criterion standard,
(3) allowed the calculation of sensitivity or specificity, and
(4) discussed tests that would provide diagnostic
information during the course of the office visit. We excluded articles
that reported on women treated in specialty or referral
settings, those with recurrent or treatment-refractory vaginitis,
or asymptomatic patients (for example, women treated for
routine pelvic examination).

Evaluation of Methods 

Eighteen articles met our inclusion and exclusion criteria and
are listed in Table 52-1.⁶⁻¹²,²³,³⁰⁻³⁹ We graded the articles’
diagnostic methodologic quality on a 3-point scale (highest to
lowest quality). The grading and criteria are listed in Box 52-1.
A different quality score from other Rational Clinical
Examination articles (see Table 1-7) was required, because the focus
of our study involved 3 different types of vaginitis, each of
which has different laboratory criterion standards.

Evaluation of Criterion Standards 

The diagnostic criterion standard for vaginal candidiasis
is a positive culture result or identification of yeast by
microscopy. Because many asymptomatic women have
vaginal yeast colonization, it is not clear whether a
positive culture result or microscopy alone confirms Candida
as the cause of symptoms, yet this is the current diagnostic
criterion standard. We accepted studies that used
microscopy only as a criterion standard but considered these of
lower quality.

We used the Amsel criteria⁴⁰ as the criterion standard for
the diagnosis of bacterial vaginosis. Bacterial vaginosis is
diagnosed when 3 of 4 findings are present: (1) a thin,
homogeneous vaginal discharge; (2) clue cells; (3) positive whiff
test; and (4) vaginal pH level higher than 4.5.⁴⁰ Several
articles used either Gram stain or a positive culture for
Gardnerella vaginalis as criterion standards, which we also accepted,
although we did not consider this optimal.
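
The 3-of-4 rule described above can be written as a small check. This is a minimal sketch for illustration only: the function name and parameter names are hypothetical, and the code encodes nothing beyond the four findings and the pH threshold stated in the text.

```python
def meets_amsel_criteria(
    thin_homogeneous_discharge: bool,
    clue_cells: bool,
    positive_whiff_test: bool,
    vaginal_ph: float,
) -> bool:
    """Return True when at least 3 of the 4 Amsel findings are present."""
    findings = [
        thin_homogeneous_discharge,
        clue_cells,
        positive_whiff_test,
        vaginal_ph > 4.5,  # the criterion is a pH strictly higher than 4.5
    ]
    return sum(findings) >= 3


# Hypothetical patient: thin discharge, clue cells, negative whiff test, pH 5.0.
example_result = meets_amsel_criteria(True, True, False, 5.0)
print(example_result)
```

Because the rule only counts findings, any 3 of the 4 are sufficient; a negative whiff test, for example, does not exclude the diagnosis when the other 3 findings are present.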

The criterion standard applied to the diagnosis of
trichomoniasis is a positive culture result. Immunofluorescence
and polymerase chain reaction are probably equivalent to
culture. We accepted studies that included identification of
trichomonads by direct microscopy or Papanicolaou tests,
although these were considered of lesser quality.

Data Extraction 

Sensitivity, specificity, and likelihood ratios (LRs) were
either taken directly from the article or calculated from
data provided in the article. All of the authors extracted
the data and computed sensitivity and specificity from
each article independently. Disagreements were resolved
by consensus. All data and any calculations were sent to
the primary authors for their review. One author of an
article¹² we included provided additional data that have
been incorporated into this review. A fourth person
independently verified all data points. The absence of standard
definitions for a variety of symptoms and signs, along
with ambiguous phrasing of terms, made it impossible to
combine results across studies.

Statistical Analysis 

Statistical analysis was performed using SPSS (version 10.0;
SPSS Inc, Chicago, Illinois) and Stata (version 8; StataCorp,
College Station, Texas) statistical software. When there were
no patients in one of the 4 cells of a 2 × 2 table (true positive,
false positive, false negative, true negative), the value 0.5 was
added to each cell of the 2 × 2 table for calculating the LRs.
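
The zero-cell adjustment described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' SPSS or Stata code: the function name is hypothetical, and the counts in the example are invented solely to show the correction being triggered.

```python
def likelihood_ratios(tp: float, fp: float, fn: float, tn: float) -> tuple[float, float]:
    """Return (LR+, LR-) from a 2 x 2 table, adding 0.5 to every cell if any cell is 0."""
    cells = (tp, fp, fn, tn)
    if any(count == 0 for count in cells):
        # Continuity correction: 0.5 is added to all 4 cells, not only the empty one.
        tp, fp, fn, tn = (count + 0.5 for count in cells)
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    lr_positive = sensitivity / false_positive_rate
    lr_negative = (1 - sensitivity) / (1 - false_positive_rate)
    return lr_positive, lr_negative


# Invented counts with no false positives; without the correction LR+ would be undefined.
lr_pos, lr_neg = likelihood_ratios(tp=10, fp=0, fn=5, tn=20)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

Without the adjustment, a table with no false positives would require division by zero when computing the positive LR; adding 0.5 to each cell keeps both LRs finite at the cost of a small bias.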

Results 

Precision 

Precision refers to the degree to which independent
observers will find the same result when applying the same test.
No study reported the precision of the tests reviewed in this
article.

Accuracy of Symptoms 

Tables 52-2 and 52-3 present the sensitivity, specificity, and
LRs for all symptoms. The reviewed articles tested the
following symptoms for their usefulness in the diagnosis of vaginal
complaints: (1) characteristics of the discharge (quantity,
color, consistency), (2) presence or absence of itching, (3)
irritative symptoms (redness, pain/burning, swelling), (4) odor
(present, fishy, or foul), (5) patient’s self-diagnosis, (6) urinary
tract symptoms, (7) bleeding, and (8) dyspareunia.

Table 52-1 Included Studies of Diagnostic Strategies for Vaginal Symptoms

Source, y | No. of Patients | Setting | Symptoms | Vaginal Candidiasis, No. (%) | Bacterial Vaginosis, No. (%) | Vaginal Trichomoniasis, No. (%) | Quality Scoreᵃ | Criterion Standard
Abbott,¹² 1995ᵇ | 71 | Urban ED or walk-in clinic; Denver, CO | Vaginal itching, discharge, or pain | 23 (32) | 29 (41) | 5 (7) | 2 | Candidiasis: culture only
Abu Shaqra,³⁰ 2001 | 301 | Private gynecologists; Zarka, Jordan | Vaginal discharge | 78 (26) | 90 (30) | 9 (3) | 2 | Bacterial vaginosis: Nugent criteriaᶜ
Bennett et al,¹¹ 1989 | 157 | Urban ED; Kansas City, MO | Vaginal discharge | NA | NA | 55 (35) | 2 | Trichomoniasis: culture, microscopy, immunofluorescence
Bleker et al,³¹ 1989ᵈ | 97 | Urban general hospital gynecology clinic; Amsterdam, The Netherlands | Vaginal discharge | 24 (25) | 37 (38) | 13 (13) | 3 | Bacterial vaginosis: Spiegel criteriaᵉ; trichomoniasis: microscopy; candidiasis: microscopy
Borchardt et al,³² 1992 | 69 | 3 Clinics (1 STD clinic); San Jose, Costa Rica | Not indicated | NA | NA | 10 (15) | 2 | Trichomoniasis: culture
Briselden and Hillier,²³ 1994 | 176 | STD clinic; Seattle, WA | Genital complaints | NA | 79 (45) | 19 (11) | 2 | Bacterial vaginosis: clinical criteria; trichomoniasis: culture, microscopy
Bro,⁷ 1989 | 361 | General practices (n = 29); Aarhus, Denmark | Increased vaginal discharge, malodor, or pruritus | 141 (39) | NA | NA | 2 | Candidiasis: culture, microscopy
Carlson et al,⁶ 2000ᶠ | 124 | Gynecology outpatient clinic; Helsinki, Finland | Suspected vaginitis | 21 (17) | NA | NA | 2 | Candidiasis: culture
Chandeying et al,¹⁰ 1998 | 240 | University gynecology outpatient clinic; Songkhla, Thailand | Vaginal discharge | 53 (22) | 91 (38) | 10 (4) | 3 | Bacterial vaginosis: Amsel criteriaᵍ; candidiasis: microscopy; trichomoniasis: microscopy
Eckert et al,³³ 1998 | 774 | STD clinic; Washington state | “A new problem” | 186 (24) | 294 (38) | 116 (15) | 2 | Candidiasis: culture
Fule et al,³⁴ 1990 | 200 | Hospital gynecology clinic; Solapur, India | Abnormal vaginal discharge | NA | 34 (17) | NA | 2 | Bacterial vaginosis: culture and exclusion of other causes
Holst et al,³⁵ 1987 | 101 | Community health center; Lund, Sweden | Genital malodor or abnormal vaginal discharge | 23 (23) | 34 (34) | 9 (9) | 2 | Bacterial vaginosis: Amsel criteriaᵍ
Krieger et al,³⁶ 1988 | 600 | STD clinic; Seattle, WA | “New problems” | NA | NA | 90 (15) | 2 | Trichomoniasis: culture
Livengood et al,³⁷ 1990 | 67 | 2 Hospital gynecology clinics | NA | NA | 67 (100) | NA | 2 | Bacterial vaginosis: Amsel criteriaᵍ
O’Dowd and West,⁹ 1987ʰ | 162 | Department of General Practice; Nottingham, England | Vaginal symptoms | NA | 81 (50) | NA | 3 | Bacterial vaginosis: culture only
Ryu et al,³⁸ 1999 | 177 | University obstetrics/gynecology clinic; Seoul, Korea | Vaginal discharge | NA | NA | 18 (10) | 2 | Trichomoniasis: culture
Schaaf et al,⁸ 1990ⁱ | 123 | County hospital family planning clinic or community-based women's health center; San Francisco, CA | Evaluation for vaginitis | 32 (26) | 27 (22) | 9 (7) | 2 | Bacterial vaginosis: Amsel criteriaᵍ; trichomoniasis: culture; candidiasis: culture
Wathne et al,³⁹ 1994ʲ | 101 | Swedish community health center; Lund, Sweden | Vaginal discharge or malodor | 23 (23) | 34 (34) | 9 (9) | 2 | Bacterial vaginosis: Amsel criteriaᵍ; trichomoniasis: culture; candidiasis: culture

Abbreviations: ED, emergency department; NA, information not reported; STD, sexually transmitted disease.

ᵃSee Box 52-1 for criteria for quality scoring.
ᵇAdditional unpublished data from this study were included in this review.
ᶜDetermined using criteria from Nugent et al.²⁵
ᵈTwenty-two patients were not diagnosed.
ᵉDetermined using criteria from Spiegel et al.⁵⁰
ᶠSeventy-four patients were not diagnosed.
ᵍDetermined using criteria from Amsel et al.⁴⁰
ʰNineteen patients were not diagnosed.
ⁱFifty-one patients were not diagnosed. Women with herpes or urinary tract infections were excluded.
ʲData appear to be same as in Holst et al.³⁵ Data on bacterial vaginosis were reported differently in this article and have been excluded from our analysis.

Discharge Characteristics 

Patients’ descriptions of their discharge do not appear useful
diagnostically with 1 exception. A “cheesy” discharge increases
the likelihood of candidiasis (LR, 2.4; 95% confidence
interval [CI], 1.4-4.2), whereas a watery discharge makes it less
likely (LR, 0.12; 95% CI, 0.02-0.82).

Itching 

Several studies confirm that 70% to 90% of patients with
vaginal candidiasis complain of itching (range of LRs, 1.4 to
3.3). Similarly, these studies show LRs ranging from 0.18 to
0.79 for women who do not have itching; thus, lack of itching
decreases the likelihood of candidal infection. Itching
symptoms are not useful for assessing the likelihood of bacterial
vaginosis or trichomoniasis.

Irritative Symptoms 

The limited data suggest that irritative symptoms are slightly
useful in the diagnosis of candidiasis. Erythema increases the
likelihood of candidiasis slightly (LR, 2.0; 95% CI, 1.5-2.8); its
absence decreases its likelihood (LR, 0.84; 95% CI, 0.76-0.92).

Odor 

The presence of an odor perceived by the patient decreases 
the likelihood of candidiasis (range of LRs, 0.35 to 0.48), 
whereas the absence of an odor increases its likelihood (range 


Box 52-1 Criteria for Quality Scoring

LEVEL 1

Explicit inclusion and exclusion criteria.

More than 95% of patients received specified diagnostic
evaluation including criterion standard.

More than 2 persons performed the diagnostic test, and
a measure was made of interobserver variability.

Sensible normal range defined for continuous variables
(when applicable) and criterion standards were used
(Amsel⁴⁰ criteria for bacterial vaginosis, culture for vaginal
trichomoniasis, and culture for vaginal candidiasis).

(No studies met all level 1 criteria.)

LEVEL 2

Level 2 studies failed 1 or more level 1 criteria or used the
following criterion standards: for bacterial vaginosis, Amsel⁴⁰
modification, Spiegel,⁵⁰ Nugent,²⁵ culture and exclusion of
other causes; for vaginal trichomoniasis, polymerase chain
reaction, immunofluorescence; and for vaginal candidiasis,
culture.

(Fifteen studies met level 2 criteria.)

LEVEL 3

Level 3 studies failed 1 or more level 1 criteria or used the
following criterion standards: for bacterial vaginosis,
Gardnerella culture; for vaginal trichomoniasis, microscopy or
Papanicolaou test; and for vaginal candidiasis, microscopy.

(Three studies met level 3 criteria.)


Table 52-2 Accuracy of Symptoms for Diagnosis of Vaginal Candidiasis or Bacterial Vaginosis

                                               No. of Patients
Symptom                             Diagnosis  With Diagnosis   Sensitivity, %  Specificity, %  LR+ (95% CI)      LR- (95% CI)      Reference
Type of discharge described by patient
  Any                               VC         32 a             72 (NS)         ... b           ...               ...               8
                                    BV         27 a             59 (NS)         ...             ...               ...               8
                                    BV         67               91              ...             ...               ...               38
  Cheesy                            VC         23               65              73              2.4 (1.4-4.2)     0.48 (0.27-0.86)  12
  Increased                         VC         186              NS              ...             ...               ...               34
                                    BV         34               59              67              1.8 (1.2-2.8)     0.61 (0.40-0.95)  36
  Watery                            VC         23               4               63              0.12 (0.02-0.82)  1.5 (1.2-1.9)     12
  White                             VC         32 a             41 (NS)         ...             ...               ...               8
                                    VC         186              NS              ...             ...               ...               34
  Yellow                            VC         32 a             19 (NS)         ...             ...               ...               8
                                    VC         186              NS              ...             ...               ...               34
                                    BV         27 a             26 (NS)         ...             ...               ...               8
Malodor or odor                     VC         23               26              46              0.48 (0.23-1.0)   1.6 (1.1-2.4)     12
                                    VC         32 a             16 (NS)         ...             ...               ...               8
                                    VC         23               21              37              0.35 (0.16-0.77)  2.1 (1.5-3.0)     40
                                    BV         34               97              40              1.6 (1.3-2.0)     0.07 (0.01-0.51)  36
                                    BV         67               73              ...             ...               ...               38
                                    BV         27 a             41 (NS)         ...             ...               ...               8
                                    BV         34               53              ...             ...               ...               40
Itching                             VC         23               87              50              1.7 (1.3-2.4)     0.26 (0.09-0.78)  12
                                    VC         140              79              58              1.8 (1.6-2.2)     0.38 (0.27-0.53)  7
                                    VC         32 a             69 (NS)         ...             ...               ...               8
                                    VC         23               91              47              1.7 (1.4-2.2)     0.18 (0.05-0.70)  40
                                    VC         186              50              64              1.4 (1.2-1.7)     0.78 (0.67-0.91)  34
                                    BV         34               41              37              0.66 (0.42-1.0)   1.6 (1.0-2.4)     36
                                    BV         27 a             67 (NS)         ...             ...               ...               8
  Chief complaint                   VC         186              27              92              3.3 (2.4-4.8)     0.79 (0.72-0.87)  34
Irritation                          BV         67               45              ...             ...               ...               38
                                    BV         27 a             48 (NS)         ...             ...               ...               8
Pain or burning c                   VC         32 a             69 (NS)         ...             ...               ...               8
Redness c                           VC         186              20              88              ...               ...               34
                                    VC         186              28              86              2.0 (1.5-2.8)     0.84 (0.76-0.92)  34
Swelling c                          VC         186              24              92              1.4 (1.2-1.7)     0.78 (0.67-0.91)  34
Urinary tract
  Increased frequency of urination  VC         32 a             16 (NS)         ...             ...               ...               8
  Dysuria                           VC         32 a             13 (NS)         ...             ...               ...               8
                                    BV         27 a             11 (NS)         ...             ...               ...               8
                                    BV         34               32              ...             ...               ...               40
  External dysuria                  VC         186              33              85              2.2 (1.6-2.9)     0.79 (0.71-0.88)  34
Other
  “Another” yeast infection         VC         23               35              90              3.3 (1.2-9.1)     0.72 (0.53-1.0)   12
  Abnormal bleeding                 BV         67               4               ...             ...               ...               38

Abbreviations: BV, bacterial vaginosis; CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; NS, reported by author to be not significantly associated with diagnosis; VC, vaginal candidiasis.

a Patient may have had more than 1 diagnosis.

b Ellipses indicate data not reported.

c Elicited by clinician.


Table 52-3 Accuracy of Symptoms for the Diagnosis of Vaginal Trichomoniasis

                                    No. of Patients
Symptom                             With Diagnosis   Sensitivity, %  Specificity, %  LR+ (95% CI)      LR- (95% CI)      Reference
Type of discharge described by patient
  Any                               8 a              75 (NS)         ... b           ...               ...               8
                                    17               65              29              0.90 (0.63-1.3)   1.2 (0.62-2.5)    39
  White                             8 a              13 (NS)         ...             ...               ...               8
  Yellow                            8 a              50 (NS)         ...             ...               ...               8
Malodor or odor
  Any                               8 a              50 (NS)         ...             ...               ...               8
  “Fishy”                           13               46              45              0.84 (0.45-1.6)   1.2 (0.68-2.1)    32
Itching                             17               35              76              1.5 (0.74-3.0)    0.85 (0.59-1.2)   39
                                    8 a              75 (NS)         ...             ...               ...               8
Irritation                          8 a              63 (NS)         ...             ...               ...               8
Urinary tract
  Increased frequency of urination  8 a              38 (NS)         ...             ...               ...               8
  Dysuria                           8 a              38 (NS)         ...             ...               ...               8
                                    17               0               97              0.64 (0.04-10)    1.0 (0.85-1.3)    39
Postcoital bleeding                 17               0               97              0.9 (0.06-13)     1.0 (0.75-1.4)    39
Dyspareunia                         17               6               96              1.4 (0.18-11)     0.98 (0.87-1.1)   39

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; NS, reported by author to be not significantly associated with diagnosis.

a Patient may have had more than 1 diagnosis.

b Ellipses indicate data not reported.
Table 52-4 Accuracy of Signs for the Diagnosis of Vaginal Candidiasis

                                          No. of Patients
Sign                                      With Diagnosis   Sensitivity, %  Specificity, %  LR+ (95% CI)      LR- (95% CI)      Reference
Type of discharge noted by clinician
  Any                                     32 a             87 (NS)         ... b           ...               ...               8
  Yellow                                  32 a             16 (NS)         ...             ...               ...               8
  White                                   32 a             63 (NS)         ...             ...               ...               8
  Curdy                                   140              16              97              6.1 (2.5-14)      0.86 (0.80-0.93)  7
  Flocculent                              23               43              84              2.7 (1.3-5.5)     0.67 (0.46-0.98)  40
Consistency of discharge
  Thick                                   32 a             52              ...             ...               ...               8
  Curdy                                   186              18              99              15 (6.4-36)       0.83 (0.78-0.89)  34
  Curdy                                   53               72              100             130 (19-960)      0.28 (0.19-0.44)  10
  Thin                                    32 a             48              ...             ...               ...               8
Inflammation
  Any                                     140              46              78              2.1 (1.5-2.8)     0.69 (0.58-0.82)  7
  Perineal edema or erythema              23               57              77              2.5 (1.3-4.6)     0.56 (0.35-0.92)  12
  Vulvar edema                            186              17              98              7.8 (4.2-15)      0.85 (0.79-0.91)  34
  Erythema or edema                       23               91              ...             ...               ...               40
  Vulvar erythema                         186              54              79              2.5 (2.1-3.1)     0.58 (0.49-0.68)  34
  Vaginal erythema                        186              18              94              2.9 (1.9-4.5)     0.88 (0.82-0.94)  34
  Vulvar excoriations                     186              4               99              8.4 (2.3-31)      0.96 (0.93-0.99)  34
  Vulvar fissures                         186              17              96              4.6 (2.7-7.7)     0.86 (0.80-0.92)  34
  Vaginal wall                            32 a             23              ...             ...               ...               8
  Vulvar                                  53               40              95              8.2 (4.0-16)      0.63 (0.51-0.79)  10
  Cervical mucopus                        186              21              72              0.75 (0.55-1.0)   1.1 (1.0-1.2)     34
Odor noted by clinician
  Any                                     32 a             6               ...             ...               ...               8
  “Fishy”                                 24               0               28              0.03 (0-0.47)     2.9 (2.4-5.0)     32
Combined signs
  Curdy discharge or vulvar inflammation  53               81              95              17 (8.8-32)       0.20 (0.11-0.35)  10
  Curdy discharge in presence of itching  53               77              100             150 (20-1000)     0.23 (0.14-0.37)  10

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; NS, reported by author to be not significantly associated with diagnosis.

a Patient may have had more than 1 diagnosis.

b Ellipses indicate data not reported.


of LRs, 1.6 to 2.1). Complaints of malodor (or odor) are so 
strongly associated with bacterial vaginosis that absence of 
malodor virtually ruled out the condition in 1 study (LR, 
0.07; 95% CI, 0.01-0.51). 35 A fishy odor noticed by the 
patient is not helpful in diagnosing trichomoniasis. 

Self-Diagnosis 

Women who complain of having “another yeast infection” 
are more likely to have candidiasis (LR, 3.3; 95% CI, 1.2-9.1). 

Urinary tract symptoms were not found to be associated 
with any of the 3 diagnoses in 1 study, 8 whereas Eckert et al 33 
found “external” dysuria associated with candidiasis. 

Bleeding 

In one study of 17 patients with trichomoniasis, no patient 
complained of postcoital bleeding. 38 Of 67 patients with bacterial 
vaginosis in the study by Livengood et al, 37 only 4% 
complained of abnormal bleeding. 

Dyspareunia 

Only 1 of 17 patients with trichomoniasis complained of dyspareunia, 
which is a nonsignificant association. 38 


Accuracy of Signs 

Tables 52-4 and 52-5 present the sensitivity, specificity, and 
LRs for all signs. We evaluated (1) characteristics of the discharge 
(amount, color, consistency), (2) inflammatory findings 
(edema, erythema, excoriations, tenderness, mucopus), 
and (3) odor. 
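
As a rough check on how the tabulated columns relate to one another, the LRs can be recomputed from the reported sensitivity and specificity. The sketch below is illustrative only: the function name is ours, and because the published LRs were presumably derived from the underlying counts, values recomputed from rounded percentages may differ slightly from those printed in the tables.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a finding, with inputs given as fractions (0-1)."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must be between 0 and 1")
    # LR+ = P(finding present | disease) / P(finding present | no disease)
    lr_pos = sensitivity / (1.0 - specificity) if specificity < 1.0 else float("inf")
    # LR- = P(finding absent | disease) / P(finding absent | no disease)
    lr_neg = (1.0 - sensitivity) / specificity if specificity > 0.0 else float("inf")
    return lr_pos, lr_neg


# "Any" inflammation for candidiasis in Table 52-4:
# sensitivity 46%, specificity 78%.
lr_pos, lr_neg = likelihood_ratios(0.46, 0.78)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

For this row the recomputed values agree with the tabulated LR+ of 2.1 and LR- of 0.69.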

Discharge 

The finding of a discharge on examination does not distinguish 
between the 3 conditions. More than 60% of 
patients with these diagnoses have a discharge. A thick, 
curdy, or flocculent white discharge is strongly predictive 
of candidiasis (range of LRs, 2.7 to 130). The absence of 
these characteristics makes candidiasis less likely (range of 
LRs, 0.28 to 0.86). Women whose discharge is judged normal 
(LR, 0.11; 95% CI, 0.01-0.86) to mild (LR, 0.53; 95% 
CI, 0.37-0.75) are less likely to have bacterial vaginosis 
than women with moderate (LR, 2.5; 95% CI, 1.7-3.8) to 
profuse (LR, 3.0; 95% CI, 0.32-28) discharge. A white discharge 
makes bacterial vaginosis less likely (range of LRs, 
Table 52-5 Accuracy of Signs for the Diagnosis of Bacterial Vaginosis or Vaginal Trichomoniasis

                                           No. of Patients
Sign                            Diagnosis  With Diagnosis   Sensitivity, %  Specificity, %  LR+ (95% CI)      LR- (95% CI)      Reference
Type of discharge noted by clinician
  Any                           BV         27 a             100 (NS)        ... b           ...               ...               8
  Vaginal discharge on vulvae   BV         67               64              ...             ...               ...               38
  Normal                        BV         81               1               89              0.11 (0.01-0.86)  1.1 (1.0-1.2)     9
  Mild                          BV         81               33              37              0.53 (0.37-0.75)  1.8 (1.3-2.5)     9
  Moderate                      BV         81               62              75              2.5 (1.7-3.8)     0.51 (0.38-0.69)  9
  Profuse                       BV         81               4               99              3.0 (0.32-28)     0.98 (0.93-1.0)   9
Color or appearance
  Bloodstained                  BV         81               1               99              1.0 (0.06-16)     1.0 (0.97-1.0)    9
  Clear                         BV         81               0               85              0.01 (0-0.16)     2.9 (1.6-5.4)     9
  Green                         BV         81               1               99              1.0 (0.06-16)     1.0 (0.97-1.0)    9
  Mucoid                        BV         33               3               100             1.6 (0.10-24)     0.99 (0.92-1.1)   35
  Purulent, frothy              BV         33               30              51              0.62 (0.34-1.1)   1.4 (0.96-1.9)    35
  Yellow                        BV         81               60              85              4.1 (2.4-7.1)     0.46 (0.35-0.62)  9
                                BV         27 a             30 (NS)         ...             ...               ...               8
                                VT         8 a              50 (NS)         ...             ...               ...               8
                                VT         9                89              93              14 (6.1-31)       0.12 (0.02-0.75)  40
  White                         BV         81               37              32              0.55 (0.40-0.75)  2.0 (1.4-2.8)     9
                                BV         27 a             41 (NS)         ...             ...               ...               8
                                VT         8 a              13 (NS)         ...             ...               ...               8
  Curdy                         BV         33               3               71              0.10 (0.01-0.74)  1.4 (1.1-1.7)     35
Consistency
  Homogeneous                   VT         10               100             60              2.2 (1.7-2.8)     0.15 (0.02-1.0)   10
  Thick                         BV         27 a             12 (NS)         ...             ...               ...               8
                                VT         8 a              0 (NS)          ...             ...               ...               8
  Thin                          BV         27 a             88 (NS)         ...             ...               ...               8
                                VT         8 a              100 (NS)        ...             ...               ...               8
  Transparent                   BV         33               0               96              0.31 (0.02-6.3)   1.0 (0.97-1.1)    35
Inflammation
  Erythema or edema             VT         17               18              97              6.4 (1.6-26)      0.85 (0.68-1.1)   39
  Vulvar                        BV         67               1               ...             ...               ...               38
                                BV         67               12              ...             ...               ...               38
  Cervical                      BV         67               10              ...             ...               ...               38
  Vaginal                       BV         67               15              ...             ...               ...               38
  Vaginal wall                  BV         27 a             33 (NS)         ...             ...               ...               8
                                VT         8 a              63 (NS)         ...             ...               ...               8
  Uterine/adnexal tenderness    BV         67               12              ...             ...               ...               38
Odor noted by clinician
  Any                           BV         27 a             78 (NS)         ...             ...               ...               8
                                VT         8 a              87 (NS)         ...             ...               ...               8
                                VT         8 a              50 (NS)         ...             ...               ...               8
  High cheese                   BV         81               78              75              3.2 (2.1-4.7)     0.30 (0.19-0.45)  9

Abbreviations: BV, bacterial vaginosis; CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; NS, reported by author to be not significantly associated with diagnosis; VT, vaginal trichomoniasis.

a Patient may have had more than 1 diagnosis.

b Ellipses indicate data not reported.
Table 52-6 Accuracy of Office Laboratory Tests for the Diagnosis of Vaginal Candidiasis or Bacterial Vaginosis

                                                        No. of Patients
Laboratory Test                              Diagnosis  With Diagnosis   Sensitivity, %  Specificity, %  LR+ (95% CI)      LR- (95% CI)      Reference
Microscopy
  Clue cells                                 VC         23 a             17              40              0.29 (0.12-0.73)  2.0 (1.4-3.0)     12
                                             VC         24               17              16              0.20 (0.08-0.49)  5.4 (3.0-9.5)     32
                                             VC         32 b             19              ... c           ...               ...               8
  Curved rods                                BV         34               86              ...             ...               ...               36
  Mobiluncus-type rods                       BV         67               53              ...             ...               ...               38
  Bacilli with corkscrew motility            BV         34               65              100             44 (6.2-310)      0.36 (0.23-0.57)  36
  Lactobacilli scant or absent               BV         91               90              68              3.1 (2.4-3.9)     0.02 (0-0.11)     10
  Yeast seen with potassium hydroxide        VC         23 a             61              77              2.7 (1.4-4.9)     0.51 (0.30-0.86)  12
                                             VC         186              56              ...             ...               ...               34
                                             VC         32 b             63              ...             ...               ...               8
                                             VC         23               83              ...             ...               ...               40
                                             VC         21               38              94              6.5 (2.5-17)      0.66 (0.47-0.92)  6
                                             BV         27 b             19 (NS)         ...             ...               ...               8
  Yeast seen with saline                     VC         23 a             65              75              2.6 (1.5-4.6)     0.46 (0.26-0.83)  12
  Yeast seen with saline and methylene blue  VC         23 a             64              83              3.7 (1.9-7.6)     0.44 (0.25-0.77)  12
  Yeast seen with Gram stain                 VC         23 a             65              100             31 (4.4-220)      0.36 (0.20-0.62)  12
  Trichomonads seen with saline              VC         32 b             0 (NS)          ...             ...               ...               8
                                             BV         27 b             11 (NS)         ...             ...               ...               8
  Leukocytes more than epithelial cells      VC         23 a             13              75              0.52 (0.16-1.7)   1.2 (0.92-1.5)    12
                                             BV         34               36              ...             ...               ...               36
  Leukocytes on slide                        VC         32 b             25 (NS)         ...             ...               ...               8
                                             BV         27 b             15 (NS)         ...             ...               ...               8
pH Level
  <4.5                                       VC         140              59              23              0.77 (0.66-0.90)  1.8 (1.3-2.4)     7
                                             VC         32 b             67              ...             ...               ...               8
                                             VC         23               96              ...             ...               ...               40
  <4.9                                       VC         24               71              90              7.2 (3.4-15)      0.32 (0.17-0.61)  32
  >5.0                                       VC         23 a             77              35              ...               ...               12
Leukocyte count (cells/high-power field)
  <10                                        BV         92               77              ...             ...               ...               31
  10-50                                      BV         92               18              ...             ...               ...               31
  >50                                        BV         92               4               ...             ...               ...               31
Whiff test result positive                   VC         23 a             17              45              0.31 (0.12-0.79)  1.9 (1.3-2.7)     12
                                             VC         32 b             13 (NS)         ...             ...               ...               8

Abbreviations: BV, bacterial vaginosis; CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; NS, reported by author to be not significantly associated with diagnosis; VC, vaginal candidiasis.

a For most tests, 1 to 2 patients had missing data for methylene blue, Gram stains, and whiff tests. For immunofluorescence tests, 16 patients had vaginal candidiasis.

b A patient may have had more than 1 diagnosis.

c Ellipses indicate data not reported.


0.10 to 0.55). One study reports that bloodstained, green, 
clear, and purulent and frothy discharges are uncommon 
with bacterial vaginosis. 34 A yellow discharge increases the 
likelihood of both bacterial vaginosis (LR, 4.1; 95% CI, 
2.4-7.1) and trichomoniasis (LR, 14; 95% CI, 6.1-31). All 
patients in one study with trichomoniasis had a homogeneous 
discharge. 10 


Inflammation 

Signs included a general impression of vulvar inflammation 
by the clinician and specific signs such as vulvar or 
vaginal edema, erythema, fissures, or excoriations. The 
presence of these signs is associated with candidiasis 
(range of LRs, 2.1 to 8.4), although they can also occur in 
trichomoniasis (LR, 6.4; 95% CI, 1.6-26). The absence of 
these signs does not exclude the diagnosis of either candidiasis 
or trichomoniasis. No studies allow calculation of 
the LR of inflammation for bacterial vaginosis, but the 
prevalence of a variety of inflammatory signs was low. 

Odor 

The presence of a “fishy” odor perceived by the clinician 
makes candidiasis unlikely (LR, 0.03; 95% CI, 0-0.47), 
whereas the absence of an odor increases the likelihood 
(LR, 2.9; 95% CI, 2.4-5.0). In contrast, the presence of a 
“high cheese” odor makes bacterial vaginosis more likely 
(LR, 3.2; 95% CI, 2.1-4.7). Data on clinically perceived 
odors in trichomoniasis are limited. 

Accuracy of Office Laboratory Tests 

Tables 52-6 and 52-7 present the sensitivity, specificity, and 
LRs for all office laboratory tests. We evaluated (1) microscopy 
for clue cells and other findings associated with bacterial 
vaginosis, (2) microscopy for yeast (using saline or 
potassium hydroxide), (3) microscopy for trichomonads, 
(4) microscopic evidence of inflammation, (5) measurement 
of vaginal pH, and (6) the whiff test. 

Microscopy 

The sensitivity of microscopy for yeast varies from 38% to 
83%. Consequently, the absence of yeast argues against candidiasis 
but cannot exclude it (range of LRs, 0.46 to 0.66). 

Because clue cells are part of the diagnostic criteria for bacterial 
vaginosis, 40 it is not possible to calculate LRs in this 
condition. Bacilli with corkscrew motility are highly associated 
with bacterial vaginosis (LR, 44; 95% CI, 6.2-310). The 
finding of scant or no lactobacilli is common in bacterial 
vaginosis (LR, 3.1; 95% CI, 2.4-3.9), whereas finding normal 
levels of lactobacilli makes bacterial vaginosis unlikely (LR, 
0.02; 95% CI, 0-0.11). The presence of clue cells makes candidiasis 
unlikely (range of LRs, 0.20 to 0.29) but has no effect 
on the diagnosis of trichomoniasis. 

The identification of trichomonads in the wet mount diagnoses 
trichomoniasis, but their absence does not eliminate 
the diagnosis (range of LRs, 0.34 to 0.96). 

Microscopic Evidence of Inflammation 

The presence of many leukocytes seems relatively uncommon 
in candidiasis and bacterial vaginosis. One study, however, 
found all 9 patients with trichomoniasis had more 
leukocytes than epithelial cells. 39 

pH Level 

Four of 5 studies on pH in vaginal candidiasis reported 
that a majority of patients (59%-96%) had a normal pH 
level (variably defined as <4.5 or <4.9). A fifth study found 
77% of candidiasis patients had a pH of greater than 5.0. 12 
Thus, a majority, but not all, of the studies report that 
candidiasis is associated with a normal pH level. The pH 
in bacterial vaginosis should be high (pH > 4.5) and is 
incorporated into the case definition. A majority of patients 
(>90%) with trichomoniasis will have an increased pH 
level, but the specificity (51%) has been evaluated in only 


Table 52-7 Accuracy of Office Laboratory Tests for the Diagnosis of Vaginal Trichomoniasis

                                                  No. of Patients
Laboratory Test                                   With Diagnosis   Sensitivity, %  Specificity, %  LR+ (95% CI)      LR- (95% CI)      Reference
Microscopy
  Clue cells                                      13               69              33              1.0 (0.70-1.5)    0.93 (0.39-2.2)   32
                                                  8 a              75 (NS)         ... b           ...               ...               8
  Yeast seen with potassium hydroxide             8 a              13 (NS)         ...             ...               ...               8
  Trichomonads seen with saline                   8 a              75 (NS)         ...             ...               ...               8
                                                  9                78              ...             ...               ...               40
                                                  18               67              100             100 (14-740)      0.34 (0.17-0.64)  23
                                                  10               0               100             4.5 (0.1-217)     0.96 (0.84-1.1)   33
                                                  88               60              100             310 (43-2200)     0.40 (0.31-0.52)  37
                                                  55               49              100             51 (7.1-360)      0.51 (0.40-0.67)  11
  Leukocytes more numerous than epithelial cells  9                100             74              3.5 (2.3-5.2)     0.14 (0.02-0.87)  40
  Leukocytes on slide                             8 a              25              ...             ...               ...               8
pH Level
  <4.5                                            8 a              17              ...             ...               ...               8
  >4.9                                            9                100             ...             ...               ...               40
  >5.4                                            13               92              51              1.9 (1.4-2.5)     0.15 (0.02-1.0)   32
Whiff test result positive                        8 a              25 (NS)         ...             ...               ...               8
                                                  9                67              65              1.9 (1.1-3.3)     0.51 (0.20-1.3)   40

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative likelihood ratio; NS, reported by author to be not significantly associated with diagnosis.

a A patient may have had more than 1 diagnosis.

b Ellipses indicate data not reported.
1 study. Unfortunately, given the overlap between the pH 
levels in various conditions, it is hard to draw firm conclusions 
from the existing literature. 

Whiff Test 

A positive whiff test result makes candidiasis less likely (LR, 
0.31; 95% CI, 0.12-0.79) but is positively associated with 
trichomoniasis (LR, 1.9; 95% CI, 1.1-3.3). A positive whiff test 
result is one of the diagnostic criteria for bacterial vaginosis. 

Are These Symptoms and Signs Ever Normal? 

The distinction between normal and abnormal in terms of 
vaginal symptoms is problematic. The primary literature on 
normal vaginal discharge is scant. 41 It appears that a normal 
vaginal discharge increases at midcycle (because of an 
increase in cervical mucus), 42,43 can be malodorous, 44 and 
may be accompanied by irritative symptoms (such as itch). 45 
This problem is compounded by the fact that the vaginal 
pathogens identified by the current diagnostic approach can 
be found in asymptomatic women. 46,47 Gardnerella is part of 
the normal vaginal flora. 48 Thus, the identification of 
microbes in a vaginal discharge does not prove that they create 
symptoms. 


CLINICAL SCENARIOS—RESOLUTIONS 


CASE 1 What is the appropriate diagnostic evaluation? 
No symptom has enough predictive power to allow the 
confident diagnosis of any of the 3 main causes of vaginitis. 
The wet mount examination remains the best way to 
make a diagnosis. 

Symptoms and signs can suggest a particular diagnosis. 
Candidiasis is associated with itching, a cheesy discharge, 
redness, and self-diagnosis, whereas bacterial vaginosis is 
associated with increased discharge and a complaint of 
odor. A watery discharge makes candidiasis unlikely. 

Inflammatory signs are relatively specific for vaginal 
candidiasis but are not always present and do occur in 
trichomoniasis. An absent or mild discharge makes bacterial 
vaginosis unlikely. Odor observed on examination 
occurs in bacterial vaginosis but not in candidiasis. 

Most diagnoses are made by microscopy and the whiff 
test. Most studies (but not all) would support that candidiasis 
is associated with a normal pH level. Although the 
microscopic identification of yeast or trichomonads is 
diagnostic, these causes cannot be ruled out by negative 
findings on microscopy. The presence of clue cells makes 
candidiasis less likely. A lack of lactobacilli and the presence 
of bacilli with corkscrew motility are 2 findings 
highly associated with bacterial vaginosis. 

CASE 2 What do you do when the diagnostic evaluation 
fails? Despite a full medical history, physical examination, 
and microscopy, the evaluation in this case does not pinpoint 
a cause of the patient’s symptoms. There are several possibilities 
to consider in patients for whom the diagnostic evaluation 
is inconclusive. It is possible that the algorithm has failed 
to diagnose vaginal candidiasis or trichomoniasis; clinicians 
should consider empirical therapy or further testing for 
trichomonads or Candida. Clinicians may want to consider 
less common causes of vaginal symptoms, including 
gonorrhea, chlamydia, herpes, or genital warts. Finally, 
there may be no pathologic condition causing the discharge, 
and the clinician may elect, after discussion with 
the patient, an approach of watchful waiting. 


THE BOTTOM LINE 

Our conclusions are subject to 2 important limitations. First, 
the LRs in these studies are not particularly robust. Second, 
despite dozens of articles devoted to the diagnosis of vaginal 
symptoms, we could locate only 18 that were useful in this 
review and none was of the highest methodologic quality. 

Current research on vaginitis has a number of weaknesses. 
Studies on vaginitis often mix together women with symptoms 
and those presenting for follow-up examinations or routine 
care. By analyzing data from these distinct patient groups as if 
they were one, the research fails to address either the question 
of how to diagnose patients with symptoms or how to screen 
for asymptomatic disease. The vocabulary of physical findings 
is not standardized (eg, what is a cheesy discharge?), case definitions 
for candidiasis and trichomoniasis are not clear, and 
multiple criterion standards are used. Scant attention has been 
paid to interobserver variability, which is a key issue in the 
clinical examination. Furthermore, most studies concentrate 
on diagnosing one particular etiology. However, the task facing 
the clinician is to choose among different etiologies. When 
2 pathogens are identified in a study (mixed infections), it is 
conceptually difficult to clarify whether one, both, or neither is 
responsible for the symptoms. Finally, the studies on trichomonas, 
with only one exception, had fewer than 20 patients; 
this is not a good base on which to draw solid conclusions (a 
fact emphasized by the wide 95% CIs of the LRs). 
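
The link between small samples and wide intervals can be made concrete. The following is a minimal sketch, assuming the commonly used log-transformation method for the CI of a ratio of two proportions (the studies reviewed here do not state which method they used). The function name and the 2 x 2 counts are hypothetical and chosen only for illustration; they are not data from any cited study.

```python
import math


def lr_positive_with_ci(tp, n_diseased, fp, n_nondiseased, z=1.96):
    """Return (LR+, lower, upper) using the log method for the 95% CI.

    tp: patients with the disease who have the finding
    fp: patients without the disease who have the finding
    Assumes tp > 0 and fp > 0 so that the logarithm is defined.
    """
    sensitivity = tp / n_diseased
    false_positive_rate = fp / n_nondiseased
    lr = sensitivity / false_positive_rate
    # Standard error of ln(LR+) for a ratio of two independent proportions
    se = math.sqrt(1 / tp - 1 / n_diseased + 1 / fp - 1 / n_nondiseased)
    lower = math.exp(math.log(lr) - z * se)
    upper = math.exp(math.log(lr) + z * se)
    return lr, lower, upper


# Hypothetical counts: same sensitivity (about 35%) and specificity (76%),
# first with 17 diseased patients, then with 10 times as many patients.
small = lr_positive_with_ci(tp=6, n_diseased=17, fp=24, n_nondiseased=100)
large = lr_positive_with_ci(tp=60, n_diseased=170, fp=240, n_nondiseased=1000)

for label, (lr, lo, hi) in (("n = 17", small), ("n = 170", large)):
    print(f"{label}: LR+ {lr:.2f} (95% CI, {lo:.2f}-{hi:.2f})")
```

With these hypothetical counts the point estimate is identical in both scenarios, but only the larger sample yields an interval that excludes 1.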

In addition to these limitations, the existing diagnostic 
approach fails to diagnose approximately 30% of women 
with vaginal symptoms. The time is ripe for new approaches 
to these complaints. 

Despite these limitations, primary care clinicians need to 
be skilled in the diagnosis of vaginal candidiasis, bacterial 
vaginosis, and trichomoniasis. Patients may also have concerns 
regarding the meaning of these symptoms for their 
health and personal relationships, 49 and these concerns need 
to be addressed sensitively. Recognizing that the clinical 
examination is a limited tool in this setting presents the 
problem of finding ways to better diagnose and treat patients 
with vaginal symptoms. Vaginal symptoms may be the most 
common gynecologic complaint in primary care, but much 
remains to be learned about their clinical diagnosis. 
CLINICAL SCENARIO 


A 25-year-old woman who recently became sexually active 
presents with concerns about her new vaginal discharge 
and vaginal itching. She has not noticed an odor. When 
you do a speculum vaginal examination, should the discharge 
be examined microscopically for bacterial vaginosis, 
yeast, and trichomonas or will the appearance of the 
discharge be sufficient for diagnosis? 

UPDATED SUMMARY ON VAGINITIS 

Original Review 

Anderson MR, Klink K, Cohrssen A. Evaluation of vaginal 
complaints. JAMA. 2004;291(11):1368-1379. 

UPDATED LITERATURE SEARCH 

Our literature search replicated that of the original article, 
confined to 2003 to April 2006. We identified 92 potential 
articles and reviewed the abstracts to find articles that 
included consecutive, prospectively identified patients with 
vaginal complaints in a primary care setting (primary care, 
general gynecology, or sexually transmitted disease clinics). 
Our focus was on identifying clinical studies that evaluated 
symptomatic women. We found 1 new article that met these 
standards. The literature search also uncovered 2 recent articles 
that assessed new bedside tests for bacterial vaginosis and 
trichomoniasis and that had data suitable for summarizing in 
likelihood ratios (LRs). 

NEW FINDINGS 

• The patient’s symptom of an abnormal vaginal odor is a 
useful finding, but distinguishing bacterial vaginosis from 
vaginal candidiasis is not as efficient as proposed in the 
original report. Fortuitously, the LRs for bacterial vaginosis 
when the woman perceives an odor and for candidiasis 
when an odor is absent make perceived odor a useful 
symptom for clinical diagnosis. The patient’s perception of 
an odor increases her likelihood of bacterial vaginosis 
(summary LR, 2.2; 95% confidence interval [CI], 1.4-3.6), 
whereas the absence of an odor has the same effect in 
increasing the likelihood of vaginal candidiasis (summary 
LR, 2.2; 95% CI, 1.9-2.5). 

• When clinicians do not have microscopes, point-of-care 
testing may prove useful for bacterial vaginosis and vaginal 
trichomoniasis. 

Details of the Update 

A recent study 1 includes the largest patient sample in which 
all 3 diagnoses were systematically evaluated. For each of 
the target conditions, the investigators reported data that 
allow calculation of the LRs for abnormal discharge, change 
in discharge, odor, vaginal pruritus, vaginal burning, and 
dysuria. A vaginal odor is the most useful symptom for distinguishing 
patients with bacterial vaginosis (odor symptoms 
present) from those with vaginal candidiasis (no 
perceived odor). No symptom worked for identifying 
women with vaginal trichomoniasis, because the CI of the LR for 
every symptom (both positive and negative LRs) includes 1. 
For both candidiasis and trichomoniasis, microscopic tests 
by the clinician are much more useful than the symptoms. 
The presence of yeast on a potassium hydroxide (KOH) 
preparation had an LR of 7.4 (95% CI, 3.8-15) vs culture, 
whereas the absence of yeast forms is less useful in identifying 
women who will have positive yeast culture results (LR, 
0.80; 95% CI, 0.74-0.87). The presence of trichomonads on 
a wet preparation slide was virtually diagnostic (LR, 22; 
95% CI, 13-37). The absence of trichomonads does not rule 
out vaginal trichomoniasis because a culture result can still 
be positive (LR, 0.39; 95% CI, 0.29-0.53). 

Although not reviewed in the original Rational Clinical 
Examination article on vaginitis, point-of-care testing for 
both bacterial vaginosis and vaginal trichomoniasis is gathering 
increased attention. Approved products are now available 
and marketed toward clinics that do not have access to 
microscopes or trained personnel for assessing the presence 
of clue cells (bacterial vaginosis) or trichomonads. Compared 
with the Amsel criteria, 2 the BVBlue Test (Gryphus 
Diagnostics, LLC, Birmingham, Alabama) has a positive LR 
of 9.8 (95% CI, 6.0-16) and a negative LR of 0.13 (95% CI, 
0.08-0.21). 3 The test uses a chromogenic assay for vaginal 
fluid sialidase produced by bacteria. Although the test takes 
fewer than 10 minutes to perform, in this study the test kits 
were taken to a laboratory for processing. The findings 
require further study in a setting in which the clinic personnel 
interpret the results as a true “bedside” test, rather than 
sending the sample to a trained laboratory technician. A 
second type of point-of-care test for bacterial vaginosis 
incorporates a pH test and a test for amines (both of these 
are part of the Amsel criteria 2 ). In a resource-poor environment, 
Azerbaijani women at a health fair were screened 
with the FemExam (Litmus Concepts, Inc, Santa Clara, California). 4 
Compared with the Amsel criteria, 2 a FemExam 
result positive for both pH and amines has a sensitivity of 
92% for bacterial vaginosis, suggesting that it may be a reasonable 
substitute for the complete Amsel criteria 2 (positive 
LR, 7.5; 95% CI, 4.0-14). However, finding that both the 
pH and amine results are negative has an LR that is 0.45 
(95% CI, 0.34-0.57), which is not low enough to rule out 
bacterial vaginosis, given its high pretest probability. 
Although most of the women in the study did have an 
abnormal vaginal discharge, not all were specifically seeking 
care for vaginitis. A point-of-care test for trichomoniasis 
(Xenostrip-Tv; Xenotope Diagnostics, San Antonio, Texas) 
identifies antigen to the protozoan. The test is highly efficient 
at confirming infection, with a positive LR of 361 
(95% CI, 22-5845), but a normal result does not rule out 
vaginal trichomoniasis, with a negative LR of 0.52 (95% CI, 
0.40-0.67). 5 
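
To illustrate why a negative LR of 0.45 is "not low enough," the reported LRs can be converted to posttest probabilities through pretest odds. This is a sketch under a stated assumption: the 34% pretest probability is the summary prevalence of bacterial vaginosis among symptomatic women reported in this update, not the prevalence in either point-of-care study, and the function and variable names are ours.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Assumption: summary prevalence of bacterial vaginosis among women
# presenting with vaginal complaints (34%), used as the pretest probability.
PRETEST_BV = 0.34

# Point estimates of the LRs quoted in the text (CIs are ignored here).
TESTS = {
    "BVBlue": {"lr_pos": 9.8, "lr_neg": 0.13},
    "FemExam (pH and amines)": {"lr_pos": 7.5, "lr_neg": 0.45},
}

results = {}
for name, lrs in TESTS.items():
    results[name] = {
        "after_positive": posttest_probability(PRETEST_BV, lrs["lr_pos"]),
        "after_negative": posttest_probability(PRETEST_BV, lrs["lr_neg"]),
    }
    print(
        f"{name}: positive -> {results[name]['after_positive']:.0%}, "
        f"negative -> {results[name]['after_negative']:.0%}"
    )
```

Under this assumption, a negative FemExam result still leaves a probability of bacterial vaginosis of roughly 19%, compared with roughly 6% after a negative BVBlue result.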


IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

With data from the original review (Table 52-8) and from articles 
identified in the update, 1-6 the prevalence of vaginal candidiasis, 
bacterial vaginosis, and vaginal trichomoniasis 
among women with vaginal complaints and presenting for 
care can be summarized. The summary estimates provide 
a reasonable anchor for making clinical decisions, though 
the data suggest geographic variability, which means that 
each provider needs a sense of prevalence in his or her 
own practice setting. The summary prevalences are as follows: 
bacterial vaginosis, 34% (95% CI, 28%-41%); vaginal 
candidiasis, 26% (95% CI, 22%-30%); and vaginal 
trichomoniasis, 10% (95% CI, 7%-15%). These prevalences 
support the notion that approximately 30% of 
women will have less common infections or remain undiagnosed 
after their evaluation. 

We calculated summary LRs for several of the symptoms in 
which the results were clinically consistent across studies. 
When considering the CI associated with these summary 
LRs, the clinician should have a better sense for the utility of 
the findings. 


CHANGES IN THE REFERENCE STANDARD 


RESULTS OF LITERATURE REVIEW 


Table 52-8 Univariate Findings for Vaginitis

                                               Condition                Summary LR+       Summary LR-
Finding                                        (No. of Studies) a       (95% CI)          (95% CI)
Patient Symptoms
  Vaginal odor                                 Bacterial vaginosis (2)  2.2 (1.4-3.6)     0.30 (0.24-0.38)
                                               Candidiasis (3)          0.29 (0.20-0.43)  2.2 (1.9-2.5)
  Vaginal itching                              Candidiasis (5)          1.5 (1.3-1.8)     0.53 (0.33-0.86)
Microscopic Tests
  Yeast forms on a KOH preparation             Candidiasis (3)          4.8 (2.7-8.4)     0.78 (0.71-0.85)
  Trichomonads seen with a saline preparation  Trichomoniasis (5)       46 (17-121)       0.50 (0.36-0.71)

Abbreviations: CI, confidence interval; KOH, potassium hydroxide; LR+, positive likelihood ratio; LR-, negative likelihood ratio.

a Data are combined from Table 2 of the original Rational Clinical Examination 
article by Anderson et al 7 and Table 6 in the article by Landers et al. 1 

EVIDENCE FROM GUIDELINES 

The Centers for Disease Control and Prevention funds an online 
training program developed by the Seattle STD/HIV Prevention 
Training Center that can be reviewed by clinicians who do office 
microscopy to diagnose vaginitis 
(http://depts.washington.edu/nnptc/online_training/wet_preps_video.html; 
accessed June 15, 2008). 

Although bacterial vaginosis in pregnancy was not a focus 
of the review, the US Preventive Services Task Force 8 
evaluated the condition and found the evidence lacking to 
recommend for or against screening high-risk pregnant 
women for bacterial vaginosis. For clinicians who choose to 
screen, the task force observed that the Amsel criteria 2 are the 
accepted clinical criteria even though the “optimal” test has 
not been determined. 


CLINICAL SCENARIO—RESOLUTION 


The diagnosis of vaginitis requires microscopic examination 
of the vaginal discharge. Although you may not be able to 
determine a diagnosis in about 30% of patients, approximately 
33% will have bacterial vaginosis, 25% will have candidiasis, 
and 10% will have trichomoniasis. The lack of a 
perceived odor makes candidiasis more likely (LR, 2.2), but 
the absence of the symptom is not conclusive. A thick or 
“curdy” discharge would be compatible with yeast, but 
women may have multiple infections. Thus, a diagnosis is 
best established by obtaining a specimen for: (1) measuring 
the pH; (2) preparing a slide for KOH assessment (evaluate 
the odor after application of KOH for the whiff test [bacterial 
vaginosis] and use the microscope to identify yeast forms); 
and (3) preparing a separate wet saline microscopic slide (for 
clue cells and trichomonads). 
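
The effect of an LR on the chance of disease can be made concrete by converting the pretest probability to odds, multiplying by the LR, and converting back to a probability. The Python sketch below applies that standard calculation to the figures in this scenario (a 25% prior probability of candidiasis and an LR of 2.2 for the absence of a perceived odor). The function name is illustrative, and the resulting posttest probability is our own arithmetic, not a value reported in the chapter.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability using odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio cannot be negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Scenario values taken from the text: prior 25%, LR 2.2 (no perceived odor).
candidiasis_without_odor = posttest_probability(0.25, 2.2)
print(f"{candidiasis_without_odor:.2f}")  # 0.42
```

Under these assumptions the probability of candidiasis rises from 25% to roughly 42%, which illustrates why the absence of odor alone "is not conclusive" and microscopy is still needed.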


None. 


VAGINITIS: MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

Among women with vaginal symptoms, the most common 
diagnoses are bacterial vaginosis (34%), vaginal candidiasis 
(26%), and vaginal trichomoniasis (10%). The prevalence 
changes across regions, so clinicians should be familiar with 
the findings in their own clinics. 

POPULATION FOR WHOM VAGINITIS 
SHOULD BE CONSIDERED 

Vaginitis should be considered in any woman with vaginal 
symptoms, which typically include a combination of vaginal 
discharge, odor, irritation, or pruritus. 

DETECTING THE LIKELIHOOD OF 
CAUSES OF VAGINITIS 

Although the presence of odor helps identify women more 
likely to have bacterial vaginosis versus candidiasis, no 
symptoms reliably identify those with trichomoniasis (see 
Table 52-9). Thus, unless point-of-care tests become validated, 
a microscopic evaluation is required for identifying clue cells 
(bacterial vaginosis), yeast forms (vaginal candidiasis), or 
trichomonads (vaginal trichomoniasis). Clinicians who do 
office microscopy need appropriate training to recognize 
the findings 
(http://depts.washington.edu/nnptc/online_training/wet_preps_video.html; 
accessed June 15, 2008). 

REFERENCE STANDARD TESTS 

Bacterial Vaginosis 

The pragmatic reference standard consists of the Amsel criteria. 2 
These require 4 different tests, of which at least 3 must 
have positive results: (1) a thin, homogeneous vaginal discharge; 
(2) clue cells on microscopic examination; (3) positive 
whiff test; and (4) vaginal pH higher than 4.5. 
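
The "at least 3 of 4" rule can be written as a simple count. The following Python sketch encodes the 4 Amsel criteria exactly as listed above; the class and field names are hypothetical and chosen only for illustration, and the sketch is not a substitute for clinical judgment.

```python
from dataclasses import dataclass


@dataclass
class AmselFindings:
    """Bedside findings for one patient (field names are illustrative)."""
    thin_homogeneous_discharge: bool
    clue_cells: bool
    positive_whiff_test: bool
    vaginal_ph: float


def meets_amsel_criteria(findings: AmselFindings) -> bool:
    """Return True when at least 3 of the 4 Amsel criteria are positive."""
    positives = [
        findings.thin_homogeneous_discharge,
        findings.clue_cells,
        findings.positive_whiff_test,
        findings.vaginal_ph > 4.5,  # the criterion is a pH higher than 4.5
    ]
    return sum(positives) >= 3


patient = AmselFindings(
    thin_homogeneous_discharge=True,
    clue_cells=True,
    positive_whiff_test=False,
    vaginal_ph=5.0,
)
print(meets_amsel_criteria(patient))  # True
```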


Table 52-9 Likelihood Ratios of Symptoms and Microscopy for Vaginitis 

Finding                            Condition               LR+ (95% CI)       LR- (95% CI)

Patient Symptoms
Vaginal odor (symptoms)            Bacterial vaginosis     2.2 (1.4-3.6)      0.30 (0.24-0.38)
                                   Candidiasis             0.29 (0.20-0.43)   2.2 (1.9-2.5)
Vaginal itching                    Candidiasis             1.5 (1.3-1.8)      0.53 (0.33-0.86)
Odor, itching, vaginal burning,    Trichomoniasis          The LR+ and LR- have narrow CIs that include 1,
  dysuria                                                  suggesting they are of no value

Microscopic Tests
Yeast forms on a KOH preparation   Candidiasis (n = 3)     4.8 (2.7-8.4)      0.78 (0.71-0.85)
Trichomonads seen with a saline    Trichomoniasis (n = 5)  46 (17-121)        0.50 (0.36-0.71)
  preparation

Abbreviations: CI, confidence interval; KOH, potassium hydroxide; LR+, positive likelihood 
ratio; LR-, negative likelihood ratio. 


Vaginal Candidiasis 

The reference standard test requires culture, though culture 
cannot distinguish between infection and colonization. 

Trichomoniasis 

The reference standard test in clinical research studies typically 
requires culture. However, in clinical practice the presence 
of trichomonads on a saline microscopic preparation is 
considered diagnostic, though the absence of trichomonads 
does not definitively rule out the condition. 


EVIDENCE TO SUPPORT THE UPDATE: Vaginitis 



TITLE Predictive Value of the Clinical Diagnosis of 
Lower Genital Tract Infection in Women. 

AUTHORS Landers DV, Wiesenfeld HC, Heine P, 
Krohn MA, Hillier SL. 

CITATION Am J Obstet Gynecol. 2004;190(4):1004-1010. 

QUESTION Can experienced midlevel practitioners 
correctly diagnose vaginitis among women with vaginal 
complaints? 

DESIGN Prospective, independent. 

SETTING Three sites in Pittsburgh, Pennsylvania: a student 
health center, a public sexually transmitted disease 
clinic, and a suburban public health clinic. Two of the clinicians 
were physician assistants and 1 was a nurse practitioner. 
Each clinician underwent specific instruction for the 
study and had competency testing in the bedside tests and 
microscopic studies. 

PATIENTS Women aged 18 to 45 years with 
untreated genital complaints consisting of vaginal discharge, 
odor, itching, or lower genital tract burning. 

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

Each patient filled out a questionnaire and then received a 
speculum examination. The clinician recorded evidence of 
mucopurulent cervicitis and evaluated vaginal secretions for 
color, viscosity, homogeneity, and odor after the addition of 
potassium hydroxide (KOH) to a sample of the vaginal secretions. 
The secretions were used to perform a KOH microscopic 
evaluation, pH testing, Gram stain, and trichomonas and yeast 
cultures, along with endocervical cultures for sexually transmitted 
diseases and a Papanicolaou test. A clinical diagnosis of yeast 
was established from the microscopic KOH slide preparation 
that showed yeast. Trichomoniasis was diagnosed by observation 
of motile trichomonads on the microscopic slide. Bacterial vaginosis 
was established by applying Amsel criteria. 1 

The laboratory reference standard diagnosis for trichomonas 
and yeast was established by culture, and bacterial vaginosis was 
established by Gram stain examined for Nugent criteria. 2 


MAIN OUTCOME MEASURES 

Sensitivity and specificity of the clinical diagnosis compared 
with the laboratory diagnosis. The sensitivity and specificity 
of the various vaginal complaints for bacterial vaginosis 
could be calculated from data in the article. 


MAIN RESULTS 

Among these 598 women with vaginal complaints, at least 1 
microbiologic diagnosis was established in 79%. The distribution 
was bacterial vaginosis, 49%; vaginal yeast, 29%; trichomoniasis, 
12%; and chlamydia or gonorrhea, 11%. 
Women could be coinfected by multiple organisms. 

Tables 52-10, 52-11, and 52-12 show the value of symptoms for 
bacterial vaginosis, candidiasis, and trichomoniasis. 
Table 52-13 displays the likelihood ratios (LRs) of the clinical 
diagnosis for each infection compared with a laboratory criterion 
standard. 

CONCLUSIONS 

LEVEL OF EVIDENCE Level 2. 

STRENGTHS The criteria for the clinicians’ diagnoses are 
well outlined. Not only can the likelihood ratios (LRs) for the 
individual symptoms be reported but also the LRs for the 
bedside tests. 


Table 52-10 Likelihood Ratios for Symptoms of Bacterial Vaginosis 
Compared With Amsel Criteria 1 

Symptoms               LR+ (95% CI)       LR- (95% CI)

Vaginal odor           3.2 (2.6-3.9)      0.31 (0.25-0.39)
Change in discharge    2.2 (1.8-2.6)      0.38 (0.31-0.47)
Abnormal discharge     1.9 (1.7-2.2)      0.26 (0.19-0.35)
Dysuria                1.5 (0.97-2.3)     0.95 (0.89-1.1)
Vaginal burning        1.3 (0.96-1.9)     0.93 (0.86-1.0)
Vaginal pruritus       1.2 (0.97-1.5)     0.91 (0.81-1.0)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 


Table 52-11 Likelihood Ratios for Symptoms of Vaginal Candidiasis 
Compared With Culture 

Symptoms               LR+ (95% CI)       LR- (95% CI)

Vaginal pruritus       1.1 (0.87-1.4)     0.95 (0.84-1.1)
Vaginal burning        0.58 (0.37-0.90)   1.1 (1.0-1.2)
Change in discharge    0.47 (0.37-0.60)   1.9 (1.6-2.2)
Abnormal discharge     0.40 (0.32-0.50)   3.0 (2.4-3.7)
Dysuria                0.37 (0.19-0.71)   1.1 (1.0-1.2)
Vaginal odor           0.22 (0.15-0.32)   2.3 (2.0-2.7)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 


Table 52-12 Likelihood Ratios for Symptoms of Vaginal 
Trichomoniasis Compared With Culture 

Symptom                LR+ (95% CI)       LR- (95% CI)

Vaginal burning        1.1 (0.86-1.8)     0.98 (0.87-1.1)
Dysuria                1.1 (0.58-2.1)     0.99 (0.90-1.1)
Vaginal odor           1.0 (0.78-1.3)     1.0 (0.79-1.3)
Vaginal pruritus       1.0 (0.70-1.4)     1.0 (0.84-1.2)
Change in discharge    0.9 (0.7-1.2)      1.1 (0.87-1.4)
Abnormal discharge     0.81 (0.65-1.0)    1.3 (1.0-1.8)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR-, negative 
likelihood ratio. 


Table 52-13 Likelihood Ratios for the Clinician’s Diagnosis Compared 
With Laboratory Diagnosis 

Clinical Diagnosis                     LR+ (95% CI)     LR- (95% CI)

Trichomonas (wet preparation)          21 (13-37)       0.39 (0.29-0.53)
Candidiasis (KOH preparation)          7.4 (3.8-15)     0.80 (0.74-0.87)
Bacterial vaginosis (Amsel criteria)   4.0 (3.2-4.9)    0.11 (0.07-0.17)

Abbreviations: CI, confidence interval; KOH, potassium hydroxide; LR+, positive likelihood 
ratio; LR-, negative likelihood ratio. 


LIMITATIONS These results were those of experienced 
midlevel practitioners who were specifically trained to do the 
clinical and microscopic examination. Not only were they 
trained but also they demonstrated competency in the performance 
of the bedside tests. Generalist physicians would 
have to ensure their competency in microscopic examinations 
of vaginal secretions to replicate the results. However, 
the authors provide accuracy data for these 2 microscopic 
studies compared with cultures. 

A patient’s symptom of an abnormal vaginal odor makes bacterial 
vaginosis more likely, with an LR of 3.2 (95% confidence 
interval [CI], 2.6-3.9), whereas the absence of the odor makes 
vaginal candidiasis more likely, with an LR of 2.3 (95% CI, 
2.0-2.7). A patient’s symptom of a “change” in her vaginal discharge 
worked similarly (though not as well) to the presence of 
an odor: a change in the vaginal discharge made bacterial vaginosis 
more likely (LR, 2.2; 95% CI, 1.8-2.6), whereas no change 
in the discharge despite vaginal complaints increased the likelihood 
of candidiasis (LR, 1.9; 95% CI, 1.6-2.2). 

Vaginal pruritus was an inefficient finding for candidiasis. 

The symptoms have almost no value for diagnosing trichomoniasis. 
Although trichomoniasis is the least common 
of the 3 diagnoses, examination of a microscopic preparation 
for the organism is necessary. The presence of trichomonads 
on a microscopic specimen makes the diagnosis of trichomoniasis 
almost certain. The Amsel criteria 1 for bacterial vaginosis 
and the presence of yeast on a KOH preparation are 
also much more useful than the individual clinical findings. 
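
One way to see why the symptoms "have almost no value" for trichomoniasis is to check whether each confidence interval includes 1, the value at which a finding does not change the probability of disease. The Python sketch below does this for the LRs reported in Table 52-12. The data structure and function name are illustrative, and the bounds are compared inclusively because the table reports rounded values.

```python
# Each entry: symptom -> ((LR+, lower, upper), (LR-, lower, upper)), from Table 52-12.
TABLE_52_12 = {
    "Vaginal burning": ((1.1, 0.86, 1.8), (0.98, 0.87, 1.1)),
    "Dysuria": ((1.1, 0.58, 2.1), (0.99, 0.90, 1.1)),
    "Vaginal odor": ((1.0, 0.78, 1.3), (1.0, 0.79, 1.3)),
    "Vaginal pruritus": ((1.0, 0.70, 1.4), (1.0, 0.84, 1.2)),
    "Change in discharge": ((0.9, 0.7, 1.2), (1.1, 0.87, 1.4)),
    "Abnormal discharge": ((0.81, 0.65, 1.0), (1.3, 1.0, 1.8)),
}


def ci_includes_one(lower: float, upper: float) -> bool:
    """True when the interval [lower, upper] contains 1 (bounds inclusive)."""
    return lower <= 1.0 <= upper


uninformative = [
    symptom
    for symptom, (lr_pos, lr_neg) in TABLE_52_12.items()
    if ci_includes_one(lr_pos[1], lr_pos[2]) and ci_includes_one(lr_neg[1], lr_neg[2])
]
print(len(uninformative), "of", len(TABLE_52_12), "symptoms have CIs that include 1")
```

With the inclusive comparison, every symptom in the table is flagged, which matches the conclusion that microscopy is required.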

Reviewed by David L. Simel, MD, MHS 


CHAPTER 53 

Does This Dizzy Patient Have a Serious Form of Vertigo? 

David A. Froehling, MD 
Marc D. Silverstein, MD 
David N. Mohr, MD 
Charles W. Beatty, MD 


CLINICAL SCENARIOS 

Common Causes of Vertigo 

CASE 1 A 52-year-old woman was admitted to the hospital 
because of nausea, a constant spinning sensation, 
and vomiting of 24 hours’ duration. Any movement of her 
head made these symptoms worse. On examination, she 
had bilateral horizontal spontaneous nystagmus. Two 
days later, after symptomatic improvement, she was discharged. 
At follow-up 2 weeks later, her symptoms and 
nystagmus had completely resolved. 

CASE 2 A 70-year-old woman had a 4-month history of 
an intermittent whirling sensation when turning her head 
and especially when rolling over in bed. On examination, a 
left-side-down head-hanging maneuver elicited rotatory 
nystagmus, with the fast component to the left ear (Figure 
53-1). There was a latency of about 3 seconds before the 
onset of nystagmus, which lasted approximately 10 seconds. 


WHY EVALUATE VERTIGO? 


Vertigo is defined in Merriam-Webster’s dictionary 1 as a disturbance 
“in which the external world seems to revolve around the 
individual or in which the individual seems to revolve in space.” 
Vertigo is an illusion of motion 2 and is one of several forms of 
dizziness. The word dizziness is derived from the old English 
word dysig, meaning foolish or stupid. The modern usage of the 
word includes “a whirling sensation in the head with a tendency 
to fall,” “mentally confused or dazed,” and “giddiness.” 1 

In one study 3 from a general internal medicine outpatient 
clinic, dizziness was the third most frequent complaint of 
patients. In a national survey reported in 1989, 4 it was the 
13th most frequent reason for visits to internists in the 
United States. Dizziness is often a diagnostic problem in the 
emergency department. 5 Among patients treated in an emergency 
department, 5 in an outpatient clinic, 6 and in 2 subspecialty 
dizziness clinics, 7,8 vertigo was the most frequent 
category of dizziness. 

Most patients with dizziness can be classified as having one 
of the following syndromes: 

1. impaired perfusion of the central nervous system or near 
syncope (eg, orthostatic hypotension, cardiac presyncope) 

2. dysequilibrium, a sensation of imbalance when standing 
or walking 6 (eg, multiple sensory deficits) 

3. psychogenic dizziness (eg, major depression, anxiety disorder, 
and somatization disorder) 

4. vertigo (eg, Meniere disease and vestibular neuronitis) 7 

Usually dizziness can be classified according to information 
obtained from the medical history and physical 
examination. In this article, we concentrate on the evaluation 
of vertigo, the most common category of dizziness. 
Serious forms of vertigo are due to conditions associated 
with increased mortality or long-term disability. Vertigo 
severe enough to impair daily functioning and lasting for 
more than a month would be included as a serious form of 
vertigo. 

Copyright © 2009 by the American Medical Association. 


Figure 53-1 How to Test for Positional Nystagmus 

[Figure panels: Examiner rotates patient’s head laterally and extends patient’s neck. 
Examiner lays patient down with head hanging off of table. Examiner observes patient's 
eyes for appearance of nystagmus. Examiner returns patient to seated position and allows 
rest for 30 seconds. The maneuver is repeated with the head extended and rotated in the 
opposite direction. Positive indication: maneuver reproduces patient's vertiginous 
symptoms and creates nystagmus.] 

The Dix-Hallpike maneuver for positional vertigo is performed by the examiner, who stands at the head of the bed. As the patient is supported and lowered 
into a position whereby his or her rotated and extended head hangs off the end of the examining table, the examiner observes for nystagmus. In 
this view, the patient's head has been rotated to the left and expresses nystagmus with a slow response to the right and a rapid response to the left. 
Repeating the maneuver with the head rotated in the opposite direction would reverse the direction of the nystagmus. A maneuver (with positive indication) 
will reproduce the patient’s symptoms. 

The importance of recognizing a patient’s complaint of dizziness 
as vertigo is that it narrows the list of possible causes. 
Customarily, the causes of vertigo are divided into central 
causes (lesions of the central nervous system) and peripheral 
causes (lesions of the vestibular labyrinth or nerve or both) 
(Table 53-1). Because of the importance of detecting lesions or 
diagnosing syndromes that can be treated and because of the 
need to determine prognosis, physicians should attempt to 
make a specific diagnosis for patients with vertigo. 


Table 53-1 Common Causes of Vertigo 

Peripheral 
   Benign paroxysmal positional vertigo 
   Vestibular neuronitis 
   Recurrent vestibulopathy 
   Classic Meniere disease 
   Head trauma (labyrinthine concussion) 
   Acoustic neuroma 
   Otosclerosis 
   Herpes zoster oticus 
   Cholesteatoma 
   Perilymph fistula 
   Aminoglycoside ototoxicity 

Central 
   Vertebrobasilar transient ischemic attacks 
   Cerebellar or brainstem stroke 
   Brain tumors 
   Multiple sclerosis 
   Vertebrobasilar migraine 

Most cases of vertigo are due to lesions of the vestibular 
nerve or labyrinth. 5 8 In 2 dizziness clinics, the most common 
cause of vertigo was benign paroxysmal positional 
vertigo. 7,8 

PATHOPHYSIOLOGY OF VERTIGO AND NYSTAGMUS 

Origins of Vertigo 

The maintenance of the sense of balance and spatial orientation 
depends on input from the vestibular labyrinth, 
visual system, and proprioceptive nerves arising from tendons, 
muscles, and joints. 9 The vestibular nuclei, which 
are in the medulla and lower pons, receive input from the 
vestibular labyrinth via the vestibular branch of cranial 
nerve VIII and from the cerebellum. 10 The vestibular 
nuclei, in turn, send efferent fibers to the cerebellum, the 
medial longitudinal fasciculus, and the vestibulospinal 
tract. Visceral manifestations of vertigo (such as nausea 
and vomiting) are caused by altered input to the dorsal 
nucleus of the vagus nerve from the vestibular nuclei. Conscious 
awareness of vertigo resides in the superior temporal 
gyrus of the cerebral cortex 9 and involves a mismatch 
between input to the cerebral cortex from the visual, 
proprioceptive, and vestibular systems. 11 Lesions in various 
locations, including the inner ear, brain stem, and cerebellum, 
may all be manifested as vertigo. 

Origins of Nystagmus 

Nystagmus is the objective accompaniment of vertigo and is 
defined best as a “rhythmical oscillation of the eyes, with a 
fast movement in one direction and a slow movement in the 
other.” 12 The fast component may be horizontal, vertical, 
rotatory, or any combination of these. 13 

There are 2 clinically relevant kinds of nystagmus in evaluating 
vertigo. Spontaneous nystagmus is elicited by having 
the patient look straight ahead, up, down, to the right, and to 
the left. This type of nystagmus is not influenced by head 
position. 14 It is normal to have a few beats of nystagmus with 
extreme lateral gaze. 13 Positional nystagmus is elicited by a 
head-hanging maneuver (Figure 53-1). 13 

Altered input passing from the vestibular nuclei to the 
nuclei of the extraocular muscles through the medial longitudinal 
fasciculus and related pathways in the reticular formation 
produces nystagmus. This input may be modified by 
information arising from the cerebral cortex and the cerebellum. 13 
For example, the fast component of spontaneous nystagmus 
depends on interaction between the vestibular 
system and the cerebral cortex. 15 


HOW TO ELICIT THE SYMPTOMS AND 
SIGNS OF VERTIGO 

First, Distinguish Vertigo From Other 
Causes of Dizziness 

Patients often have difficulty describing symptoms of dizziness, 
and even those who have disorders that produce vertigo 
may not clearly describe a hallucination of movement. As 
Olsson and Atkins 16 pointed out, “A person is so rarely conscious 
of his own vestibular system, he has a great deal of 
trouble describing his symptoms to a doctor.” Thus, clues 
must be gathered from the medical history and physical 
examination to classify the dizziness properly. 

Dizziness when standing may be due to vertigo, decreased 
cerebral perfusion, 17 or dysequilibrium. 6 If the patient reports 
having symptoms of dizziness primarily while standing, the 
blood pressure should be checked with the patient in the 
supine position and also after standing for 5 minutes. If there 
is an orthostatic decrease in blood pressure, the symptom is 
likely due to impaired central nervous system perfusion. 

Unsteadiness while walking, especially in elderly patients, 
is often due to dysequilibrium (a feeling of imbalance). The 
cause is usually multifactorial. On examination, the findings 
of decreased visual acuity and signs of peripheral neuropathy 
or abnormal vestibular function support a diagnosis of 
dysequilibrium. 6,7 

Dizziness when turning, and especially when rolling over 
in bed, is usually due to vertigo. 

Psychogenic dizziness is a diagnosis of exclusion that 
should be considered especially in patients with psychiatric 
illnesses, such as major depression, anxiety disorder, and 
somatization disorder. In this setting, the patient should be 
asked to hyperventilate for 2 minutes and then asked whether 
the feeling associated with hyperventilation is exactly the 
same as the dizzy symptom. The physician should initially 
hyperventilate along with the patient; this approach encourages 
the patient and demonstrates the desired rate and depth 
of breathing for the test. 18 If hyperventilation reproduces the 
symptom, the dizziness is often psychogenic. However, the 
usefulness of hyperventilation in diagnosing psychogenic 
dizziness is unclear. In a study by Kroenke et al 6 of 100 ambulatory 
patients with a chief complaint of dizziness, symptoms 
of dizziness were reproduced by hyperventilation in 21; however, 
only 1 of these patients had hyperventilation as the primary 
cause of dizziness. Most of them had dizziness inducible 
by other maneuvers in addition to hyperventilation. Further 
studies of the hyperventilation maneuver in the evaluation of 
patients with suspected psychogenic dizziness are needed. In 
this study of 100 patients, only 16% had pure psychogenic 
dizziness, but 24% had other causes of dizziness exacerbated 
by psychiatric illness. 6 

Second, Take a Proper Medical History 
From Patients With Vertigo 

After it is clear that the patient is describing vertigo, further 
questions help elicit clues about its specific cause. 

Ask When the Dizziness Occurs 

It is probably more important to ask a patient about the circumstances 
in which the dizziness occurs than to ask for a 
description of the dizziness. Dizziness related to early-morning 
activities is somewhat helpful in distinguishing between 
peripheral and central vertigo. Matutinal vertigo (vertigo on 
first arising in the morning) is usually due to a peripheral 
vestibular disorder. 19 

Ask About Other Otologic Symptoms 

Associated otologic symptoms can be helpful in identifying a 
peripheral cause of vertigo. Hearing loss and vertigo are 
common in patients with otosclerosis. 20 Episodes of hearing 
loss with vertigo, tinnitus, and a sensation of fullness in the 
ear occur in patients with Meniere disease. 21 Patients with 
acoustic neuromas usually present with hearing loss rather 
than vertigo. Most of these patients notice dizziness but complain 
of unsteadiness rather than vertigo. 22 

Ask About Other Neurologic Symptoms 

Symptoms of neurologic disease, such as weakness, difficulty 
with speech, or diplopia, in addition to vertigo suggest a central 
cause. 

Ask About Symptom Patterns 

Patients with vestibular neuronitis (also called labyrinthitis), 
benign paroxysmal positional vertigo, and recurrent vestibulopathy 
(also called benign recurrent vertigo and vestibular 
Meniere disease) have normal hearing. 23 ' 26 Patients with 
benign paroxysmal positional vertigo 23 (also called benign 
paroxysmal positional nystagmus 27 and cupulolithiasis 28 ) 
have intermittent episodes of vertigo with head turning. 23 - 29 
Vestibular neuronitis is characterized by a relatively sudden 
onset of severe, constant vertigo (made worse by head movement) 
that resolves after days or weeks. 23,30 Patients with 
recurrent vestibulopathy have intermittent episodes of constant 
vertigo lasting for minutes or hours. 24,25 Vertigo (with or 
without hearing loss) in a patient who has recently received 
aminoglycoside antibiotics may be due to the toxic effect 
these agents have on the vestibular labyrinth. 31 


Table 53-2 Accuracy of Signs and Symptoms for Diagnosing Peripheral Vertigo in an Emergency Department a 

                                             No. of Patients    No. of Patients With
                                             With Peripheral    Other Causes of
                                             Vertigo (Not an    Dizziness That Might              Predictive            Likelihood
                                             Emergency)         Be an Emergency          Total    Value, %              Ratio

Positive cluster of signs and symptoms b     23                 4                        27       Positive 85 (23/27)   7.6
Lack of one or more elements in cluster      31                 67                       98       Negative 68 (67/98)   0.6
Total                                        54                 71                       125      ... c                 ...

a Data from Herr et al. 5 
b Positive cluster includes positive results on head-hanging maneuver plus either vertigo or vomiting. 
c Ellipses indicate not applicable. 
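
The likelihood ratios in Table 53-2 follow directly from the patient counts in its 2 x 2 layout. As a check on the arithmetic, the Python sketch below recomputes them, treating peripheral vertigo as the target condition and the positive cluster as the test result; the function and parameter names are illustrative.

```python
def likelihood_ratios(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (LR+, LR-) from the four cells of a 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Counts from Table 53-2 (target condition: peripheral vertigo).
lr_pos, lr_neg = likelihood_ratios(tp=23, fp=4, fn=31, tn=67)
print(f"LR+ {lr_pos:.1f}, LR- {lr_neg:.1f}")  # LR+ 7.6, LR- 0.6
```

The same function can be applied to the counts in any similarly structured table, provided the column that represents the target condition is identified first.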

How to Examine Patients With Vertigo 

Findings on physical examination can help physicians detect 
abnormalities that can be used to determine the cause of vertigo. 

Perform a Brief Neurologic Examination 

Look for cranial nerve palsies, weakness, reflex changes, 
ataxia, decreased sensation in the feet, and abnormalities of 
gait and station. Vertical nystagmus is associated with lesions 
of the vestibular nuclei or of the cerebellar vermis. 13 Neurologic 
findings other than pathologic nystagmus suggest that 
the lesion is central. 

Examine the Ears 

Hearing should be checked. 32 Cholesteatoma, a complication 
of chronic otitis media that can present with hearing loss, 
drainage from the ear, and vertigo, may be found 33 ; the usual 
treatment for this is surgery. Alternatively, vesicles associated 
with herpes zoster oticus (also called Ramsay Hunt syndrome) 
may be present; patients with this condition often 
have facial palsy and deafness, together with vertigo. 34 

Check for Spontaneous Nystagmus 

Patients with vestibular neuronitis usually have spontaneous 
horizontal nystagmus or a mixture of spontaneous horizontal 
nystagmus and rotatory nystagmus. 30 Patients with disorders 
of the central nervous system may also have spontaneous 
nystagmus. 35 In most of the patients examined by Silvoniemi, 30 
Lachman and Stahle, 36 and Aantaa and Virolainen, 37 nystagmus 
was readily apparent, but in some, detection required Frenzel 
glasses or electronystagmographic monitoring with the 
patients’ eyes closed. Patients with vestibular neuronitis may 
also have positional nystagmus. 30 Patient 1 in the clinical scenarios 
had vestibular neuronitis. 

Perform a Head-Hanging Maneuver 

Most physicians test for positional nystagmus with a method 
first outlined by Dix and Hallpike 23 and more recently by 
Mohr. 29 The head-hanging maneuver begins with the patient 
in a sitting position, with gaze fixed on the examiner’s forehead 
(Figure 53-1). The examiner firmly grasps the patient’s 
head and has the patient quickly lie supine, with the head 
turned about 30 degrees to one side and about 30 degrees 
below the level of the examining table. Next, the patient sits 
up, and the maneuver is repeated with the head turned to the 
opposite side. In 1979, Baloh et al 38 observed that if the 
maneuver was performed slowly (during a period of 20 seconds), 
nystagmus was not induced; thus, they recommended 
performing the position change in about 2 seconds. After 
each head-hanging maneuver, the physician should observe 
the patient’s eyes for 5 to 15 seconds to determine whether 
nystagmus has been induced. 29 Overall, it takes about 3 to 5 
minutes to explain the head-hanging maneuver to the 
patient, to perform the position changes, and to observe for 
nystagmus. 

Benign paroxysmal positional vertigo is the most common 
cause of vertigo 7,8 and can usually be suspected on the basis of 
the medical history alone. Features of this syndrome include 
vertigo that occurs only with positional changes and an associated 
positional nystagmus that is usually rotatory, with a 
vertical or horizontal component. Also, the nystagmus usually 
begins 5 to 15 seconds after the head-hanging maneuver, 
lasts 2 to 30 seconds, and, if the patient is repeatedly returned 
to the provocative position, occurs less and less until it cannot 
be induced. 23,29 Positional nystagmus cannot always be 
elicited in a patient with a history otherwise compatible with 
the diagnosis of benign paroxysmal positional vertigo. 39 ' 41 Its 
occurrence during a head-hanging maneuver occasionally 
makes a vague description of dizziness clearer. Rarely, 
patients with central nervous system lesions may present 
with positional vertigo and nystagmus and with no other 
neurologic abnormality. 42 Patient 2 in the clinical scenarios 
had benign paroxysmal positional vertigo. 

Learning how to check for positional nystagmus usually 
requires practice. Always explain to the patient what you are 
going to do before performing a head-hanging maneuver. 
Specifically, ask the patient to keep the eyes open if he or she 
becomes vertiginous; many patients close their eyes if vertigo 
develops. The head-hanging maneuver should be performed 
quickly but not so rapidly as to injure the patient. Be observant 
because the nystagmus may last only a few seconds. 

Accuracy of the Symptoms and Signs of Vertigo 

Data are available on 3 clinically relevant questions about the 
accuracy of the clinical examination in patients with vertigo. 

1. Can positional nystagmus identify patients with benign 
paroxysmal positional vertigo? The answer is, not very 
well. Only 198 of 255 patients with positional vertigo 
examined in a dizziness clinic had positional nystagmus 
during initial and subsequent examinations (sensitivity, 
78%). 39 In an epidemiologic study of positional vertigo, 
only 13 of 26 patients tested had positional nystagmus 
(sensitivity, 50%). 41 

2. Can matutinal vertigo distinguish peripheral causes from 
central causes of vertigo? Again, the answer is, not very 
well. In a study of 100 neurology patients (48 of whom 
had matutinal vertigo), matutinal vertigo had a sensitivity 
of 51% and a specificity of 69% for peripheral disorders, 43 
and in an epidemiologic study, symptoms of vertigo when 
rolling over in bed generated a sensitivity of 40% for 
benign paroxysmal positional vertigo. 41 

3. Can any set of symptoms and signs distinguish urgent 
causes from nonurgent causes of dizziness? Symptoms 
and signs can help identify patients in need of an urgent 
evaluation, as shown in Tables 53-2 and 53-3, which are 
from a study of 125 emergency department patients with 
the complaint of dizziness. 5 Patients who had the highly 
specific cluster of positive results on the head-hanging 
test and either vertigo or vomiting almost always had a 
nonurgent peripheral vertigo (a finding with high specificity, 
if positive, tends to rule in the target disorder). In 
Table 53-3, the high sensitivity (87%) of the absence of 
vertigo or age older than 69 years or the presence of a 
neurologic deficit for a serious cause of dizziness meant 
that younger patients with vertigo but no neurologic 
deficit were unlikely to have an urgent cause of dizziness 
(a finding with high sensitivity, if negative, tends to rule 
out the target disorder). 


Table 53-3 Accuracy of Signs and Symptoms for Detecting Serious Causes of Dizziness in an Emergency Department a 

                                                           No. of Patients      No. of Patients
                                                           With Serious         With Nonserious
                                                           Causes of            Causes of                  Predictive            Likelihood
                                                           Dizziness b          Dizziness         Total    Value, %              Ratio

Absence of vertigo, age >69 y, or neurologic deficit       33                   50                83       Positive 40 (33/83)   1.5
Presence of vertigo, age <69 y, and no neurologic deficit  5                    37                42       Negative 88 (37/42)   0.3
Total                                                      38                   87                125      ... c                 ...

a Data from Herr et al. 5 
b Serious causes of dizziness include medication adverse effects, seizures, stroke, and cardiac arrhythmia. 
c Ellipses indicate not applicable. 
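
The verdict of "not very well" for matutinal vertigo (question 2) becomes clearer when the reported sensitivity and specificity are converted to likelihood ratios. The Python sketch below performs that standard conversion; the function name is illustrative, and the resulting LRs are our own arithmetic from the reported 51% sensitivity and 69% specificity, not values stated in the article.

```python
def lrs_from_accuracy(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    if not (0.0 < sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in (0, 1] and specificity in (0, 1)")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Matutinal vertigo for peripheral disorders: sensitivity 51%, specificity 69%.
lr_pos, lr_neg = lrs_from_accuracy(0.51, 0.69)
print(f"LR+ {lr_pos:.1f}, LR- {lr_neg:.1f}")  # LR+ 1.6, LR- 0.7
```

Both values lie close to 1, so under these assumptions the presence or absence of matutinal vertigo shifts the probability of a peripheral cause only slightly.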

These reassuring results of the accuracy of the clinical 
examination come from a single study in an emergency 
department with rates of peripheral vertigo and serious disease 
characteristic of such settings; they need independent 
confirmation in different settings. Although the nonurgent 
causes of dizziness may not require immediate hospitalization, 
some of the causes of peripheral vertigo (eg, acoustic 
neuroma) deserve further diagnostic study. 

THE BOTTOM LINE 

The following are our recommendations on useful symptoms
and signs in the evaluation of patients with dizziness:

1. In patients with suspected vertigo, ask whether they have
dizziness when changing body position (rolling over in
bed, looking up at the ceiling, or bending over to tie
shoelaces) and perform a head-hanging maneuver to check for
positional nystagmus.

2. In combination with other data (including a brief neurologic
examination) in an emergency department setting,
the presence of positional nystagmus can be useful when
evaluating for serious causes of dizziness.
UPDATE: Vertigo



Prepared by David L. Simel, MD, MHS 

Reviewed by David A. Froehling, MD, 
and Richard Bedlack, MD, PhD 


CLINICAL SCENARIO 


A 58-year-old healthy man presents with dizziness. One
week ago, he had an upper respiratory illness consisting of
a slight fever, cough, and rhinorrhea. During the previous
2 days, he has had 3 episodes of extreme imbalance, each
lasting less than 3 to 4 minutes, during which he felt as if
he were “drunk.” He also felt nauseated during these episodes,
which caused him to lie down and close his eyes until the
symptoms resolved. He has had no hearing loss. Your neurologic
examination reveals no focal findings in the cranial or
peripheral nerves.

Original Review 

Froehling DA, Silverstein MD, Mohr DN, Beatty CW. Does 
this dizzy patient have a serious form of vertigo? JAMA. 
1994;271(5):385-388. 

UPDATED LITERATURE SEARCH 

The focus of the original Rational Clinical Examination
article and this update is on the vestibular disorders
characterized by true vertigo. True vertigo creates a sensation of
rotation. Although the initial publication approached vertigo
from a general perspective, we sought updated
information on the diagnosis of benign positional vertigo,
the most common cause of vertiginous symptoms. We used
the search terms “vertigo/di,” “exp dizziness,” and the text
words “$Hallpike,” “Eply,” or “benign positional vertigo” to
identify English-language articles on vertigo in adults
published between 1993 and November 2004. After excluding
case reports, letters, and general reviews, we were left with
154 articles. We searched these to identify studies that used
prospective data collection and reported the sensitivity,
specificity, or predictive values of clinical findings in
patients who presented to their physician with complaints
of dizziness. A systematic review 1 evaluated the distribution
of diagnoses among patients with dizziness. A second general
systematic review 2 without a formal quantitative
research question provides a useful reference list for clinical
descriptions of the common causes of vertigo. We found 1
additional article that prospectively evaluated patients in a
clinical population, using a patient questionnaire for
diagnosing vertigo.

NEW FINDINGS 

• The response to the Dix-Hallpike maneuver serves as a
reasonable reference standard for benign positional vertigo
because it identifies patients who will respond to canalith
repositioning maneuvers.

• Hearing loss, part of the examination of the dizzy patient, 
has been reviewed in The Rational Clinical Examination 
series and can be assessed with the whispered voice test. 3 

Details of the Update 

Patients with dizziness may have a variety of disorders, so
diagnosing benign positional vertigo requires an understanding
of its overall incidence in relation to other etiologies.
Peripheral vestibular disorders are the most common causes
of dizziness (about 40% of patients with dizziness), and
benign positional vertigo and vestibular neuronitis are the
most frequent diagnoses among them. Retrospective studies tend
to find a higher incidence of benign positional vertigo than
studies that enroll dizzy patients prospectively.

Clinicians (and patients) may be overly concerned with
brain tumors when there is a new symptom of vertigo, but
the likelihood that a dizzy patient without hearing loss will
have a cerebellopontine angle mass responsible for the
symptoms is low (probability, 1 × 10⁻⁴). 4 Among patients with
dizziness associated with asymmetric hearing loss, a clinician
would need to perform 638 scans to detect 1 cerebellopontine
angle mass (compared with 9307 scans for dizzy patients
without hearing loss). Thus, the approach to clinical diagnosis
should more appropriately focus on attempts to rule in
less serious causes of vertigo (eg, benign positional vertigo),
rather than an initial effort to rule out serious causes such as
tumors.
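The scan counts quoted above can be converted back into probabilities. This is a minimal Python sketch under one assumption that the text does not state explicitly: that the number of scans needed to detect 1 mass is the reciprocal of the probability of a mass. The function name is hypothetical.

```python
def probability_from_scans_needed(scans_per_case):
    """Assumed relationship: probability = 1 / number of scans per detected case."""
    return 1.0 / scans_per_case


# Scan counts quoted in the text.
p_without_hearing_loss = probability_from_scans_needed(9307)
p_asymmetric_hearing_loss = probability_from_scans_needed(638)

# How much more likely a mass is when asymmetric hearing loss is present.
relative_likelihood = p_asymmetric_hearing_loss / p_without_hearing_loss

print(f"Without hearing loss: {p_without_hearing_loss:.1e}")
print(f"With asymmetric hearing loss: {p_asymmetric_hearing_loss:.1e}")
print(f"Ratio: about {relative_likelihood:.0f}-fold")
```

Under this assumption, 9307 scans corresponds to a probability of roughly 1 in 10 000, which agrees with the probability quoted for dizzy patients without hearing loss.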

We found a systematic review 1 that identified 2 retrospective
studies suggesting that the clinical history alone allows proper
diagnosis of 69% to 76% of dizzy patients. We also found a
prospective study in a small group of patients referred to an
otolaryngologist where patient history was collected through a
questionnaire. 5 The questionnaire directs the clinician to the
more common causes of vertigo and would have allowed correct
categorization of 61% of the patients with true vertigo
according to whether they had episodic (<5 minutes, 5 minutes to
24 hours, 1 day to 1 week) vs persistent vertigo (>1 week) and
hearing loss or no hearing loss. See Box 53-1.

Box 53-1 Establish the Initial Diagnosis After Understanding the
Patient’s History

Patient Symptoms                        Initial Diagnosis

No hearing loss + episodic vertigo      Benign positional vertigo
No hearing loss + persistent vertigo    Vestibular neuronitis
Hearing loss + episodic vertigo         Meniere disease
Hearing loss + persistent vertigo       Labyrinthitis
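The categorization in Box 53-1 amounts to a two-question lookup. The sketch below is illustrative only; the function names are hypothetical, and the 7-day cutoff encodes the text's definition of persistent vertigo as lasting more than 1 week. It is not a substitute for the clinical judgment that the text says may still be required.

```python
def is_persistent(duration_days):
    """Persistent vertigo lasts longer than 1 week; shorter episodes are episodic."""
    return duration_days > 7


def initial_diagnosis(hearing_loss, duration_days):
    """Look up the initial diagnosis suggested by Box 53-1."""
    box_53_1 = {
        (False, False): "Benign positional vertigo",
        (False, True): "Vestibular neuronitis",
        (True, False): "Meniere disease",
        (True, True): "Labyrinthitis",
    }
    return box_53_1[(bool(hearing_loss), is_persistent(duration_days))]


# The patient in the clinical scenario: no hearing loss, episodes of
# less than 3 to 4 minutes (a small fraction of a day).
print(initial_diagnosis(hearing_loss=False, duration_days=4 / (24 * 60)))
```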

The questionnaire requires validation in a much larger
population of patients and in different clinical settings
(emergency departments and primary care clinics) because
the patient may not belong clearly in one category, requiring
clinical judgment. However, the questions do provide a
reasonable paradigm for the initial line of questioning for the
vertiginous patient.

Once the medical history is obtained, perhaps narrowing
the diagnosis to the most likely causes, specialists use a variety
of clinical maneuvers. The maneuvers assess the vestibuloocular
reflex through the nystagmus response to a head thrust,
through fixation suppression, after a headshake, through
caloric testing, or through visual acuity during head shaking. 6
Unfortunately, the maneuvers have not been assessed in primary
care clinics or emergency departments to evaluate
whether they add information to the Dix-Hallpike maneuver during a
patient’s initial presentation for care and before referral.


IMPROVEMENTS IN THE DATA PRESENTED 
IN THE ORIGINAL PUBLICATION 

A systematic review provides a useful taxonomy for patients
with disorders creating dizziness, improving the information
provided in Table 53-1 of the original article (Figure 53-2). 1
The vestibular disorders are further sorted by those that
represent peripheral vestibular problems (“less serious” in terms
of the underlying etiology, though often creating a significant
problem with activities of daily living) vs central vestibular
disorders (Figure 53-3).

CHANGES IN THE REFERENCE STANDARD 

The diagnosis of vestibular disorders relies on the direct
observation of eye movements during positional testing in a patient
with no focal neurologic findings or central nervous system
disease. The clinical definition of benign positional vertigo that
requires a positive Dix-Hallpike maneuver result is supported by
a meta-analysis of randomized trials of canalith repositioning
procedures. 7 The randomized trials demonstrated that, within 1
month of treatment, patients with a positive Dix-Hallpike
maneuver result benefit from the repositioning procedures with
symptom resolution (number needed to treat = 3). Furthermore,
the positive Dix-Hallpike maneuver result returns to normal
at a rate similar to that of the symptom improvement.
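A number needed to treat of 3 corresponds to an absolute difference in symptom resolution of roughly one-third between treated and untreated patients. The Python sketch below shows the arithmetic; the function name is hypothetical, and the two resolution rates are invented for illustration because the underlying trial rates are not given here.

```python
import math


def number_needed_to_treat(rate_treated, rate_control):
    """NNT = 1 / absolute difference in symptom-resolution rates, rounded up."""
    absolute_difference = rate_treated - rate_control
    if absolute_difference <= 0:
        raise ValueError("treatment shows no benefit; NNT is undefined")
    return math.ceil(1.0 / absolute_difference)


# Absolute benefit implied by the reported NNT of 3.
implied_difference = 1 / 3

# Hypothetical resolution rates, chosen only for illustration.
example_nnt = number_needed_to_treat(rate_treated=0.70, rate_control=0.35)

print(f"Implied absolute difference: about {implied_difference:.0%}")
print(f"NNT for the hypothetical rates: {example_nnt}")
```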

RESULTS OF LITERATURE REVIEW 

The Dix-Hallpike maneuver can be done in most patients, but
some cannot tolerate it. A small study of patients with benign
positional vertigo showed that the maneuver could be performed
with a different motion by having the patient lie down
on his or her side. 8 The examiner supports the head while the
patient looks to the left at a 45-degree angle and rapidly lies
down on the right side. The maneuver is repeated with the
patient looking to the right and rapidly going from the sitting
position to lying on the left side. The patient should cross the
arms to prevent inadvertently stopping the motion as the
physician helps with the maneuver. The agreement with the
Dix-Hallpike maneuver is moderate (κ = 0.60; 95% confidence
interval, 0.32-0.89). However, patients with back or neck
problems may not be able to perform the side-lying maneuver any
more easily than the Dix-Hallpike maneuver. 9 A partial list of the
absolute contraindications to either maneuver includes a
history of neck surgery, severe rheumatoid arthritis, cervical
myelopathy, cervical radiculopathy, carotid syncope, neck
trauma, or vascular diseases of the neck.

Figure 53-2 Dizziness Taxonomy

a “Other” includes drug toxicity, substance abuse, and a variety of medical illnesses.
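The agreement statistic reported for the side-lying and Dix-Hallpike maneuvers is Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The sketch below shows the calculation in Python. The 2 × 2 counts are hypothetical, chosen only so that the result lands near the reported 0.60; the study's actual counts are not given here.

```python
def cohen_kappa(both_positive, first_only, second_only, both_negative):
    """Cohen's kappa for two dichotomous tests applied to the same patients."""
    total = both_positive + first_only + second_only + both_negative
    observed = (both_positive + both_negative) / total
    first_positive = (both_positive + first_only) / total
    second_positive = (both_positive + second_only) / total
    # Agreement expected if the two tests were independent.
    expected = (first_positive * second_positive
                + (1 - first_positive) * (1 - second_positive))
    return (observed - expected) / (1 - expected)


# Hypothetical counts (not from the study): 40 patients in total.
kappa = cohen_kappa(both_positive=16, first_only=4,
                    second_only=4, both_negative=16)
print(f"kappa = {kappa:.2f}")
```

With these invented counts the two maneuvers agree in 80% of patients, yet kappa is only about 0.60 because half of that agreement would be expected by chance.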

EVIDENCE FROM GUIDELINES 

No federal guidelines address the systematic evaluation of 
dizzy patients. 


CLINICAL SCENARIO—RESOLUTION 


The patient’s clinical history is informative. He almost
certainly has benign positional vertigo or vestibular neuronitis
related to his previous viral infection. A Dix-Hallpike
maneuver result would likely be positive. No additional
laboratory studies or radiologic imaging is necessary with this
initial presentation of true vertigo.



Figure 53-3 Vestibular Disorders 


VERTIGO— MAKE THE DIAGNOSIS 


PRIOR PROBABILITY 

Once the medical history confirms vertigo in a patient with
dizziness, the most likely cause is a peripheral vestibular
disorder (about 40% of dizzy patients). The prior probability of
benign positional vertigo among dizzy patients is 10%.
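The two prior probabilities above can be combined into a conditional one. The short Python sketch below assumes, as the text indicates elsewhere, that benign positional vertigo is a subset of the peripheral vestibular disorders; the variable names are hypothetical.

```python
# Priors among all dizzy patients, as quoted in the text.
p_peripheral_vestibular = 0.40
p_benign_positional_vertigo = 0.10

# Share of peripheral vestibular disorders attributable to benign
# positional vertigo, assuming it is a subset of the peripheral disorders.
share_of_peripheral = p_benign_positional_vertigo / p_peripheral_vestibular

print(f"Benign positional vertigo: {share_of_peripheral:.0%} of peripheral disorders")
```

On these figures, about 1 in 4 dizzy patients with a peripheral vestibular disorder has benign positional vertigo.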

POPULATION FOR WHOM VERTIGO 
SHOULD BE CONSIDERED 

• Benign positional vertigo should be considered only in 
patients who volunteer that they have dizziness symptoms. 

DETECTING THE LIKELIHOOD OF VERTIGO 

The medical history identifies the patient with true vertigo,
whereas the clinical examination results identify patients
with benign positional vertigo. The responses to the maneuvers
are not screening tests with an associated sensitivity and
specificity because they define the diagnosis of benign
positional vertigo.


REFERENCE STANDARD TESTS 

The diagnosis requires direct observation of eye movements
during positional testing in a patient with no focal neurologic
findings or central nervous system disease. Prospective
clinical studies might put more weight on the observations
by a specialist, but no comparison studies between generalist
physicians and specialist physicians have evaluated the
accuracy of generalist clinicians.
TITLE Evaluating Dizziness.

AUTHORS Hoffman RM, Einstadter D, Kroenke K. 

CITATION Am J Med. 1999;107(5):468-478.

QUESTION What are the frequencies of various causes 
of dizziness? 

DESIGN Formal systematic review without meta-analysis. 
DATA SOURCE MEDLINE database. 

STUDY SELECTION AND ASSESSMENT The authors
identified studies of adults with dizziness, published
in English between 1966 and 1996, indexed with the
following search terms: “dizziness” and “vertigo” with
“vestibular function tests,” “electronystagmography,” “calorics,”
“nystagmus,” “Barany,” “Hallpike,” “caloric testing,”
and “brainstem auditory evoked responses.” An initial
1755 references were identified and then filtered down to
229 references that met the initial criteria; an additional 44
articles were retrieved from the reference lists. The review
was based on 12 etiology studies, 16 prognosis studies,
and 38 studies of diagnostic tests. The studies of etiology
used a variety of diagnostic tests. Each article was
reviewed by 2 investigators; disagreements were resolved
by a third person.

MAIN RESULTS 

The clinical setting, study design, sample size, age and sex of 
patients, symptom duration, and diagnostic tests used were 
reported for the 12 etiology studies. Quality scores were not 
reported. The authors provide a framework for the taxonomy 
of the dizzy patient (see Figures 53-2 and 53-3). 

The authors report that the medical history and physical
examination led to a probable diagnosis for dizziness in
about 75% of patients, but the details of this assessment are
not provided. According to 2 retrospective studies, the
investigators found that the diagnoses could be based on the
history alone in 69% to 76% of patients. Among all patients
with dizziness, a positive Dix-Hallpike maneuver result
(suggesting benign peripheral vertigo) was present in 16% (median),
though the range was 7% to 44%.

Table 53-5 Frequency of Various Causes of Dizziness

Disorder (n = 7 Studies)          Prevalence (95% CI)

Peripheral vestibular disorder    0.40 (0.27-0.54)
Central vestibular disorder       0.09 (0.06-0.13)
Presyncope                        0.09 (0.06-0.13)
Psychiatric                       0.08 (0.05-0.12)
Dysequilibrium                    0.03 (0.001-0.10)

Abbreviation: CI, confidence interval.

The authors did not conduct a meta-analysis of any results.
However, the sample size and frequency of disorders are
presented for each etiology study. The data in Table 53-5
represent the prevalence of each disorder for the studies that were
done with prospective data collection. The settings for these
prospective data were primary care clinics (n = 2 studies, 240
patients), neurology clinics (n = 2 studies, 217 patients),
emergency departments (n = 2 studies, 218 patients), or a
dizziness clinic (n = 1 study, 104 patients).
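To see how the prospective patients are distributed across settings, the counts above can be totaled and expressed as shares. This is a simple illustrative tally in Python; the variable names are hypothetical and the counts are those quoted in the text.

```python
# Patients enrolled in the prospective studies, by clinical setting.
patients_by_setting = {
    "primary care clinics": 240,
    "neurology clinics": 217,
    "emergency departments": 218,
    "dizziness clinic": 104,
}

total_patients = sum(patients_by_setting.values())
shares = {setting: count / total_patients
          for setting, count in patients_by_setting.items()}

print(f"Total patients: {total_patients}")
for setting, share in shares.items():
    print(f"{setting}: {share:.0%}")
```

The three general settings each contribute a similar number of patients, and the single dizziness clinic contributes fewer.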

Approximately 10% of all dizzy patients had benign positional
vertigo, whereas 11% had vestibular neuronitis. The
frequency of other causes of true vertigo, iatrogenic causes,
and undiagnosed dizziness is high, approximately
25% to 30%.

CONCLUSIONS 

LEVEL OF EVIDENCE Systematic review. 

STRENGTHS The systematic review included data from primary
care clinics, emergency departments, neurology clinics,
and specialized dizziness clinics. The sample sizes across
these clinics were well balanced, representing a typical
spectrum of dizzy patients.

LIMITATIONS No quality scores or formal methodologic
assessments were reported, though the study design
(retrospective vs prospective) is reported. The review required that
studies have a reference standard for diagnostic tests, but the
reference standard that was used is not reported. The authors
acknowledge that there is no objective reference standard for
most causes of dizziness.

This systematic review provides a useful taxonomy for the
dizzy patient. By combining the estimates for the prospective
studies only, we find that about 50% of dizzy patients had
vestibular disorders. This is compatible with the frequency reported
in nonsystematic reviews. Peripheral vestibular disorders
include patients with benign positional vertigo, vestibular
neuronitis, Meniere disease, and true vertigo of unknown cause.
About 10% of dizzy patients will have benign positional vertigo,
and a similar number will have vestibular neuronitis.

Reviewed by David L. Simel, MD, MHS 


TITLE A Practical Assessment Algorithm for Diagnosis 
of Dizziness. 

AUTHORS Kentala E, Rauch SD. 

CITATION Otolaryngol Head Neck Surg. 2003;128(1):54-59.

QUESTION Does a simple questionnaire do as well as a 
clinician for diagnosing the cause of vertigo? 

DESIGN Prospective, nonconsecutive patients. 

SETTING Otolaryngology clinic with a specialist in vertigo.

PATIENTS Fifty-seven patients (42 women and 15
men) referred for dizziness.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD 

The patient-completed questionnaire followed the paradigm
of categorizing dizzy patients presented in the original Rational
Clinical Examination article on vertigo. 1 The questionnaire
involves first asking about the presence of self-assessed
hearing loss and vertigo (defined for the patient as “false
sense of motion, floating, bobbing, swaying, rocking, tilting,
or spinning”). The patients with true vertigo assessed the
duration of episodes as episodic (<5 minutes, 5 minutes to
24 hours, 1 day to 1 week) or persistent vertigo (>1 week).
The questionnaire also asked single questions to assess for (1)
dysequilibrium (“Do you have a sense of being off balance,
tipsy, wobbly, feeling you might fall?”); (2) presyncope (“Do
you have a feeling you might faint, black out, or lose
consciousness?”); or (3) psychiatric diagnosis (“Do you feel
disconnected or distanced from the world around you, feel
panicky, or have tingling about the mouth or hands?”).

The otolaryngologist, blinded to the patient’s self-assessed
questionnaire results, diagnosed the patient according to the
medical history elicited, clinical examination results, and
results from audiometric and otoneurologic tests. The specific
tests and maneuvers were not reported.


MAIN OUTCOME MEASURES 

For patients with true vertigo, the clinician’s diagnosis was 
compared with the patient’s questionnaire, categorized as 
shown in Box 53-1. 


MAIN RESULTS 

A total of 35 of the 57 patients had true vertigo. The questionnaire
alone would have allowed correct categorization of 61% of
the patients with true vertigo according to whether they had
episodic (<5 minutes, 5 minutes to 24 hours, 1 day to 1 week) vs
persistent vertigo (>1 week) and hearing loss or no hearing loss.
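Because only 35 patients had true vertigo, the 61% estimate carries considerable uncertainty. The Python sketch below computes an approximate 95% Wilson score interval for that proportion. It is an illustration rather than a result from the study: the function name is hypothetical, and the calculation uses the reported 61% directly because the exact number of correctly categorized patients is not given.

```python
import math


def wilson_interval(proportion, n, z=1.96):
    """Approximate Wilson score interval for a proportion observed in n patients."""
    z2 = z * z
    denominator = 1 + z2 / n
    center = (proportion + z2 / (2 * n)) / denominator
    half_width = (z / denominator) * math.sqrt(
        proportion * (1 - proportion) / n + z2 / (4 * n * n)
    )
    return center - half_width, center + half_width


# Reported proportion correctly categorized among 35 patients with true vertigo.
low, high = wilson_interval(proportion=0.61, n=35)
print(f"Approximate 95% interval: {low:.0%} to {high:.0%}")
```

An interval this wide is one way to see why the small sample size limits how confidently the result can be applied.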

CONCLUSIONS 

LEVEL OF EVIDENCE Level 4. 

STRENGTHS Simplified approach to recording the patient 
medical history. 

LIMITATIONS Although the clinician did not have the
questionnaire answers, the clinician developed the questionnaire and
was thus aware of the study hypotheses. This incorporation bias
may have made the questionnaire appear to work better than it
would once generalized to other settings. The questionnaire
requires evaluation in a primary care and emergency department
setting. The details of the clinical examination and other
tests are not provided. The sample size is small.

Although the overall quality of the study means that the 
results cannot be applied with confidence, the questionnaire 
does provide a reasonable paradigm for the initial line of 
questioning the vertiginous patient. 

Reviewed by David L. Simel, MD, MHS 

REFERENCE FOR THE EVIDENCE 

1. Froehling DA, Silverstein MD, Mohr DN, Beatty CW. Does this patient 
have a serious form of vertigo? JAMA. 1994;271(5):385-388. 
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AAA. See abdominal aortic aneurysm 
AAP. See American Academy of 
Pediatrics 

ABCD(E) criteria for melanoma, 392, 
393 1 

abdominal aortic aneurysm (AAA), 
17-22, 25-27 

abdominal palpation for, 17,18 
asymptomatic AAA, 19-20, 21 1 
factors, affecting, 20,21 1 
method, 20, 22 
ruptured AAA, 19 
evidence from guidelines, 26 
findings for, 25-26 
univariate, 26 
likelihood ratio, 271 
literature research, 25 
methods, 18-19 
original publication data, 
improvements in, 26 
physical diagnosis of, 17-18 
importance of, 18 
prior probability, 27 
pulsatile mass of, 18 
reference standard, changes in, 

26, 27 

abdominal auscultation, for bruits 
accuracy of 

in renovascular hypertension, 31 
areas of, 30 
precision of, 31 
abdominal bruits, 35-37 
anatomic and physiologic origin of, 
29 

auscultatory characteristics of, 32 
evidence from guidelines, 36 
examination for, 30-31 
findings of, 35 
literature research, 35 
nonrenovascular causes of, 311 
original publication data, 
improvements in, 36 
presence of, 31-32 
prevalence of, 29-30 
prognosis of, 32 

reference standard, changes in, 36 


abdominal palpation, 25. See also 
palpation 

for abdominal aortic aneurysm, 17,18 
asymptomatic AAA, 19-20, 21f 
factors, affecting, 20, 21f 
method, 20, 22 
ruptured abdominal aortic 
aneurysm, 19 
abdominojugular reflux 
sensitivity, specificity, or likelihood 
ratio 

left ventricular dysfunction, 213f 
venous waveforms in, 126f 

central venous pressure assessment, 
134f 

abdominojugular reflux test, 128,134 
abduction, of thumb testing, 113 
abduction stress test, 361 
abnormal monofilament testing, 112t 
abnormal vibratory sensation, 112f 
accuracy 

of clinical examination, 1, 9 
characteristics, 4-5 
confidence interval, 12 
“good” symptom or sign, 11-12 
likelihood ratio, 9-11 
meta-analysis, 12-13 
pretest probability, 11 
“sensitivity-only” studies, 13 
ACE. See angiotensin-converting 
enzyme 

acetaminophen, 247, 343, 357 
acetylcholine receptor (AChR), 449,450 
acetylcholine receptor antibody-positive 
myasthenia gravis, 450/ 
test for, 450 

Achilles tendon reflex. See reflexes 
AChR. See acetylcholine receptor 
ACI. See acute cardiac ischemia 
ACI-TIPI. See Acute Cardiac Ischemia 
Time-Insensitive Predictive 
Instrument 

ACL. See anterior cruciate ligament 
acoustic neuromas, 711 
acoustic reflectometry, 495 
action tremor, 506, 507 
active compression test 
for labral tears, 586f 


acute blood loss 

physical signs, accuracy of, 319-320 
acute cardiac ischemia (ACI), 475 
multivariate findings for, 473-474 
Acute Cardiac Ischemia Time- 
Insensitive Predictive 
Instrument (ACI-TIPI), 

475,476 

acute chest pain, diagnosis of, 462-463, 
463/ 

acute cholecystitis, 137-143,145-147,561 
definition, 137-138 
diagnostic imaging, accuracy of, 138 
findings of, 145 
guidelines, evidence from, 146 
likelihood ratio, 147 
literature review, results of, 146 
literature search, 145 
methods, 138-139 
original publication data, 

improvements in, 146 
prior probability, 147 
reference standard, changes in, 146, 
147 

results, 139-142 
signs and symptoms, 138 
accuracy of, 141 
precision of, 140-141 
acute otitis media (AOM), in children, 
493-499, 501-503 
anatomic/physiologic origins, 494 
definition of, 494 
findings of, 501 
guidelines, evidence from, 502 
improvement of, 498-499 
likelihood ratio of, 503 1 
literature review, results of 
multivariate findings for, 502, 502f 
univariate findings for, 502 
literature search, 501 
and otitis media with effusion, 

distinguishing between, 493 
prior probability, 503 
reference standard, changes in, 503 
search strategy and quality review, 
495-496 

symptoms and signs 
accuracy of, 496-498 
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acute otitis media (AOM), in children 
( Continued ) 
elicitation of, 494-495 
precision of, 496 

acute respiratory illness, 539. See also 
pneumonia 
adenoviruses, 344 
aerobic exercise, 306 
age, 152 

as an indicator of prevalence for 
bruits, 30/ 

perimenopause, 408/ 
postmenopause, 408/ 
sensitivity, specificity, or likelihood 
ratio 

in back pain, 76f 
in breast cancer, 10If 
in carpal tunnel syndrome, 115t 
in obstructive airways disease, 161 1 
in osteoporosis, 49It 
in perimenopause, 416f 
in temporal arteritis, 654 
Agency for Health Care Policy and 
Research, 47 

Agency for Healthcare Research and 

Quality (AHRQ), 494,498, 502 
agreement 
calculation of, 8/ 

agreement beyond chance, 3,4,15,104, 
294, 333,617. See also 
agreement 

AHRQ. See Agency for Healthcare 
Research and Quality 
airflow limitation, 149-156 
clinical examination for, 155 
accuracy of, 155 
measures of, 151-152, 153,155 
pathophysiologic characteristics of, 
150-151 
signs of 

accuracy of, 153-155 
medical history, 151 
physical examination, 151-152 
precision of, 153 
symptoms of 
medical history, 151 
accuracy of, 152-153 
precision of, 152 
physical examination, 151-152 
airflow obstruction, 153 
alcohol abuse, 1-2, 39, 249. See also 
alcoholism 

CAGE questionnaire for, 2-3, 4-5, 7, 

8 f 

diagnostic standards for, 39-41 
diagnostic tests of, 41-42 
at-risk drinking, problems of 
in pregnant women, 44-45 


AUDIT questionnaire, 41,42,43t, 
43-44 

biochemical and hematologic tests, 
41-42 

CAGE questionnaire, 41, 42, 43f, 

44 

MAST questionnaire, 41,42,42f, 

44 

alcohol dependency. See alcohol abuse 
alcohol drinking, problems of, 47-52 
evidence from guidelines, 49 
findings of, 47 
literature research, 47 
original publication data, 
improvements in, 48 
prior probability, 50 
reference standard, changes in, 48, 

50 

alcohol screening 

instruments. See AUDIT questionnaire; 
AUDIT-C questionnaire; 
CAGE questionnaire; 

MAST questionnaire; 

T-ACE questionnaire; 

TWEAK questionnaire 
web resources for, 49 
Alcohol Use Disorders Identification 
Test (AUDIT), 41 
accuracy of, 43f, 43-44 
reliability of, 42 
questionnaire, 47,49, 51 1 
sensitivity, specificity, or likelihood 
ratio 

for alcohol abuse, 50f 
Alcohol Use Disorders Identification 
Test, Consumption Questions 
(AUDIT-C), 49, 51f 
sensitivity, specificity, or likelihood 
ratio 

for alcohol abuse, 50f 
alcohol withdrawal syndromes, 5 
alcoholism, 2-3. See also alcohol 
abuse 

algorithm-driven analyses 
for appendicitis, 54 
alignment, of knee, 361 
Alvarado clinical decision rule 

(MANTRELS), 61, 62, 62f, 63f. 
See also clinical prediction 
rules and scores 

sensitivity, specificity, or likelihood 
ratio 

in appendicitis, 63 1 
Alzheimer disease, 509 
ambulatory carotid bruit, 104-105 
amenorrhea, 552, 554 
American Academy of Family Practice, 
502 


American Academy of Neurology, 634 
American Academy of Pediatrics (AAP), 
330,498, 502 

American College of Cardiology, 211, 
430, 445 

American Heart Association, 302,430, 
445 

American Medical Association, 267 
American Psychiatric Association, 

40 

American Society of Clinical 
Oncologists 

policy statement, on genetic testing, 
266 

American Thoracic Society, 155 
Amoss sign, 396 
amoxicillin, 515, 516, 517, 523 
amphetamine withdrawal, 249 
ampicillin, 515, 516, 517 
Amsel criteria, 694t 

sensitivity, specificity, or likelihood 
ratio 

for vaginitis, 707f, 694f 
anabolic steroids, 249 
anatomic origin, of abdominal bruit, 

29 

aneroid instruments, 305, 306 
angina pectoris, 462 
grading of, 462 1 
unstable, 462 

angiotensin-converting enzyme (ACE) 
inhibitor, 35,133,179,183 
anhedonia, 249 

ankle dorsiflexion. See strength testing 
ankle dorsiflexor, 78 
ankle edema 

history of, and ascites, relationship 
between, 5-6 

sensitivity, specificity, or likelihood 
ratio 

ascites, 6,6/, 7, 7/ 68f, 73f 
ankle plantar flexion, 79 
ankle reflexes, 78 
ankle swelling. See ankle edema 
ankylosing spondylitis, 77 
anorexia 

sensitivity, specificity, or likelihood 
ratio 

in acute cholecystitis, 140f 
in appendicitis, adult, 57 1 
in temporal arteritis, 648f 
anterior apprehension test 
for labral tears, 586f 
anterior cruciate ligament (ACL), 358, 
359, 361,362 
physical examination 
accuracy of, 363f 
maneuvers, 363f 
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anterior drawer test, 360/, 361 
sensitivity, specificity, or likelihood 
ratio 

knee ligament and meniscus injury, 
359/, 370f 

anterior Q waves, 187,188 
anterior release test 
for shoulder instability, 581/, 5851, 
589, 590 

sensitivity, specificity, or likelihood 
ratio 

for shoulder instability or labral 
tear, 5851, 5911 
anterior slide test 
for labral tears, 5861 
anticholinesterase test 
sensitivity, specificity, or likelihood 
ratio 

myasthenia gravis, 4601 
anticholinesterase tests, for myasthenia 
gravis, 451-453, 455 
anticoagulant therapy, 235, 561, 562 
antidepressants, 247, 249 
antihypertensive therapy, 174 
antimicrobial therapy, 395 
AOM. See acute otitis media 
aortic regurgitation (AR), 419, 4301 
anatomic and physiologic origins of, 
419 

cardiac auscultation, 420-421 
causes, 4201 

evidence from guidelines, 430 
examination, 419-420,425 
features, 421/ 
findings of, 429 
likelihood ratios, 4301 

of physical examination, 4291 
literature review, 429-430 
literature search, 429 
maneuvers, 421 
original publication data, 

improvements in, 429 
in patients with renal failure, 425 
peripheral hemodynamic signs, 421- 
422 

physical examination 
accuracy of, 423 
precision of, 422-423 
physical examination signs in 
diagnosis, 430 
prior probability, 430 
reference standard, changes in, 429, 
430 

aortic stenosis (AS), 437,445,446 
physical examination 
accuracy of, 4381,4451 
likelihood ratio, 4461 
apical impulse, 187, 190 


Apley compression test, 360/, 361 
sensitivity, specificity, or likelihood 
ratio 

in knee ligament and meniscus 
injury, 360/, 3641 
appendectomy 
delayed, 62 
unneeded, 62 

appendiceal anatomy, of appendicitis, 

54 

appendicitis, 53-58, 61-63 
appendiceal anatomy of, 54 
diagnostic modalities, accuracy of, 54 
findings of, 61-62 
likelihood ratio, 63 
literature research, 61 
original publication data, 
improvements in, 62 
pathophysiology of, 54 
prior probability, 63 
reference standard, changes in, 62, 63 
symptoms and signs, 54-55 
accuracy of, 56-57 
precision of, 55 
apprehension test 
for shoulder instability, 581/ 583, 

5851 

sensitivity, specificity, or likelihood 
ratio 

for shoulder instability, 5851 
for labral (shoulder) tear, 5861 

AR. See aortic regurgitation 
arm drift 

sensitivity, specificity, or likelihood 
ratio 

in stroke, 6411 

arm span-height difference test 
for occult vertebral fracture, 479, 482- 
483 

arrhythmias, 304 
arterial blood gas analysis, 561 
arterial bruit, compared to venous hum, 
2921 

AS. See aortic stenosis 
ascites, 1,2, 65, 71 

and ankle swelling, relationship 
between, 5-6 

evidence from guidelines, 72 
example, 65 
findings of, 71 

history and symptoms, accuracy of, 
67, 681 

information, 71-72 
literature search, 71 
original publication data, 
improvements in, 71 
pathophysiology of, 66 
physical examination, 66 


likelihood ratios for, 691 
sensitivity and specificity of, 681 
physical signs, 721 
prior probability, 73 
reference standard, changes in, 66,71, 
73 

signs, 66-67, 731 
accuracy of, 68-69 
precision of, 67-68 
symptoms, 66-67, 711, 731 
aspirin, 183 

asthma, 150,152,166,202, 203, 205, 
206, 549 

asymmetric skin lesion. See ABCD(E) 
criteria 

asymptomatic abdominal aortic 
aneurysm 

abdominal palpation for, 19-20,211 
asystole, 452 
atherosclerosis 
abdominal bruit in, 30, 32 
atrial fibrillation, 304 
at-risk drinking, 47 
problems of 

in pregnant women, 44-45 
atypical (dysplastic) nevi, 384 
AUDIT. See Alcohol Use Disorders 
Identification Test 

AUDIT-C. See Alcohol Use Disorders 
Identification Test, 
Consumption Questions 
auscultation, 528 

for airflow limitation, 151,153,154- 
155 

auscultatory characteristics, of 
abdominal bruits, 32 
auscultatory percussion, 66 
sensitivity, specificity, or likelihood 
ratio 

for ascites, 721 
autoimmune disorders, 249 
aztreonam, 517 


bacampicillin, 517 
bacterial pneumonia, 540 
balloon angioplasty, 35 
barium enemas 
for appendicitis, 54 
barrel chest sign, 153-154 

sensitivity, specificity, or likelihood 
ratio 

in obstructive airways disease, 154f 
BCDDR See Breast Cancer Detection 
Demonstration Project 
BDI. See Beck Depression Inventory 
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Beck Depression Inventory (BDI), 250, 
251. See also clinical prediction 
rules and scores 
bedside ultrasonography 
testing, for acute cholecystitis, 145, 
146 

benign moles, 383 

benign paroxysmal positional vertigo, 
711,712 

benzyl penicilloyl, 520 
benzylpenicillin, 517 
β-blocker, 133, 506

β-hemolytic streptococcal pharyngitis,
615, 616, 617, 624, 625
likelihood of, 625
β-lactam antibiotics
penicillin, cross-reactivity with, 

518 

bias 

incorporation, 498 
reverse workup, 520 
spectrum, 141,497 
in thyroid size estimation, 

282 

verification, 16,138,141, 498, 582, 
589 

biceps load test I 
for labral tears, 586f, 590 
biceps load test II 
for labral tears, 582f, 590
biceps load tests 
for labral tears, 591t
biochemical and hematologic tests, 
41-42 

blood pressure (BP). See also 
hypertension 

classification of, 302f, 311t

diastolic, 301, 302 

errors in measurement, 303t-304t

indirect 

vs direct blood pressure, 304-305 
technical inaccuracies of, 305 
measurement, 302f 
office 

factors, affecting, 303-304f 
vs usual blood pressure, 305 
systolic, 301, 302 
blunt abdominal trauma, 72 
BMAST. See Brief Michigan Alcoholism 
Screening Test 

BMD. See bone mineral density 
BNP. See brain natriuretic peptide 
Boas sign, 138 

bone mineral density (BMD), 477,478, 
484-485 

BP. See blood pressure 
brachial-popliteal pulse gradient (Hill 
sign), 422 


brachioradial delay 
for aortic stenosis, 438f
bradycardia, 320,452 
bradykinesia, 507 
maneuvers, detecting, 507f
sensitivity, specificity, or likelihood 
ratio 

in Parkinson disease, 507, 514f 
bradykinin, 616 

brain natriuretic peptide (BNP), 196, 
197,203, 209 
accuracy of, 201-202 
sensitivity, specificity, or likelihood 
ratio 

left ventricular dysfunction, 213f 
Brain Resuscitation Clinical Trials 
(BRCTs), 220-221 

BRCTs. See Brain Resuscitation Clinical 
Trials 

breast cancer, 87, 99, 265, 266,267, 268, 
272. See also clinical breast 
examination 

evidence from guidelines, 100 
findings of, 99-100 
incidence of, 88f 
literature search, 99 
pathophysiology of, 99 
prior probability, 101 
reference standard tests, 101 
risk factors for, 88 

Breast Cancer Detection Demonstration 
Project (BCDDP), 88, 92 
Breast-ovarian cancer syndrome, 275 
breast tenderness, 551, 552 
Breathing Not Properly Multinational 
Study, 202 
breath sounds 

sensitivity, specificity, or likelihood 
ratio 

pneumonia, adult, 531f, 536f
Breslow-Day test, 554 
Brief Michigan Alcoholism Screening 
Test (BMAST), 41,44 
British Hypertension Society, 312 
British Thoracic Society, 574 
bronchial lavage, 540 
bronchiolitis, 540, 541f
bronchitis, 527 
bronchoconstriction, 452 
bronchodilators, 149 
Brudzinski signs, 396, 399 
sensitivity, specificity, or likelihood 
ratio 

in meningitis, adult, 404f 
bruit 

abdominal, 29-32, 35-37 
carotid, 103-106 
periumbilical, 30 


systolic, 35, 36 

systolic-diastolic abdominal, 30, 31, 
32 

sensitivity, specificity, or likelihood 
ratio 

of abdominal, in renovascular
hypertension, 31f, 37f
of carotid, in carotid stenosis, 109f,
110t

vascular, 29 
bulbar weakness, 451 
bulging flanks, 66 

sensitivity, specificity, or likelihood 
ratio 

ascites, 69f 


C rating, 18. See also evidence, level of 
CAGE questionnaire. See also cut down, 
annoyed by criticism, guilty 
about drinking, eye-opener 
drinks 

sensitivity, specificity, or likelihood 
ratio 

for alcohol abuse, 5f, 41f
calcium-channel blocker, 183 
calf vein thrombosis, 227 
Canadian Cardiovascular Society, 

462 

Canadian class II angina, 183 
Canadian Hypertension Education 
Program, 312 

Canadian National Breast Screening 
Study (NBSS), 88 

Canadian Preventive Health Services 
Task Force, 490 
Canadian Task Force, 99,109 

on the Periodic Health Examination, 
18 

on Preventive Health Care, 49, 261, 
393 

for malignant melanoma, 393 
cancer, 76, 86 

family history of, 265-272,275-276 
accuracy of, 268-270 
data collection, improvement in, 
270-272 

elicitation of, 266-267 
false-negative reports, reasons for, 
270 

false-positive reports, reasons for, 
270 

findings of, 275 
guidelines, evidence from, 275 
information, collection of, 270-272 
likelihood ratio, 276
literature review, results of, 275
literature search, 275 
methods, 267-268 
precision of, 268 
prevalence of, 266 
prior probability, 276 
reference standard, changes in, 275, 
276 

candidiasis, 706f, 707f, 694f, 697t
sensitivity, specificity, or likelihood 
ratio 

in pharyngitis, 616t
in vaginitis, 707t, 694f
capillary refill time, 318, 331 
sensitivity, specificity, or likelihood 
ratio 

in hypovolemia, adult, 327t
in hypovolemia, child, 341t
carbapenems, 515, 518, 524 
cardiac arrest, 215 
comatose survivors of, 215-223, 
225-226 

literature search, 225 
prior probability, 226 
reference standard tests, 226 
ventricular fibrillation, 215 
cardiac bradyarrhythmias, 452 
cardiac dullness. See percussion 
cardiac ischemic chest pain, 462,463. 

See also chest pain; myocardial 
infarction 

carotid arterial pulse 
and jugular venous pulse, 

distinguishing between, 127 
carotid bruit, 103-106, 110f
ambulatory bruit, 104-105 
auscultation, precision of, 104 
carotid artery cause, in neck, 103-104 
clinical significance, 103 
evidence from guidelines, 109 
findings of 

asymptomatic patients, 107 
symptomatic patients, 107 
likelihood ratio, 108,109,110 
literature review 

asymptomatic patients, 108-109 
symptomatic patients, 108 
literature search, 107 
original publication data, 

improvements in, 107 
preoperative bruit, 105 
prior probability, 110 
reference standard, changes in, 
107-108 

symptomatic bruit, 105 
carotid pulse, for aortic stenosis, 446f 
carotid volume, for aortic stenosis,
446t


carpal tunnel syndrome (CTS), 111-117 
diagnostic standard, 112-113 
evidence from guidelines, 123 
findings of, 121-122 
history and physical examination 
accuracy of, 114-117 
precision of, 114 
importance of, 111-112 
likelihood ratio, 124 
literature review, 123 
literature search, 121 
methods, 113 
normal anatomy of, 112f
original publication data, 

improvements in, 122 
prior probability, 124 
reference standard, changes in, 122- 
123,124 

sensitivity and specificity of 
electrodiagnosis, 112 
signs, 113 
symptoms, 113 
case-control study, 88 
case-finding, 248 
instruments, 250 
performance, in primary care 
settings, 252-253t
Castell method, 607, 607f, 608
Castell sign 

sensitivity, specificity, or likelihood 
ratio 

in splenomegaly, 612f 
cauda equina syndrome, 80 
CBE. See clinical breast examination 
CDC. See Centers for Disease Control 
and Prevention 

Center for Epidemiologic Studies 

Depression Screen (CES-D), 
250, 251, 260. See also clinical 
prediction rules and scores 
Centers for Disease Control and 

Prevention (CDC), 267, 330, 
355, 356, 405, 524, 706 
US Influenza Sentinel Providers 
Surveillance Network, 344 
Centor clinical prediction rule 
for sore throat, 619, 619f
sensitivity, specificity, or likelihood
ratio

adults, 619f

modified for age, 623-624, 623f 
central venous pressure (CVP), 125- 
130,133-135 

abdominojugular reflux, 134f 
abnormal, 127 

clinical assessment of, 126-128 
abdominojugular reflux test, 128 
accuracy of, 129-130 


carotid arterial pulse and jugular 
venous pulse, distinguishing 
between, 127 
jugular veins, 130 
Kussmaul sign, 128 
neck veins, position of, 126-127 
precision of, 128-129 
estimation of, 127, 128f
findings of, 133 
guidelines, evidence from, 134 
likelihood ratio, 135 
literature review, results of, 134 
literature search, 133 
original publication data, 

improvements in, 134 
prior probability, 135 
reference standard, changes in, 134, 
135 

venous waveforms in, 126f 
cephalosporins, 515, 516, 517, 524 
cerebral infarction 
Oxfordshire classification of, 634 
cerebrospinal fluid (CSF), 395-396 
CES-D. See Center for Epidemiologic 
Studies Depression Screen 
Chadwick sign, 552, 556, 557 
for pregnancy, 556t
chance agreement, 3-4 
chest hyperresonance, 154 
chest pain. See also cardiac ischemic 
chest pain; myocardial 
infarction; pain 
acute, 462-463 
cardiac ischemic, 462,463 
mechanism of, 463 
chest radiograph, 195, 202,211, 661 
accuracy of, 201 

for community-acquired pneumonia, 
528, 530, 532 

for community-acquired pneumonia,
535, 536,537 
for pneumonia, 540, 548 
for pulmonary embolism, 561 
for thoracic aortic dissection 
accuracy of, 666 
sensitivity of, 667t
chest retractions. See retractions, 
chest 

chest x-ray. See radiographic findings 
χ² test, 464, 617

for abdominal aortic aneurysm, 

19 

2-tailed χ² test, 529
chlamydia, 541 
Chlamydia pneumoniae, 344 
cholecystectomy, 39,138 
cholesteatoma, 712 
chronic bronchitis, 150, 151, 152
chronic obstructive pulmonary disease
(COPD), 163,166,168,169, 
195,202, 203,205,206, 489, 
490 

chronic thromboembolic pulmonary 
hypertension, 235 
cigarette smoking, 151 
Cincinnati Prehospital Stroke Scale 
(CPSS), 628, 630, 631 
for stroke, 641t
cirrhosis, 292 

classic essential tremor, 506, 514 
clinical agreement, 3-4 
clinical assessment 
accuracy of, 9-16 
for airflow limitation, 155 
accuracy of, 155 
CAGE questionnaire 
accuracy characteristics of, 4-5 
of central venous pressure, 126-128 
abdominojugular reflux test, 

128 

accuracy, 129-130 
carotid arterial pulse and jugular 
venous pulse, distinguishing 
between, 127 
jugular veins, 130 
Kussmaul sign, 128 
neck veins, position of, 126-127 
precision, 128-129 
for clubbing 
accuracy, 168-169 
precision, 168 
for coma 
accuracy of, 219 

interobserver agreement of, 218f 
precision of, 219 

for congestive heart failure, 195-206 
accuracy of, 199, 200f 
precision of, 199 

of deep vein thrombosis, 228-229, 
235-236 

importance of, 173-174 
for internal derangement of knee, 
359-361 

function, 360-361 
inspection, 359 
palpation, 359-360 
for mitral regurgitation, 438-439 
accuracy of, 439f 
for mitral valve prolapse, 439-440 
accuracy of, 440f 
precision of, 9-16 
for spider nevi 
precision of, 3f
for systolic murmurs 
accuracy of, 436, 437f 
precision of, 435-436, 436f 


for thoracic aortic dissection 
accuracy of, 662f 

clinical breast examination (CBE), 87. 
See also breast cancer 
accuracy, 90-91 
bottom line, 91 
anatomic basis, 87-88 
bottom line 

priorities for research, 95-96 
resolution of scenarios, 95 
effectiveness, 88-90 
bottom line, 90 
examiner factors 
bottom line, on accuracy, 92 
duration, 91 
experience, 91-92 
techniques, 91 
methods, 88 
patient factors 
age, 92 
bottom line 
on accuracy, 92 
of suggested approach, 94 
breast boundaries, 93 
breast characteristics, 92 
cancer characteristics, 92 
duration, 94 
examiner pattern, 93 
fingers, 93-94 
inspection, 94 
issues, 94 

normal from abnormal (cancerous) 
lumps, distinguish, 95 
palpation, 92 
patient position, 93 
techniques, 94-95 
precision, 90 
bottom line, 90 

sensitivity, specificity, or likelihood 
ratio 

for breast cancer, 101f 
with mammography, 91f 
techniques, 88 
test characteristics of, 88, 90 
clinical depression, 259-263 
case-finding questionnaires for 
accuracy of, 250-254 
characteristics of, 251f
clinical interview for 
accuracy and reliability of, 254, 255 
criterion standard diagnosis, 249 
data abstraction, 250 
definition, 247-248 

diagnostic criteria and questions, 248t 
findings of, 259 

guidelines, evidence from, 260-261 
literature review, results of, 259-260 
literature search, 259 


original publication data, 

improvements in, 259 
patients, evaluating, 248-249 
physical illness, effect of, 254-256 
prior probability, 262 
reference standard, changes in, 259, 
262 

screening, web resources for, 261 
search strategy and inclusion/ 

exclusion criteria, 249-250 
statistical methods, 250 
clinical findings 
for left-sided heart failure 
detection of, 186-187,187f 
precision of, 189 

clinical gestalt. See clinical impression 
clinical impression 
sensitivity, specificity, or likelihood 
ratio 

for acute cholecystitis, 147f 
for aortic aneurysm, 24f, 134f, 

147f 

for aortic regurgitation, 429f 
for central venous pressure, 134f 
for chronic obstructive airways 
disease, 159f 

for hypovolemia, child, general 
appearance, 341t
for left ventricular dysfunction, 
188f, 213f 

for pneumonia, infant and child, 
548f 

for pulmonary embolus, 572f, 575f 
for valvular heart disease, 446f 
clinical interview, for depression 
accuracy and reliability of, 254, 255 
clinical prediction guide, for deep vein 
thrombosis, 230 
development of, 230-232 
clinical prediction rules and scores 
ABCD(E) criteria, for melanoma, 
392f, 393f 

Alvarado score, for appendicitis, 
adult, 63f

for aortic stenosis, 438f 
Beck Depression Inventory, for major 
depression, 252f 

Center for Epidemiologic Studies 
Depression (CES-D), for 
major depression, 252f 
Cincinnati Prehospital Stroke Scale, 
for stroke, 641t

for deep vein thrombosis, 235-236, 
237-238 

Glasgow Coma Scale, in recovery 
from coma, 220f 
Malnutrition Screening Tool, for 
malnutrition, adult, 381f
for myocardial infarction, 476t, 468-
469 

for osteoporosis in men, 490t 
for osteoporosis in women, 491t
Patient Health Questionnaire (PHQ-9) 
for depression, 261t
for dysthymia, 261t
for pneumonia, adult, 536f 
pneumonia, infant and child, 549f 
PRIME-MD, for depression, 261t
of pulmonary embolism, 564-566, 
567f, 572t, 575t
accuracy of, 565f, 568f 
components of, 566 
validation of, 566-567 
for sinusitis, 603t, 625f 
for sore throat, 618-620, 619f 
Centor clinical prediction rule, 619,
619f

McIsaac clinical prediction rule,
620, 620f

Walsh algorithm, 621f
subjective global assessment (SGA), 
for malnutrition, adult, 376f 
for temporal arteritis, 654 
for urinary tract infection, women, 
689f 

Wells Prediction Rule, for deep vein 
thrombosis, 246f 
Wicki model, 566f, 567 
closed fist sign, 112f 
clubbing, 163-169 
clinical examination for 
accuracy of, 168-169 
precision of, 168 
congenital, 163 
data analysis, 166 
digital, 163 
findings of, 171 
guidelines, evidence from, 172 
inspection 

general appearance, 164, 165f
nailfold angles, 164, 165f
palpation of, 165-166
phalangeal depth ratio, 164-165,
165f

Schamroth sign, 165, 165f
literature review, results of, 172 
literature search, 171 
methods, 166 
original publication data, 

improvements in, 171 
pathophysiology of, 164 
prevalence in associated conditions, 
172f 

prevalence of, 162 
reference standard, changes in, 171- 
172,172t 


results 

quality of evidence, 166 
quantitative indices, 166-168 
signs of, 164 

study characteristics, 166 
symptoms of, 164 
clunk test 

for shoulder instability, 585f 
sensitivity, specificity, or likelihood 
ratio 

for shoulder instability, 585f 
cocaine, 249, 301, 660 
cog wheeling, 506. See also rigidity 
sensitivity, specificity, or likelihood 
ratio 

in Parkinsonism, 506 
cold, 527, 541f, 593, 596
cold water caloric testing, 217 
colon cancer, 265,267 
colors, multiple in a skin lesion. See 
ABCD(E) criteria 

coma 

clinical examination for 
accuracy of, 219t 
interobserver agreement of, 218t 
precision of, 219 
hypoxic-ischemic, 216 
methods 

likelihood ratios, 218 

search strategy and quality review, 

217-218

statistical methods, 218 
motor response and brainstem 
reflexes, 219-221 
pathophysiology of, 216 
physical examination of, 216-217 
postcardiac arrest, 215, 216 
search results and quality of evidence, 

218-219

combination chemotherapy, 228 
community-acquired pneumonia, 
adult, 527-533, 535-537 
diagnosis of 

clinical history, accuracy of, 530 
physical examination findings, 
accuracy of, 530-532 
findings of, 535 
guidelines, evidence from, 536 
likelihood ratio test for, 537 
literature review, results of, 536 
literature search, 535 
methods 
data analysis, 529 
literature search, 528-529 
quality review, of articles, 529 
multivariate findings for, 536f 
original publication data, 

improvements in, 535 


pathophysiology of, 528 
prediction of 

algorithm evaluation, 532-533 
prior probability, 537 
reference standard, changes in, 535- 
536, 537 

symptoms and signs 
elicitation of, 528 
precision of, 529-530 
compliance, 173-174. See also 
noncompliance 

clinical measures, accuracy of, 175- 
176 

Compliance Questionnaire 
Rheumatology, 180 

Comprehensive Meta-Analysis software 
(version 2.197), 236 
compression rotation test 
for labral tears, 586f 
compression ultrasonography, 230 
computed tomography (CT), 17, 18 
for appendicitis, 54 
computed tomography (CT) 
angiography 

for pulmonary embolism, 571, 572 
computed tomography (CT) scanning 
for acute cholecystitis, 138 
chest CT, for community-acquired 
pneumonia, 536 
for paranasal sinuses, 594 
computer-guided analyses 
for appendicitis, 54 
computerized genograms, 271 
confidence interval, 6,12 
congenital clubbing, 163 
congestive heart failure, dyspnea in, 
195-206 

in emergency department patients 
brain natriuretic peptide, accuracy 
of, 201-202, 203 

chest radiographs, accuracy of, 201, 
202 

clinical examination and 
investigations 
accuracy of, 199, 200t 
precision of, 199 
clinical gestalt, 199, 202 
clinician’s assessment, 204 
limitations, 204-205 
electrocardiogram, accuracy of, 
201,202 

historical items, 199, 202 
pathophysiology of, 196 
physical examination, 200-201, 202 
physiological categories and 
mechanisms of, 196f 
pulmonary diseased patients, 202 
search strategy, 196-197
statistical methods, 197 
study characteristics, 198,198-199f 
study quality, assessment of, 197 
study selection, 197 
symptoms, 199-200, 202 
and signs, elicitation of, 196 
COPD. See chronic obstructive 
pulmonary disease 
Cope’s Early Diagnosis of the Acute 
Abdomen, 55, 138 
Copenhagen Stroke Scale, 634 
coronary heart disease, 249 
corneal reflex. See reflexes 
Corrigan water hammer pulse, 422 
sensitivity, specificity, or likelihood 
ratio 

in aortic regurgitation, 425f 
coryza, 571, 574 
costovertebral angle tenderness 
sensitivity, specificity, or likelihood 
ratio 

urinary tract infection, 680f 
cough, 149,151,152 
in infants 

differential diagnosis of, 540t
sensitivity, specificity, or likelihood 
ratio 

in influenza, 355f 
in obstructive airways disease, 

152f 

in otitis media, child, 497f 
in pneumonia, adult, 530f 
in streptococcal pharyngitis, 618f 
Courvoisier sign, 138 
cover-uncover test, 451 
CPSS. See Cincinnati Prehospital Stroke 
Scale 
crank test 

for labral tears, 586t
crossed straight-leg raising (CSLR)
sign, 78

sensitivity, specificity, or likelihood 
ratio 

for disk herniation, 86t 
CSF. See cerebrospinal fluid 
CSLR. See crossed straight-leg raising 
sign 

CT. See computed tomography 
CTS. See carpal tunnel syndrome 
curtain sign, 451 
Cushing disease, 249 
cut down, annoyed by criticism, guilty 
about drinking, eye-opener 
drinks (CAGE questionnaire) 
accuracy of, 42 
predictive, 44 


questionnaire, 41,43f, 49, 52f 
accuracy characteristics of, 4-5 
for alcohol abuse or dependency, 
2-3, 4-5, 7, 8t
reliability of, 42 

sensitivity, specificity, or likelihood 
ratio 

for alcohol abuse, 5f, 41f
CVP. See central venous pressure 
cyanotic congenital heart disease, 163 
cystic fibrosis, 163 


DADS. See Duke Anxiety and 
Depression Scale 
DBP. See diastolic blood pressure 
D-dimer assay, 230,236, 238-239, 561, 
562, 563, 566, 567-568, 569, 
571,572,573,575. See also 
laboratory findings 
in deep vein thrombosis diagnosis, 
239 

high-sensitivity, 240, 241t
likelihood ratio of, 573t
moderate-sensitivity, 239-240 
de Musset head bobbing sign, 421 
decision analytic model, 352 
deep vein thrombosis (DVT), 227-232, 
245-246 

clinical assessment of, 228-229,235- 
236 

clinical prediction guide, 230 
development of, 230-232 
clinical prediction rules, 235-236 
data extraction, 236 
D-dimer testing for, 239 
high-sensitivity, 240, 241t
moderate-sensitivity, 239-240 
diagnosis of, 228,232 
likelihood ratio, 246 
objective assessment of, 229-230 
original data publication, 

improvements in, 245 
prevalence of, 238f
prior probability, 246 
reference standard tests, 246 
search strategy, 228 
statistical analysis, 236-237 
study identification, 236 
study selection, 236 
symptoms and signs, frequency of, 
229t 

ultrasonography testing for, 

240-241 
dehydration, 315 
in children, 329-330 


anatomic and physiologic origins 
of, 330 

evidence from guidelines, 340 
examination signs, precision of, 
333-335 

findings of, 339-340 
laboratory tests, 335 
likelihood ratio, 341 
limitations, 335-336 
literature review, 340 
literature search, 339 
methods 

search strategy and quality 
review, 331-332 
statistical analyses, 332-335 
original publication data, 
improvements in, 340 
prior probability, 341 
reference standard tests, 341 
symptoms and signs, 330-331 
accuracy of, 333 
precision of, 333 

Dehydration Assessment Scale, for 
hypovolemia, child, 334f 
delayed menses, for pregnancy, 560 
depressed mood, perimenopausal, 

409 

Depression Scale (DEPS), 250, 251 
DEPS. See Depression Scale 
DerSimonian-Laird random-effects 
method, 509 

deviated nasal septum, 594, 595 
diabetes mellitus, 249 
Diagnostic and Statistical Manual of 
Mental Disorders (Third 
Edition Revised) ( DSM-III-R ), 
40, 259 

Diagnostic and Statistical Manual of 
Mental Disorders (Fourth 
Edition) ( DSM-IV ), 48, 247, 
249, 259 

diagnostic odds ratio (DOR), 12,13 
diaphoresis 

sensitivity, specificity, or likelihood 
ratio 

in myocardial infarction, 467t
diastolic blood pressure (DBP), 301, 
302, 303 

diastolic dysfunction, 184 
and systolic dysfunction, difference 
between, 189, 211 
diastolic murmur. See aortic 
regurgitation 
digital clubbing, 163 
diplopia, 455 

sensitivity, specificity, or likelihood 
ratio 

temporal arteritis, 656
dipstick urinalysis
for urinary tract infection 

accuracy of, 678 
direct blood pressure 
vs indirect blood pressure, 304-305 
diuretic, 179 
therapy, 184 

Dix-Hallpike maneuver, 710f, 715, 716,
717 

dizziness. See also vertigo 
sensitivity, specificity, or likelihood 
ratio 

postural, in hypovolemia, adult,
327t

Doppler echocardiography, 430 

Doppler effect, 553 

DOR. See diagnostic odds ratio 

drink, 41, 48 

dry axilla 

sensitivity, specificity, or likelihood 
ratio 

in hypovolemia, adult, 327f 
dry mucous membranes 
sensitivity, specificity, or likelihood 
ratio 

in hypovolemia, adult, 327f 

in hypovolemia, child, 341t
DSM-III-R. See Diagnostic and 

Statistical Manual of Mental 
Disorders (Third Edition 
Revised) 

DSM-IV. See Diagnostic and Statistical 
Manual of Mental Disorders 
(Fourth Edition) 

Duke Anxiety and Depression Scale 
(DADS), 250,251 

Duroziez double intermittent femoral 
bruit, 422,425f 

sensitivity, specificity, or likelihood 
ratio 

in aortic regurgitation, 425f 
DVT. See deep vein thrombosis 
dyskinesias, 506 

dyspnea, 125,149,152,153,183,186, 
187,215, 225 

in congestive heart failure, 195-206 
sensitivity, specificity, or likelihood 
ratio 

in congestive heart failure, 200t 

in pneumonia, adult, 530f 
dyspnea on exertion. See dyspnea 
dysthymia, 248 
dysuria 

sensitivity, specificity, or likelihood 
ratio 

in urinary tract infection, women, 
689t

in vaginitis, 707f 


ear rubbing 

sensitivity, specificity, or likelihood 
ratio 

in otitis media, child, 503t
ECG. See electrocardiogram 
echocardiogram, 209 
for left ventricular systolic 

dysfunction, 210-211, 212 
edema, 187, 617 

sensitivity, specificity, or likelihood 
ratio 

in left ventricular dysfunction, 200f 
edrophonium chloride, 451-452 
edrophonium test, 452,453f. See also 
anticholinesterase test 
effectiveness score, 250,253 
effusion, 359-360 
egophony, 528 

ejection fraction, detection of, 187-189 
electrocardiogram (ECG), 202 
accuracy of, 201 

left bundle-branch block on, 187 
for myocardial infarction 
accuracy of, 468 
precision of, 465-466 
for pulmonary embolism, 561-562 
sensitivity, specificity, or likelihood 
ratio 

for left ventricular dysfunction, 
213f 

for thoracic aortic dissection, 665t
for myocardial infarction, 468f 
ELISA. See enzyme-linked 

immunosorbent assay 
ELISA Vidas DD, 573 
emphysema, 150 
endemic iodine deficiency, 285 
endometrial cancer, 265, 267 
enhanced ptosis. See curtain sign 
enlarging skin lesion. See ABCD(E) 
criteria 

enzyme-linked immunosorbent assay 
(ELISA), 230, 239,572 
epiglottitis, 540, 541f
erythema, 617 

erythrocyte sedimentation rate (ESR), 
649 

erythromycin, 524 

ESR. See erythrocyte sedimentation rate 
estradiol, perimenopausal, 408, 410 
ethmoid sinus, 594, 595, 595f
ethnicity 

sensitivity, specificity, or likelihood 
ratio 

in osteoporosis, 491t
in perimenopause, 416f 


in thoracic aortic dissection, 
enlarged aorta or wide 
mediastinum, 673f 
European Influenza Surveillance 
Scheme, 344 

European Society of Cardiology, 212 
European Society of Hypertension, 312 
Evaluation du Scanner Spirale dans 
l’Embolie Pulmonaire study 
group, 564 
evidence, level of, 15f 
expected agreement. See agreement 
Extended Wells scoring system, 571, 573 
extraocular muscles, asymmetric 
weakness of, 451 

eye movements, in coma, 220f. See also 
reflexes 


facial paresis 

sensitivity, specificity, or likelihood 
ratio 

in stroke, 641t
facial weakness, 451,455 
family history 

sensitivity, specificity, or likelihood 
ratio 

in cancer, 275f 
in early menopause, 417f 
female. See sex 

femoral pistol shot murmur, 422,425t 

femur, 358, 359, 360 

fever 

sensitivity, specificity, or likelihood 
ratio 

in acute cholecystitis, 140f 
in appendicitis, adult, 57t
in influenza, 356f 
in meningitis, adult, 404f 
in otitis media, child, 497f 
in pneumonia, adult, 536f 
in pneumonia, infant and child,
550t

in streptococcal pharyngitis, 

618f 

in temporal arteritis, 648t
in urinary tract infection, women,
679t

fibromuscular hyperplasia 
abdominal bruit in, 30 
finger-flicking percussion, 66-67 
Fisher exact test, 529 
flank dullness, 66,67. See also 
percussion 

flexicurve measurement, 479, 485 
flick sign, 112f
Flint murmur, 420-421
in aortic regurgitation, 420 
fluctuating weakness. See reduced 
muscle power 

fluid loss, subjective global assessment, 
374-375 

fluid wave, 66, 67. See also percussion 
follicle-stimulating hormone (FSH), 

perimenopausal, 408, 410, 416,
417t

fontanelle, sunken 
sensitivity, specificity, or likelihood 
ratio 

in hypovolemia, child, 334f 
Food and Drug Administration, 197, 
416 

forced expiratory time, 151, 159t

in obstructive airways disease, 161t
Fracture Intervention Trial, 482 
fractures, spinal compression, 77 
Framingham study, 463 
frequent urination 

sensitivity, specificity, or likelihood 
ratio 

in urinary tract infection, women,
689t

frontal sinus, 594f, 595, 595f
surface palpation for, 596f

FSH. See follicle-stimulating hormone 


gag reflex. See reflexes 
gait 

sensitivity, specificity, or likelihood 
ratio 

in Parkinsonism 
heel to toe walking, 514t
rising from a chair, 514t
shuffling, 514t
gastroenterologist, 293 
gastrointestinal (GI) 
symptoms, 374 
tract hemorrhage, 316 
GCS. See Glasgow Coma Scale 
GDS. See Geriatric Depression Scale 
General Health Questionnaire (GHQ- 
12), 260 
genetic testing 

family history assessment tools for, 
270 

policy statement on, 266 
Geneva rule, 572, 573 
Geriatric Depression Scale (GDS), 250, 
251,260 

GHQ-12. See General Health 
Questionnaire 


GI. See gastrointestinal 
girth, increased abdominal 
sensitivity, specificity, or likelihood 
ratio 

in ascites, 68t, 73t

glabella tap reflex test, 508, 508f, 510,
513, 514. See also reflexes 
Glasgow Coma Scale (GCS), 216, 216t,
221. See also clinical prediction 
rules and scores 
Glasgow-Pittsburgh Cerebral 

Performance Categories, 

217 

in coma, 217 

Global Initiative for Chronic 

Obstructive Lung Disease, 

161 

glucocorticoids, 249 
goiter, 277-282,285-287 
accuracy of, 281-282 
anatomic basics of 
landmarks, 277-278, 278f
normal size, 278 
examination, 278-279 
bias in, 282 

false-negative results, 280 
false-positive results, 279-280 
findings of, 285 
guidelines, evidence from, 286 
likelihood ratio, 287 
literature review, results of, 286 
literature search, 285 
original publication data, 

improvements in, 285 
precision of 

interobserver variability, 280-281 
intraobserver variability, 281 
prior probability, 287 
reference standard, changes in, 
285-286, 287 
size of, 277 

Goldman chest pain protocol, 474f, 475
“good” clinical finding, 11-12 
Goodell sign, 552 

sensitivity, specificity, or likelihood 
ratio 

in pregnancy, 556 

grades of evidence. See levels of evidence 
Graves disease, 277 
great toe extensor weakness 
sensitivity, specificity, or likelihood 
ratio 

in sciatica, 79f 
grind test 

sensitivity, specificity, or likelihood 
ratio 

for knee ligament and meniscus 
injury, 370f 


grunting 

sensitivity, specificity, or likelihood 
ratio 

in pneumonia, infant and child,
550t
guarding, 55 

Guide to Clinical Preventive Services, 
Third Edition, Periodic 
Updates, 47 


HADS. See Hospital Anxiety and 
Depression Scale 

Haemophilus influenzae, 400,494, 501 
hand diagram. See Katz hand diagram 
hand grip strength test 
for occult vertebral fracture, 479,485 
harmful drinking, 41,47 
Hawkins grading scheme, 579 
hazardous drinking, 41,47 
HCG test. See human chorionic 
gonadotropin test 
headache. See pain 
head-hanging maneuver, vertigo, 712 
Health Canada, 344 
Health Insurance Plan (HIP) study, 89 
HealthSTAR database, 361 
hearing loss 

in benign positional vertigo, 716 
in labyrinthitis, 716 
in Meniere disease, 716 
in vertigo, 716 
in vestibular neuronitis, 716 
heart failure, 195 
ascites, 65 
heart sounds 

sensitivity, specificity, or likelihood 
ratio 

S2 (second heart sound) for aortic

stenosis, 446f 

S3 (third heart sound)

for aortic regurgitation, 430f 
for myocardial infarction, 467t 
for the breathless emergency 
patient, 200f 

for ventricular dysfunction, 213f 

S4 (fourth heart sound)
for aortic stenosis, 438f 

for ventricular dysfunction, 200f 
heel-to-toe test, 513 
Hegar sign, 552, 553f
sensitivity, specificity, or likelihood 
ratio 

for pregnancy, 556 
height loss 

in osteoporosis, 482, 483f, 490f
hemagglutination-inhibition method,
555 

hematuria, in urinary tract infection, 
women, 679f 
hemorrhagic stroke, 635 
heparin-induced thrombocytopenia, 
235 

hepatojugular reflux, 128 
hepatomegaly 
findings of, 299 
guidelines, evidence from, 300 
likelihood ratio, 300 
literature review, results of, 299 
literature search, 299 
original publication data, 

improvements in, 299 
prior probability, 300 
reference standard, changes in, 299, 
300 

hereditary cancer syndrome, 265,267 
hereditary nonpolyposis colon cancer 
(HNPCC), 266, 267,268 
herniated disk 

with radiculopathy in low back pain, 
86 

herpes zoster oticus, 712 
HIP. See Health Insurance Plan study 
HIV. See human immunodeficiency 
virus 

history 

of penicillin allergy, 525t 
HNPCC. See hereditary nonpolyposis 
colon cancer 
Homans sign, 228 
sensitivity, specificity, or likelihood 
ratio 

in deep vein thrombosis, 229t
home pregnancy test (HPT), 559, 560, 
560t

accuracy of, 555-556 
likelihood ratios of, 559, 560f 
Hoover sign, 542 

Hopkins Symptom Check List (HSCL), 
250, 251,252 

hormone replacement therapy (HRT), 
411 

Hospital Anxiety and Depression Scale 
(HADS), 259, 260 
hot flashes, perimenopausal, 409 
Hoyne sign, 396 
HPT. See home pregnancy test 
HRT. See hormone replacement therapy 
HSCL. See Hopkins Symptom Check 
List 

human chorionic gonadotropin (HCG) 
test, 553, 554, 555, 557 
human immunodeficiency virus (HIV), 
611 


humped back, in osteoporosis, 481t
hydrochlorothiazide, 316 
hypalgesia, 112f 
hypernatremia, 316 
hypertension, 183, 301-307. See also 
blood pressure 
classification of, 311 
diagnosis of 
guidelines for, 301-302 
potential improvements in, 306- 
307 

findings of, 311 

guidelines, evidence from, 312-313 
likelihood ratio, 313 
literature review, results of, 312
literature search, 311 
measurement of 
accuracy of, 304 
techniques for, 302f 
variation in, 303-304 
original publication data, 

improvements in, 311-312 
prediction, issue of 
blood pressure now vs blood 
pressure later, 305-306 
palpation, 306 
relative risk, 306 
prior probability, 313 
reference standard, changes in, 312, 
313 

sensitivity, specificity, or likelihood 
ratio 

in left ventricular dysfunction, 211t
in thoracic aortic dissection, 666f 
hyperthyroidism, 277 
hypertrophic cardiomyopathy, 439 
hypertrophic osteoarthropathy, 163, 164
hypotension 

in myocardial infarction, 467f 
postural, 318, 319 
supine, 320 

hypothyroidism, 249, 277
hypovolemia, 315-316 
acute blood loss 

physical signs, accuracy of, 319-320 
clinical study, 317f 
findings of, 325 
likelihood, 327t
in ICU patients, 326t
literature review, 326 
literature search, 325 
methods, 316-317 
multivariate findings for, 326 
pathogenesis, 318-319 
physical signs 
accuracy of, 320-321 
precision of, 319 
postural vital signs, 317-318 


prior probability, 327 
reference standard, changes in, 325, 
327 


ICD-10. See International Statistical 

Classification of Diseases, 10th 
Revision 

ice pack test, 451, 454
precision of, 455 

sensitivity, specificity, or likelihood 
ratio 

for myasthenia gravis, 460f 
ICS. See Innsbruck Coma Scale 
idiopathic dilated cardiomyopathy, 183 
idiopathic penicillin hypersensitivity, 
517 

IgE antibodies, 519 
imipenem, 517 

immediate penicillin hypersensitivity, 
516-517 

impedance plethysmography, 229-230 
incorporation bias, 498 
increased abdominal girth. See girth 
incontinence 

sensitivity, specificity, or likelihood 
ratio 

in perimenopause, 412f 
independence, 13-14 
indirect blood pressure 
vs direct blood pressure, 304-305 
technical inaccuracies of, 305 
inelastic skin, 331 
infiltrative disorders, 292 
influenza, 343-344, 355-356 
clinical findings, 347f 
diagnosis test, 350-352 
likelihood of, 356 
methods 

diagnostic odds ratio, 346 
search strategy and quality review, 
344-346 

statistical methods, 346 
prior probability, 356 
reference standard tests, 356 
signs and symptoms 
accuracy of, 346-350 
precision of, 346 

Innsbruck Coma Scale (ICS), 221 
Integrated Management of Childhood 
Illness Scale, 330 
intention tremor, 506 
interleukin 1, 616 
interleukin 6, 616 

internal rotation resistance strength test 
for labral tears, 582f, 586t
International Registry of Acute Aortic
Dissection (IRAD), 671 
International Statistical Classification of
Diseases, 10th Revision (ICD-10),
40, 249

ipsilateral straight-leg raising sign, 78 
IRAD. See International Registry of 
Acute Aortic Dissection 
irregular border skin lesion. See 
ABCD(E) criteria 
irritability, perimenopausal, 409 
ischemic stroke, 635 
subtype analysis, 635-636 
accuracy of, 635 
reliability of, 635-636 
transient, 627, 631-633 
itching 

sensitivity, specificity, or likelihood 
ratio 

in vaginitis, 707t


jaw claudication, in temporal arteritis, 
656t

JNC-VII. See Joint National Committee
on Prevention, Detection, 
Evaluation, and Treatment of 
High Blood Pressure, seventh 
report of 

joint line tenderness, 360. See also pain 
Joint National Committee, 305 
Joint National Committee on
Prevention, Detection,
Evaluation, and Treatment of
High Blood Pressure, seventh 
report of (JNC-VII), 312 
jolt accentuation of headache, 396, 400
in meningitis, adult, 404f 
jugular veins, 190 
jugular venous distention, 186, 187
sensitivity, specificity, or likelihood 
ratio 

in left ventricular dysfunction, 

213t

jugular venous pressure (JVP), 125, 133,
134 

anatomic and physiologic origins of, 
125-126 

jugular veins, clinical examination of, 
130 

waveforms, analysis of, 126 
abnormalities, 126 
jugular venous pulse 

and carotid arterial pulse, 

distinguishing between, 127 
JVP. See jugular venous pressure 


K statistic, 152, 153. See also precision
calculation of, 8f
weighted, 571 
Kartagener syndrome, 594 
Katz hand diagram, 113, 114f
sensitivity, specificity, or likelihood 
ratio 

in carpal tunnel syndrome, 115f, 
123f, 124 

Kernig signs, 396, 399 
sensitivity, specificity, or likelihood 
ratio 

in meningitis, adult, 404f 
knee effusion 

sensitivity, specificity, or likelihood 
ratio 

in knee ligament and meniscus 
injury, 370f 

knee, meniscal, and ligamentous 
injuries, 357-358 
anatomy, 358 

anterior cruciate ligament (ACL) 
examination, 362 
physical examination 
accuracy of, 363t 
maneuvers, 363t 
clinical examination for internal 
derangement, 359-361 
function, 360-361 
inspection, 359 
palpation, 359-360 
epidemiology of, 359 
findings of, 369 
likelihood of, 370 
limitations, 363, 365 
literature review, 369 
literature search, 369 
mechanism, 358-359 
methods 
analysis, 362 
search strategy, 361-362 
original publication data, 

improvements in, 369 
physical examination, 366f 
accuracy of, 364f 
maneuvers, 365f 

posterior cruciate ligament (PCL) 
examination, 362 
physical examination 
accuracy of, 364f 
prior probability, 370 
reference standard, changes in, 369, 
370 

symptoms, 358 
Kussmaul sign, 128 
kwashiorkor, 372 


kyphosis, 477, 478, 479, 485
in osteoporosis, 490f 


laboratory findings 

sensitivity, specificity, or likelihood 
ratio 

for acute cholecystitis, 140f 
for adult malnutrition, albumin, 
380f 

for deep vein thrombosis, D-dimer, 
246f 

for hypovolemia, adult, urine 
specific gravity, 327t
for hypovolemia, child, 334f 
for influenza, rapid tests, 356f 
for left ventricular dysfunction, 
brain natriuretic peptide, 

213f 

for malnourishment, adult, 380t
for perimenopause, 417f 
for pulmonary embolus, D-dimer, 
575t

for streptococcal pharyngitis, rapid 
streptococcal test, 625f 
for temporal arteritis, erythrocyte 
sedimentation rate, 656t 
for urinary tract infection, women, 
urinalysis, 689f 

for vaginitis, microscopic tests, 

699t, 700t, 707t

labral (shoulder) tears, 589-591 
findings of, 589 
guidelines, evidence from, 590 
likelihood ratio for, 591 
literature search, 589 
original publication data, 

improvements in, 590 
physical examination tests, 580f 
precision of 
laxity maneuvers, 590f 
provocation maneuvers, 590f 
prior probability, 591 
reference standard, changes in, 590, 
591 

labyrinthitis, 711 
Lachman test, 360, 360f, 369
sensitivity, specificity, or likelihood 
ratio 

for knee ligament and meniscus 
injury, 359f, 370t

LACS. See lacunar infarction syndrome 
lacunar infarction syndrome (LACS), 
634 

laparoscopy 
for appendicitis, 54 
LAPSS. See Los Angeles Prehospital
Stroke Scale 
laryngeal height 

sensitivity, specificity, or likelihood 
ratio 

in obstructive airways disease, 161t,
162t

laryngitis, 540, 541f
laryngotracheobronchitis, 540, 541f
late penicillin hypersensitivity, 517-518 
lateral collateral ligament (LCL), 358 
lateral pivot shift test, 360f, 361
sensitivity, specificity, or likelihood 
ratio 

knee ligaments and menisci, 359f
latex agglutination assays, 230 
LAW (lymphocyte count, albumin, 
percentage weight loss) 
discriminant model, for adult 
malnutrition, 381 
laxity tests 

for shoulder instability, 579, 580f 
LCL. See lateral collateral ligament 
lead pipe rigidity. See rigidity 
left-sided heart failure, in adults, 183 
clinical findings for 
detection of, 186-187, 187f
precision of, 189 
definition, 184 

ejection fraction, detection of, 187-189,
188f
methods 

data abstraction, 184-186 
literature search, 184 
pathophysiology of, 184 
signs, elicitation of 
apical impulse, 190 
jugular veins, 190 
radiographic cardiomegaly, 

190 

radiographic redistribution, 190 
third heart sound, 190 
vital signs, 189-190 
left ventricular dysfunction 
findings of, 209 

guidelines, evidence from, 211-212 
likelihood ratio, 213 
literature review, results of, 211 
literature search, 209 
original publication data, 

improvements in, 211 
prior probability, 213 
reference standard, changes in, 211, 
213 

systolic dysfunction 
diagnosis of, 210-211 
echocardiograms, 210-211 
postmyocardial infarction, 210 


and diastolic dysfunction, 
difference between, 211 
left ventricular hypertrophy, 189 
Legionella, 344 

Legionella monocytogenes, 400 
levels of evidence, 15f 
Levine grading system. See murmur 
intensity 

levodopa, 505, 506, 508 
Li-Fraumeni syndrome, 275 
likelihood ratio (LR) test, 218, 529, 

617 

for abdominal aortic aneurysm, 27t
for acute cholecystitis, 147 
for acute otitis media, 496, 501, 

503t

for aortic regurgitation, 429f, 430f 
for aortic stenosis, 446f 
for appendicitis, 63 
for β-hemolytic streptococcal
pharyngitis, 625 
calculation of, 8f
for cancer, 276 

for carpal tunnel syndrome, 124 
for central venous pressure, 135 
for chest pain protocol, acute cardiac 
ischemia, 473f 

for chest pain radiation, 471-472, 

472t

for clinical assessment, 9-11 
of deep vein thrombosis, 229, 231 
for community-acquired pneumonia, 
530f, 531, 531f, 537 
for deep vein thrombosis, 246 
for dehydration, 341 
for goiter, 287 

for home pregnancy test, 559, 560f 
for hypovolemia, 327t
in ICU patients, 326t
for influenza, 356 
for labral tears, 591 
for left ventricular dysfunction, 213 
for major depression, 262 
for malnutrition, 380f, 381f
for medication nonadherence, 182f 
for meningitis in adults, 404f, 406 
for meniscal and ligamentous knee 
injuries, 370 

for mitral regurgitation, 447f 

for myasthenia gravis, 460f 

for myocardial infarction, 476 

for obstructive airways disease, 162 

for osteoporosis, 491 

for Parkinson disease, 514 

for pediatric pneumonia, 550 

for penicillin allergy, 523 

for perimenopause, 416f, 417f

for Phalen sign, 124f 


for pregnancy, 560 

for pulmonary embolism, 572f, 575 

for reference standard tests, 356f 

for renal artery stenosis, 37 

for shoulder instability, 591 

for sinusitis, 603 

for splenomegaly, 613 

for stroke, 629, 641 

for temporal arteritis, 648f, 649f, 

654f, 656t

for thoracic aortic dissection, 667t,
673, 673f 

for Tinel sign, 124f 
for urinary tract infection, 681t
for vaginal complaints, 707 
for valvular heart disease, 446t
for vertigo, 717 
limb weakness, 451 

liver, physical examination of, 289-296 
auscultation of, 291-292 
examination of, 290-291 
inspection of, 291 
palpable liver edge, 292-293 
physical findings of, 295 
pulsatile liver edge, 293 
topography, 289-290 
vertical liver span, assessing, 293-295 
liver disease 
ascites, 65 

liver edge. See palpation 
liver span, normal, 291t
load and shift anterior test 
for shoulder instability, 585f 
load and shift posterior test 
for shoulder instability, 585f 
logistic analysis, 14 
Los Angeles Prehospital Stroke Scale 
(LAPSS), 628, 630-631, 633
low back pain 

anatomic/physiologic origins, 75 
causes, 75 

evidence from guidelines, 84 
findings of, 83 
history, 81 
literature review, 84 
literature search, 83 
neurologic compromise, 77-80 
cauda equina syndrome, 80 
imaging tests, indications for, 80 
lumbar disk herniations, 77-78 
motor, reflex, and sensory 

dysfunction, assessment of, 78- 
80 

spinal stenosis, 80 
original publication data, 
improvements in, 83 
physical examination, 81 
prevalence of diseases, 75-76 
prior probability, 86 
reference standard, changes in, 84, 86 
social or psychological distress, 80-81 
systemic disease 
ankylosing spondylitis, 77 
cancer, 76 

compression fractures, 77 
spinal infections, 77 
spine range-of-motion measures, 
77 

lower-extremity dermatomes, 79f
lower respiratory tract illness (LRI), 

539, 540. See also pneumonia 
LR. See likelihood ratio test 
LRI. See lower respiratory tract illness 
lumbar disk herniation, 77-78 
crossed straight-leg raise test, 
accuracy, 84t 

ipsilateral straight-leg raise test, 
accuracy, 84t 

physical examination, accuracy 
sciatica, patients with, 79f 
lumbar spine 
low back pain, 75 
lung scanning, 562 
lymphadenopathy 

sensitivity, specificity, or likelihood 
ratio 

in streptococcal pharyngitis, 
anterior cervical, 618f 
lymphocyte count, albumin, percentage 
weight loss (LAW) 
discriminant model, for adult 
malnutrition, 381 
lysosomal enzyme, 616 


major depression, 248 
likelihood ratio, 262 
malaise 

sensitivity, specificity, or likelihood 
ratio 

influenza, 356f 
male. See sex 
malignancy, 249 
ascites, 65 

malignant melanoma 
ABCD(E) criteria 
likelihood ratio, 393f 
multivariate findings, 392f 
univariate findings, 392f 
detection and prognosis, 383 
epidemiology, 383 
evidence from guidelines, 392-393 
findings of, 392 


literature review, 392 
literature search, 391 
methods 

search strategy and quality filter, 
385 

original publication data, 

improvements in, 392 
prior probability, 393 
reference standard, changes in, 392, 
393 

skin examination 
accuracy of 

ABCD(E) checklist, 385f, 385-387 
for detecting presence or absence, 
387f, 387-388 

revised 7-point checklist, 386, 
386f 

checklists as diagnostic aid, 384 
criterion standard for diagnosis, 
385 

historical feature assessment, 384 
physical examination technique, 
384 

precision of, 385 
signs and symptoms, 383-384 
skin type risk factors, 393f 
malignant neoplasm, 76. See also cancer 
malnutrition, 371. See also nutritional 
status assessment 
evidence from guidelines, 381 
findings of, 379 
likelihood ratio, 382 
of findings combinations, 380t 
of low albumin, 380t 
literature review, 380-381 
literature search, 379 
multivariate findings, 380f 
original publication data, 

improvements in, 379-380 
prior probability, 381 
reference standard, changes in, 380, 
381 

subjective global assessment, 379-380 
Malnutrition Screening Tool, 381f
Mammacare Method, 94, 95
mammography, 89-91 
MANTRELS mnemonic, 61, 62f. See

also Alvarado clinical decision 
rule 

marasmus, 372 
Marfan syndrome, 663 
in thoracic aortic dissection, 664t 
marginal cross-products, 3 
Massachusetts Women’s Health Study, 
408 

MAST. See Michigan Alcoholism 
Screening Test 
match test, 151-152 


maxillary sinus, 594, 595, 595f, 597
surface palpation for, 596f
transillumination of, 596f

McIsaac clinical prediction rule
for sore throat, 620, 620f

MCL. See medial collateral ligament 
McMurray test, 360f, 361, 369
sensitivity, specificity, or likelihood 
ratio 

knee ligament and meniscus injury, 
359f, 370f

MDM. See minor determinant mixture 
medial collateral ligament (MCL), 358, 
359, 361 

medial-lateral grind test, 361 
Medical Research Council Thrombosis 
Prevention Trial, 26 
medication adherence, assessing,
179-182, 182f
findings of, 179 
guidelines, changes in, 181 
literature review, results of, 180-181 
literature search, 179 
Morisky questions, 182t 
original publication data, 

improvements in, 180 
pill count, 176f 
prior probability, 182 
reference standard, changes in, 180, 
182 

Medication Adherence Self-Report 
Inventory, 180 

Medication Event Monitoring System 
(MEMS) caps, 180 
medication response 
sensitivity, specificity, or likelihood 
ratio 

to anticholinesterase, for 
myasthenia gravis, 460f 
to decongestants, for sinusitis, 603f 
to levodopa, for Parkinsonism, 510f
to penicillin skin test, for penicillin 
allergy, 525f 
melanocyte, 383-384 
MEMS. See Medication Event 

Monitoring System caps 
Meniere disease, 316, 711 
meningitis, 404f 
clinical examination, 395-396 
clinical history 
accuracy of, 398 
sensitivity of, 398f 
evidence from guidelines, 405 
findings of, 403 
likelihood ratios, 404f, 406 
literature review, 404 
prospective study, 404-405 
retrospective study, 404 
literature search, 403
methods 
data analysis, 398 
literature search and selection, 
396-397 

study characteristics, 397-398 
original publication data, 

improvements in, 403-404 
pathophysiology of, 396 
physical examination 
accuracy of, 398, 399-400 
sensitivity of, 399t
prior probability, 406 
reference standard, changes in, 404, 
406 

sensitivity of findings, 404t 
signs and symptoms, 396 
precision of, 398 
menopause. See perimenopause 
meta-analysis 

of clinical examination, 12-13 
methacholine, 150 
MI. See myocardial infarction 
Michigan Alcoholism Screening Test 
(MAST), 41, 42, 42f, 44. See
also Brief Michigan 
Alcoholism Screening Test; 
Short Michigan Alcoholism 
Screening Test 
accuracy of, 42 
questionnaire, 49 
reliability of, 42 

sensitivity, specificity, or likelihood 
ratio 

in problem alcohol drinking, 

41 

micrographia 

sensitivity, specificity, or likelihood 
ratio 

in Parkinsonism, 509f 
Middleton hooking maneuver 
sensitivity, specificity, or likelihood 
ratio 

in splenomegaly, 612f 
Mini MagLite, 597 

Mini Nutritional Assessment (MNA), 
380-381 

Minnesota Multiphasic Personality 
Inventory, 80 
minor depression, 248 
minor determinant mixture (MDM), 
520 

mitral regurgitation (MR), 438-439, 445
clinical examination 
accuracy of, 439f 
and mitral valve prolapse, 446 
physical examination 
accuracy of, 445f 


mitral stenosis 

and pulmonic regurgitation, 423, 

425 

mitral valve prolapse (MVP), 439-440, 
445 

clinical examination 
accuracy of, 440t 
and mitral regurgitation, 446 
MNA. See Mini Nutritional Assessment 
moderate drinking, 48 
Modigliani syndrome, 280 
monobactams, 515, 518 
monofilament testing. See sensory 
change 
mood, 249 

Moraxella catarrhalis, 494 
Morisky measure, 180 
sensitivity, specificity, or likelihood 
ratio 

for medication adherence, 182f 
morning sickness, 552, 554, 555 
for pregnancy, 560 
Movement Disorder Society, 507 
MR. See mitral regurgitation 
multiple nevi, 384 
multivariate analysis, 14 
murmur 
intensity, 420 

sensitivity, specificity, or likelihood 
ratio 

in aortic regurgitation 
clinical impression, 429f 
intensity, 430f 
significant vs insignificant 
systolic murmurs, 446f 
in thoracic aortic dissection, 
diastolic, 665f, 666f, 673f 
typical murmur, 430t
in aortic stenosis 
carotid pulse, 446t 
intensity, 446f 
location of murmur, 446f 
radiation to carotids, 438f 
S2 (second heart sound), 446f 
systolic murmur, 446t
timing, 438t

in hypertrophic cardiomyopathy 
change with maneuvers, 439 
in mitral regurgitation 

during myocardial infarction, 
439f 

intensity, 447f 
location, 439t
timing, 439f 

in tricuspid regurgitation 
change with abdominal pressure, 
439 

change with inspiration, 439 


Murphy sign, 138, 145, 146 
sensitivity, specificity, or likelihood 
ratio 

in acute cholecystitis, 147t
muscle wasting, subjective global 
assessment (SGA), 374 
MVP. See mitral valve prolapse 
myalgia 

sensitivity, specificity, or likelihood 
ratio 

in influenza, 348t
in pneumonia, adult, 536t
myasthenia gravis, 449-456, 459
acetylcholine receptor antibody-positive
myasthenia gravis,
449, 450f

anticholinesterase tests, 451-453 
likelihood ratio of, 460t
office tests, accuracy of, 454-455 
prior probability, 460 
reference standard tests, 460 
search strategy and quality review, 
453 

statistical methods, 454 
symptoms and signs of 
accuracy of, 454 
anatomical and physiological 
origins of, 450-451 
elicitation of, 451 
Mycoplasma pneumoniae, 344 
myeloproliferative syndrome, 612 
Myerson sign, 508 

myocardial infarction (MI), 195,
461-469. See also cardiac ischemic
chest pain; chest pain 
accuracy of, 466 
electrocardiogram, 468 
medical history, 467-468 
physical examination, 467-468 
acute cardiac ischemia, multivariate 
findings for, 473-474 
acute chest pain, diagnosis of,
462-463, 463f

cardiac and noncardiac conditions, 
463-464 

clinical findings of, 468-469 
clinical prediction rules, 468-469 
diagnostic criteria, 476 
findings of, 471 

guidelines, evidence from, 474-475 
likelihood ratio of, 476 
literature review, results of, 472-473 
literature search, 471 
mechanism of, 463 
methods 
analysis, 465 

precision and accuracy, test criteria 
for, 464 
quality assessment, 464-465 
search strategy, 464 
selection of articles, 464 
multivariate findings for, 476f 
original publication data, 

improvements in, 472 
precision of 

electrocardiogram, 465-466 
medical history, 465 
physical examination, 465 
pretest probability, 469 
prior probability, 476 
reference standard, changes in, 472, 
476 

symptoms and signs, 463 
univariate findings for, 476f 
myoclonus, 217, 221, 222
sensitivity, specificity, or likelihood 
ratio 

in coma, 226t


nafcillin, 519 
nailfold angles, 164, 166
for clubbing, 165f
nasal congestion 
differential diagnosis of, 594f 
sensitivity, specificity, or likelihood 
ratio 

in influenza, 356f 
in otitis media, child, 497f 
in pneumonia, adult, 536f 
in sinusitis, purulent, 603t 
nasal flaring, in pneumonia, infant and 
child, 544f 

nasal turbinates, 594, 595f

NASCET. See North American 
Symptomatic Carotid 
Endarterectomy Trial 
National Ambulatory Medical Care 
Survey, 493, 615 

National Center for Health Statistics, 53 
National Health Interview Survey data, 
99 

National Heart, Lung, and Blood 
Institute, 161 

National Institute of Neurological 
Disorders and Stroke, 634 
National Institute on Alcohol Abuse and 
Alcoholism, 48 

National Institutes of Health Stroke 

Scale (NIHSS), 630, 630f, 633, 
637, 637t

reliability of, 634, 635f 
National Osteoporosis Foundation, 478 


National Program of Cancer Registries, 
267 

National Society of Genetic Counselors, 
267 

nausea. See nausea and vomiting 
nausea and vomiting 
sensitivity, specificity, or likelihood 
ratio 

in acute cholecystitis, 140f 
in appendicitis, adult, 57t
in meningitis, adult, 406f 
in myocardial infarction, 576f 
NBSS. See Canadian National Breast 
Screening Study 

neck 

carotid artery cause, for bruits, 103- 
104 

stiffness, 396, 399 

negative likelihood ratio (LR-), 19, 57, 
197, 236, 454, 554, 617
in clinical examination, 9, 12
median, 253 
for test of Speed, 589 
for test of Yergason, 589 
negative predictive value, 4, 5 
calculation of, 8f

Neisseria meningitidis, 400
neostigmine bromide, 452 
nephrotic syndrome 
ascites, 65 

nervous tension, perimenopausal, 409 
neurologic compromise, low back pain, 
77-80 

cauda equina syndrome, 80 
imaging tests, indications for, 80 
lumbar disk herniations, 77-78 
motor, reflex, and sensory 

dysfunction, assessment of, 78- 
80 

spinal stenosis, 80 
neurologic deficit 
sensitivity, specificity, or likelihood 
ratio 

in stroke, 64If 

in thoracic aortic dissection, 673f 

nevi 

atypical (dysplastic), 384 
clinical assessment 
for spider nevi, precision of, 3f
multiple nevi, 384 
night sweats, perimenopausal, 409 
NIHSS. See National Institutes of Health 
Stroke Scale 
nitric oxide, 616 
nitroglycerin, 473 
Nixon method, 606-607, 607f, 608
sensitivity, specificity, or likelihood 
ratio 


in splenomegaly, 612f 
nomogram, 7 

noncompliance. See also compliance 
measurement, 174-175 
methods, 175 
nature of, 174 
normovolemic 
phlebotomy study, 316f 
postural vital signs, 318f 
North American Symptomatic Carotid 
Endarterectomy Trial 
(NASCET), 105, 108
Nugent criteria, 694t
sensitivity, specificity, or likelihood 
ratio 

for vaginitis, 707t, 694t
nutrition-associated complications, 

372 

nutritional status assessment, 371-372. 
See also malnutrition 
accuracy of, 375-376 
anatomic and physiologic origin, 372 
components of, 371-372 
precision of, 375 

subjective global assessment, features, 
373f 

dietary intake change, 373 
functional capacity, 374-375 
gastrointestinal symptoms, 374 
weight change, 372, 373 
symptoms and signs, 376-377 
nystagmus 
origin of, 711 


obesity, 20 

observed agreement. See agreement 
obstructive airways disease, 159-162 
findings of, 159-160 
guidelines, evidence from, 161 
likelihood ratio, 162 
literature review, results of, 160-161 
literature search, 159 
original data publication, 

improvements in, 160 
prior probability, 162 
reference standard, changes in, 160, 
162 

obstructive lung disease, 528 
obturator sign 
in appendicitis, 55 
examination for, 55/ 
occult vertebral fracture, 478 
hand grip strength test for, 479, 485
rib-pelvis distance test for, 479, 485
skinfold thickness test for, 479, 485
tooth count, 485-486
wall-occiput distance test for, 479, 485
ocular myasthenia, 450 
oculocephalic reflex. See reflexes 
odds ratio (OR), 9, 35, 228, 237
diagnostic, 12 
odor 

sensitivity, specificity, or likelihood 
ratio 

in vaginitis, 707t
office blood pressure 
factors, affecting, 303-304 
vs usual blood pressure, 305 
OME. See otitis media with effusion 
opening snap 

sensitivity, specificity, or likelihood 
ratio 

murmur, diastolic, 421 
OR. See odds ratio 

ORAI. See osteoporosis risk assessment
instrument 

orbicularis oculi weakness, 451, 452f
orthopnea, 153. See also dyspnea 
orthostatic hypotension, 320t
Osler sign, 304-305
OST. See osteoporosis self-assessment 
screening tool 
osteomyelitis, 594 
osteoporosis, 477-486, 489-491
arm span-height difference, 479, 
482-483 

definition of, 478 

diagnostic accuracy, 480, 483t, 484t
elicitation of, 478-480 
findings of, 489 

hand grip strength test for, 479, 

485 

height loss, 482 
likelihood ratio of, 491 
literature review, results of, 490 
multivariate findings for, 490f 
univariate findings for, 490f 
literature search, 489 
methods 
data analysis, 480 
quality assessment, of articles, 480 
original publication data, 

improvements in, 490 
pathophysiology of, 478 
precision of, 480, 482t
prevalence of, 478, 491t
prior probability, 491 
reference standard, changes in, 490, 
491 

rib-pelvis distance test for, 479, 485
skinfold thickness test for, 479, 485
study characteristics, 480, 481t, 482f
tooth count, 485-486 


wall-occiput distance test for, 479, 485
weight, 483-485 

osteoporosis risk assessment instrument 
(ORAI), 490, 491f

osteoporosis self-assessment screening 
tool (OST), 489, 490, 491f
otitis media 

diagnostic criteria, in children, 494 
otitis media with effusion (OME), 495 
and acute otitis media, distinguishing 
between, 493 

otolaryngologist, 496, 596, 597 
otosclerosis, 711 
ovarian cancer, 265, 266, 267
ovarian carcinoma, 72 
Ovid MEDLINE, 47 
Oxfordshire Classification of Subtypes 
of Cerebral Infarction 
stroke, symptoms, 634 
oxygen free radicals, 616 
oxymetazoline hydrochloride, 595 


pachydermoperiostosis, 163 
PACS. See partial anterior circulation 
infarction syndrome 

pain 

sensitivity, specificity, or likelihood 
ratio 

in acute cholecystitis 
guarding, 140t
rebound, 140t
rectal, 140t

right upper quadrant, 147t
rigidity, 140t
in appendicitis, adult 
guarding, 57t
migration, 57t
pain before vomiting, 57t
rebound tenderness, 57t
rectal tenderness, 57t
right lower quadrant, 57t
rigidity, 57t
in back pain 
duration, 76t
positional, 76t

in cancer-induced back pain,
nocturnal, 86t
in coma 

motor response, 225t
withdrawal response, 225t
in influenza, 348t-349t
nasal congestion, 348t
pharyngitis, 349t

in knee injury, joint line tenderness,
370t


in meningitis, adult, headache, 406t
in myocardial infarction
chest wall, 467t
pleurisy, 467t
positional, 467t
radiation to the arms, 576t
in otitis media, child, ear, 503t
in temporal arteritis 
headache, 654t
jaw, 656t
scalp, 654t

in thoracic aortic dissection, 673t
migratory, 673t
sudden onset, 673t
“tearing” or “ripping,” 673t
in urinary tract infection, women, 
679t

back pain, 680t
flank pain, 679t
lower abdominal pain, 679t
pain provocation test of Mimori
for labral tears, 582f, 586t
palpable expansile tumor, 18 
palpation, 291, 293, 294, 295
for airflow limitation, 151, 153, 154
of clubbing, 165-166 
sensitivity, specificity, or likelihood 
ratio 

of abdomen, for abdominal aortic 
aneurysm, 27t

of liver, for hepatomegaly, 299t
of spleen, for splenomegaly, 612t,
613t

of temporal artery, for temporal 
arteritis, 649t

of thyroid, for goiter, 286t, 287t
of spleen, 607-608, 609, 609t
paradoxic ptosis. See curtain sign 
parainfluenza, 344 
paranasal sinuses 
coronal view of, 595f
sagittal view of, 594f
transillumination of, 596-597 
Waters view of, 594 
parental suspicion of otitis media, 

503t

paresthesia 

sensitivity, specificity, or likelihood 
ratio 

in carpal tunnel syndrome, 115t
Parkinson disease (PD), 505-511, 513- 
514 

accuracy of, 510 
findings of, 513 
guidelines, evidence from, 513 
likelihood ratio of, 514 
literature review, results of, 513 
literature search, 513 
methods, 509 
original publication data, 

improvements in, 513 
and parkinsonism, distinguishing 
between, 506 

pathophysiologic characteristics of, 
506 

precision of, 510 
prior probability, 514 
quality of evidence, 508t, 509-510
reference standard, changes in, 513 
signs of, 506-507, 510t 
elicitation of, 507-508 
symptoms of, 506-507, 509t
parkinsonian facies, 509 
paroxysmal nocturnal dyspnea. See 
dyspnea 

parsimonious clinical examination, 13- 
14 

parsimony, 13 

partial anterior circulation infarction 
syndrome (PACS), 634 
Pastia sign, 617 
patella, 359-360 
patella reflex, 79 
pathophysiology 
of appendicitis, 54 

of community-acquired pneumonia, 
528 

patient, 1-2 
alcoholic, 1 
ascites, 1 

patient-generated subjective global 
assessment (PG-SGA), 380 
Patient Health Questionnaire (PHQ-9), 
259, 260, 261, 263. See also
clinical prediction rules and 
scores 

patient’s medical history, information 
in, 11 

PCL. See posterior cruciate ligament 

PD. See Parkinson disease 
PDR. See phalangeal depth ratio 

PE. See pulmonary embolism 
peak expiratory flow, 155 
pediatric pneumonia 

likelihood ratio for, 550 
multivariate findings for, 548 
univariate findings for, 548, 549f 
Pedigree Standardization Task Force, 
267 

peek sign 

sensitivity, specificity, or likelihood 
ratio 

myasthenia gravis, 460t
pelvic appendicitis, 54 
penicillin, 174 


penicillin allergy, 515-521, 523-525 
β-lactam antibiotics, cross-reactivity
with, 518 

clinical history, 518-519 
accuracy of, 519 
findings of, 523 
guidelines, evidence from, 524 
how to take a history for, 518 
hypersensitivity reactions, 

classification of, 516-518, 517t
immediate reactions, 516-517 
late reactions, 517-518 
likelihood ratio of, 525 
literature review, results of 
univariate findings for, 524 
literature search, 523 
methods, 516 
original publication data, 

improvements in, 524 
prior probability, 525 
reference standard, changes in, 524, 
525 

sensitivity, specificity, or likelihood 
ratio 

history of, 525t 
skin testing, 519-520 
limitations of, 520-521 
penicillin G, 520 

penicillin skin test, 519-520, 523, 524, 
525t

limitations of, 520-521 
percussion, 290, 291, 294, 295
advantages of, 295 
for airflow limitation, 151, 153, 

154 

measurement of, 294-295 
sensitivity, specificity, or likelihood 
ratio 
in ascites 

flank dullness, 69t
fluid wave, 69t, 73t
shifting dullness, 69t, 73t
in chronic obstructive airways 
disease 

cardiac dullness, 154t
chest hyperresonance, 154f 
for splenomegaly, of spleen, 612t,
613t

in pneumonia, adult 
dullness of the lungs, 531t
percussion methods, for splenomegaly 
Castell method, 607, 607f, 608
Nixon method, 606-607, 607f, 608
Traube space, 607, 607f
percussive span technique, 295 
perimenopause, 407, 416f 
definition, 408 

estimate pretest probability of, 408 


evaluation 

family and medical history 
age of mother’s menopause, 409 
cigarette use, 409 
hysterectomy status, 410 
laboratory tests 
estradiol, 410 

follicle-stimulating hormone, 

410 

inhibins, 410 
physical signs 
maturation index, 410 
skin thickness, 410 
vaginal pH, 410 
self-assessment, 408-413 
symptoms, 409 
depressed mood, 409 
hot flashes, 409 

nervous tension and irritability, 
409 

night sweats, 409 
urinary incontinence, 409 
vaginal dryness, 409 
variable sexual interest, 409 
evidence from guidelines, 416 
findings of, 415 
likelihood ratio, 416t, 417f
literature review, 415-416 
literature search, 415 
methods 

search strategy and quality review, 
410-411 

statistical methods, 412 
original publication data, 

improvements in, 415 
physiology, 408 
prior probability, 417 
reference standard, changes in, 415 
peripheral edema, 72 
peripheral hemodynamic signs, 421-422 
peritoneal fluid, 71, 72 
periumbilical bruits, 30 
petechiae, 617 

PG-SGA. See patient-generated 

subjective global assessment 
phalangeal depth ratio (PDR), 164-165, 
166 

Phalen sign, 112t, 114
likelihood ratio, 124t
sensitivity, specificity, or likelihood 
ratio 

in carpal tunnel syndrome, 124f 
pharyngeal exudate, in streptococcal
pharyngitis, 618t
pharyngeal weakness, 451
pharyngitis, 615, 623. See also pain
differential diagnosis of, 616t
streptococcal, 625 
PHQ. See PRIME-MD Patient Health
Questionnaire 
PHQ-9. See Patient Health 
Questionnaire 

physiologic origin, of abdominal bruit, 
29 

physiologic tremor, 506, 514 
pigmented skin lesion. See ABCD(E) 
criteria 

pill count, in medication adherence, 
176f 

pill-rolling tremor, 506 
PIOPED. See Prospective Investigation 
of Pulmonary Embolism 
Diagnosis study 

PISA-PED. See Prospective Investigative 
Study of Acute Pulmonary 
Embolism Diagnosis 
plain abdominal radiographs 
for appendicitis, 54 
plain film radiographs 
for sinusitis 
likelihood ratio, 603 
plaster casts, of fingers, 171 
pleuritic chest pain, 215, 225 
pneumatic otoscopy, 493, 494, 495, 496,
499 

pneumococcal pneumonia, 527 
pneumonia. See also acute respiratory 
illness; lower respiratory tract 
illness 

community-acquired, 527-533 
pneumonia, in infant and child, 539- 
545, 547-550. See also 
tachypnea 

anatomy and pathophysiology of, 
540-541 
bacterial, 540 
findings of, 547 
guidelines, evidence from, 549 
literature search, 547 
methods, 539-540 
original publication data, 

improvements in, 548 
pediatric pneumonia 
likelihood ratio test for, 550 
multivariate findings for, 548 
univariate findings for, 548, 

549f 

prior probability, 550 
reference standard, changes in, 540, 
548, 550 

symptoms and signs 
accuracy of, 543-545 
elicitation of, 541-542 
precision of, 542-543 
pneumonia score, 548 
Pneumonia Severity Index, 536 


point-of-care testing, 623-624, 625f 
polyps, 594, 595 

popliteal-brachial gradient, in aortic 
regurgitation, 425f 

positive likelihood ratio (LR+), 19, 56, 
58, 197, 236, 454, 513, 554,
617

in clinical examination, 9, 12
median, 253 
for test of Speed, 589 
for test of Yergason, 589 
positive predictive value, 4 
calculation of, 8f
posterior circulation infarction 
syndrome (POCS), 634 
posterior cruciate ligament (PCL), 358, 
359, 361, 362
physical examination 
accuracy of, 364f 
posterior drawer test, 361 
posterior probability, 9 
postmyocardial infarction, 210 
postphlebitic syndrome, 235 
posttest probability, 4, 7 
calculation of, 8f

postural tachycardia. See tachycardia 
postural vital signs, 316 
PR. See pulmonic regurgitation 
precision. See also κ
calculation of, 8f
of clinical examination, 1, 3-4, 9 
“good” symptom or sign, 11-12 
for left-sided heart failure, 

189 

likelihood ratio, 9-11 
meta-analysis, 12-13 
pretest probability, 11 
“sensitivity-only” studies, 13 
pregnancy, 551-557, 559-560 
guidelines, evidence from, 559 
home pregnancy tests, accuracy of,
555-556

likelihood ratio test for, 560 
literature review, results of, 559 
literature search, 559 
methods 

search strategy, 553-554 
original publication data, 

improvements in, 559 
patient history, accuracy of, 554-555 
physical examination, accuracy of,
556-557

prior probability, 560 
reference standard, changes in, 559, 
560 

signs and symptoms, 554-555 
elicitation of 
medical history, 552 


physical examination, 552-553 
reference standard for, 553 
during first trimester 
anatomic and physiologic 
origins, 552 
uterine height, 553f
pregnant women 
problems in 
at-risk drinking, 44-45 
preoperative carotid bruit, 105 
pressure provocation test, 112f 
pretest probability, 5, 7, 11 
calculation of, 8f
prevalence, calculation of, 8f

Primary Care Evaluation of Mental
Disorders (PRIME-MD), 250,
251, 252, 259, 260, 261, 263
PRIME-MD Patient Health
Questionnaire (PHQ), 250,
251, 252

PRIME-MD. See Primary Care 
Evaluation of Mental 
Disorders; clinical prediction 
rules and scores 
problem drinking, 47 
Prospective Investigation of Pulmonary 
Embolism Diagnosis 
(PIOPED) study, 562, 563, 564 
Prospective Investigative Study of Acute 
Pulmonary Embolism 
Diagnosis (PISA-PED) study, 
563, 564, 565, 566, 572 
prostaglandins, 616 
prostate cancer, 265 
protein-energy malnutrition, 372 
provocation test 
for labral tears, 579, 580t 
for shoulder instability, 579, 580t
pseudohypertension, 304 
Psoas sign, of appendicitis, 55 
sensitivity, specificity, or likelihood 
ratio 

for appendicitis, 5 It 
psychogenic dizziness, 711 
ptosis, 451, 455 
puddle sign, 66 

sensitivity, specificity, or likelihood 
ratio 

for ascites, 69t

pulmonary crackles. See rales 
pulmonary embolism (PE), 227, 235,
561-569, 571-575

clinical examination, precision of, 566 
clinical gestalt, 563-564 
negative result for, 563 
positive result for, 563 
clinical prediction rules, 564-565 
accuracy of, 565f, 568f 
components of, 566 
validation of, 565-566 
findings of, 571 
guidelines, evidence from, 574 
likelihood ratio test for, 572f, 575 
literature review, results of, 572-574 
literature search, 571 
methods 
data analysis, 563 
data sources, 562 

study selection and data extraction, 
562-563 

original publication data, 

improvements in, 571-572 
pretest probability, accuracy of, 564f 
prior probability, 575 
reference standard, changes in 
computed tomography (CT) 
angiography, 572 
D-dimer assay, 572 
enzyme-linked immunosorbent 
assay, 572 

pulmonary fibrosis, 528 
pulmonic regurgitation (PR), 421 
and mitral stenosis, 423, 425
pulse deficit (arms), in thoracic aortic 
dissection, 673f 
pulse pressure, 421 
in aortic regurgitation, 425f 
pulsus paradoxus 

sensitivity, specificity, or likelihood 
ratio 

in chronic obstructive airways 
disease, 154f 

pupillary response. See reflexes 
pyridostigmine bromide, 452 


quadriceps weakness, 79 
sensitivity, specificity, or likelihood 
ratio 

in sciatica, 79t

QUADAS. See Quality Assessment of
Diagnostic Accuracy Studies 
checklist 

quality, 15-16 

Quality Assessment of Diagnostic 

Accuracy Studies (QUADAS) 
checklist, 583 

Quebec Task Force on Spinal Disorders, 
80 

questionnaire 
for depression 

Beck Depression Inventory (BDI), 
250 


Center for Epidemiologic Studies 
Depression (CES-D), 250 
Duke Anxiety and Depression Scale 
(DADS), 250 

Geriatric Depression Scale (GDS), 
250 

Patient Health Questionnaire 
(PHQ-9), 263f 

Primary Care Evaluation of Mental 
Disorders (PRIME-MD), 250 
PRIME-MD, 263f 
PRIME-MD Patient Health 
Questionnaire (PHQ), 250 
Zung Self-Rating Depression Scale 
(SDS), 250 

for malnutrition, adult, 380t 
for medication adherence 
Morisky questions, 182f 
for problem alcohol drinking 
Alcohol Use Disorders 

Identification Test (AUDIT), 
51t

Alcohol Use Disorders 
Identification Test, 
Consumption Questions 
(AUDIT-C), 51t 
CAGE questions, 52t
T-ACE questions, 52f
TWEAK questions, 52t
quiver eye movements, in myasthenia 
gravis, 454f 


radiographic cardiomegaly, 187, 189,
190 

radiographic findings 
sensitivity, specificity, or likelihood 
ratio 

in the breathless emergency patient, 
chest radiograph, 213f 
in left ventricular dysfunction, 
chest radiograph, 213t
for sinusitis, sinus films, 603t
radiographic redistribution, 186, 187,
190 

radiographic techniques 
for appendicitis, 54 
radioisotopic scintiscan 
for splenomegaly, 606 
radionuclide scanning 
for acute cholecystitis, 138 
rales 

sensitivity, specificity, or likelihood 
ratio 

in the breathless emergency patient, 
213f 


in myocardial infarctions, 467f 
in pneumonia, adult, 536f 
in pneumonia, infant and child, 
550f 

Ramsay Hunt syndrome, 712 
random-effects measure, 13, 19
random-effects model, 316 
randomized controlled trials, 88 
range of motion, 358 
rapid influenza test, 355 
likelihood ratio for, 356f 
reactive airway disease, 452 
readers’ guides 
for diagnostic test, 3t 
rebound tenderness, 55. See also pain; 
guarding 

receiver operating characteristic (ROC) 
curve, 617 

recent weight gain. See weight gain 
rectal examination, for appendicitis, 55 
recurrent vestibulopathy, 711 
reduced muscle power, 451, 455
reflexes 

sensitivity, specificity, or likelihood 
ratio 

Achilles tendon, in normal patients, 
84f 

corneal, in coma, 220t
cough, in coma, 220f
eye movements, in coma, 220t
gag, in coma, 220f
glabella tap, in Parkinsonism, 514f
oculocephalic, in coma, 220f
pupillary, in coma, 220t
relocation test 
for labral tears, 586f 
for shoulder instability or labral tear, 
581f, 583, 585f, 589, 590, 591f
renal artery stenosis, 35 
abdominal bruits in, 30 
likelihood ratio, 37 
multivariate findings for, 36 
prior probability, 37 
reference standard tests, 37 
univariate findings for, 36t 
renovascular hypertension 
abdominal auscultation in, accuracy 
of, 31 

evaluation of, abdominal bruits in, 
29-32 

prognosis of, 32 
reserpine, 249 
respiratory distress, 547 
respiratory illness. See community- 
acquired pneumonia; 
pneumonia 

respiratory rate, in children, 541 
respiratory syncytial virus (RSV), 549 
rest test, 451, 455
rest tremor, 506, 507 
retractions, chest 

sensitivity, specificity, or likelihood 
ratio 

in pneumonia, infant and child, 
550t

reverse workup bias, 520 
rheumatoid arthritis, 164 
rhinitis, 593. See also nasal congestion 
allergic, 594 
viral, 594 
rhinorrhea 

differential diagnosis of, 594t
rhinosinusitis, 603. See also sinusitis 
rhinoviruses, 344 
rhonchi, 155 

sensitivity, specificity, or likelihood 
ratio 

in chronic obstructive airways 
disease, 154f 
rib-pelvis distance 

for occult lumbar vertebral fractures, 
test, 479, 485
in osteoporosis, 483f 
rigidity, 55, 506, 507 
sensitivity, specificity, or likelihood 
ratio 

cog wheeling 
in Parkinson disease, 506,
514t

ROC curve. See receiver operating 
characteristic curve 
Rovsing sign, of appendicitis, 55 
RSV. See respiratory syncytial virus 
ruptured abdominal aortic aneurysm 
abdominal palpation for, 19 


S2 (second heart sound). See heart
sounds

S3 (third heart sound). See heart sounds

S4 (fourth heart sound). See heart
sounds

SBP. See systolic blood pressure 
Scandinavian Neurological Stroke Scale, 
634 

scarlet fever, 617 
scattergram, 250 
Schamroth sign, 165 
for clubbing, 165f

Schober test, 77 
sciatica, 77-78 

physical examination accuracy 
for lumbar disk herniation among 
patients, 79t


SCID. See Structured Clinical Interview 
for DSM-III-R; Structured 
Clinical Interview for DSM-IV-TR
scintigraphy 

for liver examination, 292, 295 
SCORE. See simple calculated 

osteoporosis risk estimate 
Scottish International Guidelines 
Network, 134 
scratch test 

sensitivity, specificity, or likelihood 
ratio 

for hepatomegaly, 300f 
SDDS-PC. See Symptom Driven 

Diagnostic System for Primary 
Care 

SDS. See Zung Self-Rating Depression 
Scale 

seizure, 217, 221-222
sensitivity, specificity, or likelihood 
ratio 

in coma, 226f 

self-administered medication therapy, 
173 

self-diagnosis 

sensitivity, specificity, or likelihood 
ratio 

otitis media, parental suspicion, 503f 
pregnancy, suspicion of, 560 
urinary tract infection, women, 

680t

vaginal candidiasis, 696f
sensitivity, calculation of, 8f
“sensitivity-only” studies, 13 
sensory change 

sensitivity, specificity, or likelihood 
ratio 

in carpal tunnel syndrome, 115t 
in sciatica, 79t 
sex 

sensitivity, specificity, or likelihood 
ratio 

in ventricular dysfunction, 211t
sexually transmitted diseases, 676 
SGA. See subjective global assessment 
shadowgrams, 171 
shadowgraph method, 166 
shifting dullness, 66, 67. See also
percussion 

Short Michigan Alcoholism Screening 
Test (SMAST), 41 

shoulder instability, 577-587, 589-591 
anatomy of, 578-579, 578f
clinical tests for, 579-581, 580f, 581f
findings of, 589 
guidelines, evidence from, 590 
labral tears, 579 


clinical tests for, 579-581, 580f, 582f
limitation of, 583 
physical examination, diagnostic 
accuracy of, 583, 586f 
likelihood ratio for, 591 
limitation of, 583
literature search, 589 
original publication data, 

improvements in, 590 
physical examination 
diagnostic accuracy of, 583, 585t 
tests for, 580t
precision of 
laxity maneuvers, 590t 
provocation maneuvers, 590f 
prior probability, 591 
reference standard, changes in, 590, 
591 

signed rank test, 455 
silicone models, 92, 94, 95 
simple calculated osteoporosis risk 
estimate (SCORE) 
questionnaire, 490, 491
Simplified Medication Adherence 
Questionnaire, 180 
Simplified Wells scoring system, 571, 
572, 574, 575 

for pulmonary embolus, 575f 
SimpliRed assay, 230 
Single Question (SQ), for depression,
250, 251

single-fiber electromyography 
for myasthenia gravis, 450 
single-leg sit-to-stand test, 84 
sinusitis, 593-598, 601-603. See also 
rhinosinusitis 

anatomy and pathophysiology of, 594 
differential diagnosis of, 594t
findings of, 601 
guidelines, evidence from, 602 
likelihood ratio test for, 603 
literature search, 601 
original publication data, 

improvements in, 602 
paranasal sinuses 
coronal view of, 595f
sagittal view of, 594f
transillumination of, 596-597 
Waters view of, 594 
prior probability, 603 
reference standard for, 594 
changes in, 602, 603 
symptoms and signs 
accuracy of, 596-598 
elicitation of, 594-595 
precision of, 595-596 
univariate findings for 
literature review results, 602 
sit-to-stand test, 84
sensitivity, specificity, or likelihood 
ratio 
back pain 

upper lumbar herniation, 86f 
disk herniation, 86f
skin examination, for malignant 
melanoma 
accuracy of 

ABCD(E) checklist, 385f, 385-387 
for detecting presence or absence, 
387t, 387-388

revised 7-point checklist, 386, 386f 
checklists as diagnostic aid, 384 
criterion standard for diagnosis, 385 
historical feature assessment, 384 
physical examination technique, 384 
precision of, 385 
signs and symptoms, 383-384 
skin turgor, 319, 331 
sensitivity, specificity, or likelihood 
ratio 

in hypovolemia, child, 334f 
skinfold thickness test 
for occult vertebral fracture, 479, 

485 

SLAP lesion. See superior labrum 
anterior posterior lesion 
sleep test, 451, 455
sensitivity, specificity, or likelihood 
ratio 

for myasthenia gravis, 460f 
SLR. See straight-leg raising sign 
SMAST. See Short Michigan Alcoholism 
Screening Test 
smoking. See tobacco use 
sneezing 

sensitivity, specificity, or likelihood 
ratio 

in influenza, 356f 
spasticity, 506 

sensitivity, specificity, or likelihood 
ratio 

in Parkinsonism, 506 
specificity, calculation of, 8f
spectrum bias, 141, 497
speech 

sensitivity, specificity, or likelihood 
ratio 

in myasthenia gravis, unintelligible, 
460t

in Parkinsonism, soft voice, 510f
in stroke, abnormal, 641t
sphenoid sinuses, 594f, 595
sphygmomanometers, 306 
spider nevi 
clinical examination 
precision of, 3f


Spiegel criteria, 694f 
sensitivity, specificity, or likelihood 
ratio 

for vaginitis, 707f, 694f 
spinal compression fractures, 77 
spinal infections, 77 
spinal stenosis, 80 

spine range-of-motion measures, 77 
spiral computed tomography (CT) 
scanning 

for pulmonary embolism, 564 
spirometry, 149, 150, 154, 160
spleen. See also splenomegaly 
palpation of, 607-608, 609, 609f
size of, 605-606, 606f
splenomegaly, 605-610, 611-613. See 
also spleen 

anatomic landmarks, 605 
clinical examination for 
consequences of, 606 
guidelines, 609t
inspection, 606
palpation, 607-608, 609, 609f
percussion, 606-607, 607f, 608
findings of, 611 
guidelines, evidence from, 612 
likelihood ratio test for, 613 
literature review, results for, 612 
literature search, 611 
original publication data, 

improvements in, 612 
prior probability, 613 
reference standard, changes in, 612, 
613 

signs of 

accuracy, 608-609 
precision, 608 
splenic size, 605-606, 606f
sputum production, 151, 152
sensitivity, specificity, or likelihood 
ratio 

in obstructive airways disease, 152f 
SQ. See Single Question for depression 
square wrist sign, 112f 
stadiometer, 478-479 
Staphylococcus aureus endocarditis, 

519 

sterile stethoscope, 32 
straight-leg raising (SLR) sign, 78 
sensitivity, specificity, or likelihood 
ratio 

for disk herniation, 86t 
strawberry tongue, 617 
strength testing 

sensitivity, specificity, or likelihood 
ratio 

in carpal tunnel syndrome 
thumb, 115t 


in Parkinsonism 
difficulty rising from a chair, 514t
in sciatica
ankle, 79t
great toe, 79t
quadriceps, 79f 
strep throat, 615-621, 623-625 
clinical prediction rules for, 618-620,
619t

Centor clinical prediction rule, 619,
619f

McIsaac clinical prediction rule,
620, 620f

Walsh algorithm, 621f
findings of, 623 
guidelines, evidence from, 624 
likelihood of, 625 
literature review, results for, 624 
literature search, 623 
methods 

search strategy and quality review, 
616 

statistical methods, 617 
original publication data, 

improvements in, 624 
pathophysiology of, 616 
pretest probability estimation of, 

618 

prior probability, 625 
reference standard, changes in, 624, 
625 

symptoms and signs, 617 
diagnostic accuracy of, 617-618 
precision of, 617 

Streptococcus pneumoniae, 344, 400, 494,
501 

stroke, 627-638 
classification of, 635 
diagnosis of 
accuracy, 633 
flow, 628f
reliability, 633 

ischemic stroke subtype analysis, 
635-636 

likelihood ratio test for, 641 
methods, 629-630 
prehospital assessment, 630-631 
prior probability, 641 
prognosis of, 636-637, 636t
reference standard tests, 641 
severity, assessment of, 634-635 
symptoms 

Oxfordshire Classification of 
Subtypes of Cerebral 
Infarction, 634 

transient ischemic attack, 627, 
631-633 

vascular distribution of, 633-634 
Structured Clinical Interview for
DSM-III-R (SCID), 40, 

250, 254 

Structured Clinical Interview for 
DSM-IV-TR (SCID), 259 
students’ aneurysm, 18 
subcutaneous tissue loss, subjective 
global assessment (SGA), 

374 

subjective global assessment (SGA), of 
nutritional status 
in adult malnutrition, 373t, 376t
dietary intake change, 373 
functional capacity, 374-375 
loss of fluid from intravascular to 
extravascular space, 374-375 
loss of subcutaneous fat, 374 
muscle wasting, 374 
gastrointestinal symptoms, 374 
and postoperative complications, 
relationship between, 376t
weight change, 372, 373 
sublingual nitroglycerin, 183 
sulcus sign, 585f 
sunken eyes 

sensitivity, specificity, or likelihood 
ratio 

in hypovolemia, child, 334f
superior labrum anterior posterior
(SLAP) lesion, 578, 586t
swallowing, in myasthenia gravis, 454t 
Swan-Ganz catheterization, 203 
sweating 

sensitivity, specificity, or likelihood 
ratio 

in pneumonia, adult, night sweats, 
536t

Swedish Two-County Trial, 89 
Symptom Driven Diagnostic System for 
Primary Care (SDDS-PC), 250, 
251, 252

symptomatic carotid bruit, 105 
systemic disease, low back pain 
ankylosing spondylitis, 77 
cancer, 76 

compression fractures, 77 
spinal infections, 77 
spine range-of-motion measures, 77 
systemic glucocorticoids, 149 
systolic blood pressure (SBP), 301, 302, 
303 

systolic bruits, 35, 36 
systolic click, with mitral valve prolapse, 
440t

systolic dysfunction, 184, 186, 187
diagnosis of, 210-211 
and diastolic dysfunction, 
difference between, 211 


echocardiograms, 210-211 
postmyocardial infarction, 210 
and diastolic dysfunction, difference 
between, 189 

systolic murmurs, abnormal, 420, 436- 
437 

anatomic and physiologic origins of, 
433-434 

aortic stenosis, 437 
causes, 434t
clinical examination
accuracy of, 436, 437f
precision of, 435-436, 436t
evidence from guidelines, 445
examination, 434-435, 440
features, 435f
findings of, 443 

hypertrophic cardiomyopathy, 439 
literature review 
accuracy, 444-445 
precision, 444 
literature search, 443 
mitral regurgitation, 438-439 
mitral valve prolapse, 439-440 
original publication data, 

improvements in, 443 
prior probability, 446 
reference standard, changes in, 444, 
447 

tricuspid regurgitation, 439 
systolic-diastolic abdominal bruits, 30, 
31, 32


TA. See temporal arteritis 
T-ACE questionnaire, 44, 45, 49, 52t
sensitivity, specificity, or likelihood 
ratio 

for alcohol abuse, 50f 
tachycardia 

sensitivity, specificity, or likelihood 
ratio 

in hypovolemia, adult, postural, 327t
in pneumonia, adult, 536f
for ventricular dysfunction, 211t
supine, 320 

tachypnea, 543, 547, 548, 549, 561, 571, 
574. See also pneumonia, in 
infant and child 

World Health Organization criteria 
for, 548, 548f 

definition, age based for children, 548f 
sensitivity, specificity, or likelihood 
ratio 

in pneumonia, infant and child, 
550f 


in pneumonia, adult, 536f 
TACS. See total anterior circulation 
infarction syndrome 
tears absent 

sensitivity, specificity, or likelihood 
ratio 

in hypovolemia, child, 340f 
telephone diagnosis, urinary tract 
infection, women, 687 
temporal arteritis (TA), 643-644 
accuracy, 646 

of laboratory evaluation, 649-650 
of physical examination, 646-649 
of symptoms, 646 
elicit signs and symptoms, 644-645 
evidence from guidelines, 655 
findings of, 653 

likelihood ratio, 648f, 649f, 654f, 656t
literature review, 654 
literature search, 653 
methods 

search strategy and quality review, 
645 

statistical methods, 645-646 
multivariate findings for, 654-655 
original publication data, 

improvements in, 654 
pathophysiology of, 644 
precision, 646 

of medical history and physical 
examination, 464 
prior probability, 656 
reference standard, changes in, 654, 
656 

sensitivity, 648f, 649f 
temporal artery, in temporal arteritis, 
649f 

tenderness of bicipital groove, 586 
tenting, 331 
test of Speed, 586t 
negative likelihood ratio, 589 
positive likelihood ratio, 589 
test of Yergason, 586f 
negative likelihood ratio, 589 
positive likelihood ratio, 589 
test of Zaslav. See internal rotation 
resistance strength test 
thenar atrophy, 112f 
third heart sound, 190 
Third National Health and Nutrition 
Examination Survey, 478 
thoracic aortic dissection, acute, 659- 
660 

chest radiograph 
accuracy of, 666 
sensitivity of, 667t
clinical examination
accuracy of, 662t, 673t, 665t
clinical history 
accuracy of, 663, 665
sensitivity of, 664f 
combinations of findings 
accuracy of, 666-667 
diagnosis, 668t
evidence from guidelines, 672 
findings of, 671 

likelihood ratio, 667f, 673, 673t
literature review, 672 
literature search, 671 
methods 
data analysis, 663 
literature search and selection, 661 
study characteristics, 663 
original publication data, 

improvements in, 671 
pathophysiology of, 660 
physical examination 
accuracy of, 665-666 
sensitivity of, 666f 
prior probability, 673 
reference standards, 673 
sensitivity of, 672f 
sensitivity, specificity, or likelihood 
ratio 

in congestive heart failure, 666t
signs and symptoms of, 660-661 
thumb abduction, testing, 113 
thyroid size 

sensitivity, specificity, or likelihood 
ratio 

for goiter, 282f 

thyroid-stimulating hormone (TSH), 286 
thyrotoxicosis, 277 
TIA. See transient ischemic attack 
tibia, 358, 359, 361 
Tinaquant D-dimer, 573 
Tinel sign, 112f, 114, 116
sensitivity, specificity, or likelihood 
ratio 

in carpal tunnel syndrome, 124t 
tissue necrosis factor, 616 
tobacco use 

sensitivity, specificity, or likelihood 
ratio 

in myocardial infarction, 476f 
in obstructive airways disease, 161t,
162t

in perimenopause, 416f 
tolerance, worry, eye opener, amnesia, 
k/cut down (TWEAK
questionnaire), 49, 52t 
sensitivity, specificity, or likelihood 
ratio 

for alcohol abuse, 50f 


tongue weakness, 451 
tonsillar enlargement, in streptococcal 
pharyngitis, 618f 
tonsils, 617 
toothache 

sensitivity, specificity, or likelihood 
ratio 

in sinusitis, 603f 

tooth count, in osteoporosis, 481f
total anterior circulation infarction 
syndrome (TACS), 634 
tourniquet test, 112t 
TR. See tricuspid regurgitation 
transient ischemic attack (TIA), 627, 
631-633 
diagnosis of 
accuracy, 632 
reliability, 632-633 
transillumination 
sensitivity, specificity, or likelihood 
ratio 

in maxillary sinusitis, 603t
Traube space percussion, 607, 607f
sensitivity, specificity, or likelihood 
ratio 

for splenomegaly, 613f 
tremor, 507, 514 
action, 506, 507 
classic essential, 506, 514 
of Parkinson disease, 506 
physiologic, 506, 514 
rest, 506, 507 

sensitivity, specificity, or likelihood 
ratio 

in Parkinsonism, 514t 
tremor syndromes, 506 
tricuspid regurgitation (TR), 439 
tricuspid valvular dysfunction, 293 
TSH. See thyroid-stimulating hormone 
TWEAK questionnaire. See tolerance, 
worry, eye opener, amnesia, 
k/cut down

2-point discrimination, 112t
2-tailed χ² test, 529
tympanic membrane, in otitis media, 
child, 503t

tympanocentesis, 495, 496, 498, 501, 503
tympanometry, 495 
type I collagen, 478 


ultrashort questionnaire. See Primary 
Care Evaluation of Mental 
Disorders 

ultrasonography, 17-18 
for abdominal aortic aneurysm, 26 


for acute cholecystitis, 138, 142
for appendicitis, 54 
for deep vein thrombosis, 235, 
240-241 

for liver examination, 292, 295
for paranasal sinuses, 594
unguisometer, 164, 171
Unified Neurological Stroke Scale, 

634 

unstable angina, 462 
upper lumbar disk herniation 
sit-to-stand test, accuracy, 84f 
urinary incontinence, perimenopausal, 
409 

urinary tract infection (UTI), 675 
accuracy 

of dipstick urinalysis, 678 
of physical examination, 678 
of self-diagnosis, 678 
of signs and symptoms, 678, 679- 
680f 

of symptoms combinations, 678 
algorithm for evaluating patients with
symptoms, 681f, 682-683
definition, 675-676 
differential diagnoses, 676 
evidence from guidelines, 688 
findings of, 687 
likelihood ratio 
for symptoms combinations, 

681f

literature review, 688 
literature search, 687 
methods 

data analysis, 676-677 
quality assessment of included 
articles, 676 

multivariate approach, 689f 
original publication data, 

improvements in, 688 
precision of, 678 
pretest probability of, 680-682 
prior probability, 689 
reference standard, changes in, 688, 
689 

refining probability 
dipstick urinalysis, 682 
with medical history and physical 
examination, 682 
rule out complicate, 680 
sensitivity analysis, 678, 680 
study characteristics, 677 
univariate findings, 689f 
urine specific gravity. See laboratory 
findings 

US Department of Agriculture, 48 
US Department of Health and Human 
Services, 48 
USPSTF. See US Preventive Services
Task Force 

US Preventive Services Task Force 

(USPSTF), 18, 47, 99, 109, 248,
259, 261, 286, 392, 393, 416,
477, 688, 706 

UTI. See urinary tract infection 


vaginal complaints, in vaginitis, 
691-692 

elicit symptoms and signs, 692-693 

evidence from guidelines, 706 

findings of, 705 

likelihood ratio, 707 

literature search, 705 

methods 

criterion standards, evaluation of, 
693 

data extraction, 693 
evaluation of, 693 
inclusion and exclusion criteria, 
693, 694f 

search strategy, 693 
statistical analysis, 693 
microscopic examination, 692f
office laboratory tests, accuracy of,
699f, 700t

inflammation, microscopic 
evidence of, 700 
microscopy, 700 
pH level, 700-701 
whiff test, 701 
original publication data, 

improvements in, 706 
precision of, 693 
prior probability, 707 
reference standard tests, 707 
signs, accuracy of 
discharge characteristics, 697, 699 
inflammation, 699-700 
odor, 700 

symptoms, accuracy of, 693-697 
bleeding, 697 

discharge characteristics, 695 
dyspareunia, 697 
irritative symptoms, 695 
itching, 695 
odor, 695, 697 
self-diagnosis, 697 
univariate findings for, 706f 
vaginal dryness 

perimenopausal, 409 
sensitivity, specificity, or likelihood 
ratio 

in menopause, 412t


vaginal infections, 676, 682 
vaginal symptoms 

sensitivity, specificity, or likelihood 
ratio 

in vaginitis, 707t
discharge characteristics,
695t-698t

in urinary tract infection
discharge, 689t
irritation, 689t

Valsalva maneuver, 128, 196, 420
sensitivity, specificity, or likelihood 
ratio 

in heart failure, 200t
valvular heart disease 

physical examination, 444-445 
variable sexual interest, 

perimenopausal, 409 
vascular bruits, 29 
vascular distribution, of stroke, 
633-634 
accuracy of, 633 
reliability of, 633-634 
vasodilator therapy, 184 
venography, 229 
venous hums, 104 

compared to arterial bruit, 

292t

venous thromboembolism, 227, 

561 

risk factors for, 562t
venous waveforms
abnormal, 126t
analysis of, 126
in central venous pressure
assessment, 126t

ventricular fibrillation cardiac arrest, 
215 

verification bias, 16, 138, 141, 498, 582,
589

vertigo, 709-710
causes, 710t

elicit symptoms and signs, 711- 
712 

finding of, 715 
likelihood ratio, 717 
literature review, 716 
literature search, 715 
origin of, 710 
original publication data, 

improvements in, 716 
prior probability, 717 
reference standard, changes in, 716, 
717 

symptoms and signs 
accuracy of, 712-713 
vestibular neuronitis, 711, 712 
Veterans Affairs, 179 


visual analog scale, 564 
volume depletion, 127, 315, 316, 325, 
326, 327, 330, 331
vomiting. See nausea and vomiting 


wall-occiput distance test 

for occult thoracic vertebral fractures, 
479, 485 

in osteoporosis, 483f 
Walsh algorithm 
for sore throat, 621f
water hammer pulse (Corrigan),
425t

weak thumb abduction, 112t 
web resources, for alcohol screening, 

49 

weight 

in osteoporosis, 484, 485, 490t
sensitivity, specificity, or likelihood 
ratio 

gain in ascites, 68 1, Tit 
loss in low back pain, 76f 
weighted κ statistic, 571
Welch-Allyn-Finnoff transilluminator, 
595 

Wells scoring system, 565, 566f. See also 
clinical prediction rules and 
scores 

sensitivity, specificity, or likelihood 
ratio 

for deep vein thrombosis, 
simplified, 246f 

for pulmonary embolus, simplified, 
572 1, blit 

wheezing, 151, 154-155

sensitivity, specificity, or likelihood 
ratio 

in obstructive airways disease, 153f,
154f, 161f, 162f
in pneumonia, adult, 536f
in pneumonia, infant and child,
550t

white coat hypertension. See office 
blood pressure 
WHO. See World Health 
Organization 

WHO-5. See World Health 

Organization-5 Well-Being 
Scale 

whole blood agglutination test, 

230 

whole blood assays, 239 
WinBUGS software, 218 
WISE. See Women’s Ischemia Syndrome 
Evaluation study 
Women’s Ischemia Syndrome

Evaluation (WISE) study, 
415 

workup bias. See verification bias 
World Health Organization (WHO), 
41, 48, 161, 247, 330, 393,
408, 462, 478, 539, 547, 

548 

Flunet, 344 


International Influenza Program, 
344 

World Health Organization-5 Well- 
Being Scale (WHO-5), 259 


x-ray. See radiographic findings 


Yale 1-question screen, 259-260 


Zung Self-Rating Depression Scale 
(SDS), 250, 251 